# Exporting to Google Sheets with Scrapy
To configure a Scrapy project or spider to export scraped data to Google Sheets:
You need Python 3.8 or higher.
If you are using Scrapy Cloud, make sure you are using stack `scrapy:1.7-py38` or higher. Using the latest stack (`scrapy:2.11`) is generally recommended.

Install scrapy-feedexporter-google-sheets:

```shell
pip install git+https://github.com/scrapy-plugins/scrapy-feedexporter-google-sheets
```
If you are using Scrapy Cloud, remember to add the following line to your `requirements.txt` file:

```
scrapy_google_sheets_exporter @ git+https://github.com/scrapy-plugins/scrapy-feedexporter-google-sheets
```
In your `settings.py` file, define `FEED_STORAGES` as follows:

```python
FEED_STORAGES = {
    "gsheets": "scrapy_google_sheets_exporter.gsheets_exporter.GoogleSheetsFeedStorage",
}
```
If the setting already exists in your `settings.py` file, modify the existing setting to add the key-value pair above, instead of re-defining the setting.

Add a `FEEDS` setting to your project or spider, if not added yet.
The value of `FEEDS` must be a JSON object (`{}`).

If you already have `FEEDS` defined with key-value pairs, you can keep those if you want; `FEEDS` supports exporting data to multiple file storage service locations.

To add `FEEDS` to a project, define it in your Scrapy Cloud project settings or add it to your `settings.py` file:

```python
FEEDS = {}
```
To add `FEEDS` to a spider, define it in your Scrapy Cloud spider-specific settings (open a spider in Scrapy Cloud and select the Settings tab), or add it to your spider code with the `update_settings` method or the `custom_settings` class variable:

```python
from scrapy import Spider

class MySpider(Spider):
    custom_settings = {
        "FEEDS": {},
    }
```
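The `update_settings` alternative is not shown above; a minimal sketch, assuming Scrapy 2.11 or later (where `update_settings` is a classmethod), might look like this:

```python
from scrapy import Spider

class MySpider(Spider):
    name = "my_spider"

    @classmethod
    def update_settings(cls, settings):
        # Apply custom_settings and any other base-class behavior first.
        super().update_settings(settings)
        # Then set FEEDS programmatically, at spider priority.
        settings.set("FEEDS", {}, priority="spider")
```

Overriding `update_settings` is useful when the feed configuration must be computed at runtime rather than written as a class-level constant.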
Add the following key-value pair to `FEEDS`:

```python
{
    "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
        "format": "csv",
    },
}
```
You can find the right values for `<SPREADSHEET ID>` and `<WORKSHEET ID>` in the URL when you are looking at the target worksheet, e.g. `https://docs.google.com/spreadsheets/d/1fWJgq5yuOdeN3YnkBZiTD0VhB1MLzBNomz0s9YwBREo/edit#gid=1261678709`.

Note: if `/edit#gid=<WORKSHEET ID>` is omitted, the first worksheet is used.
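If you prefer to pull the IDs out of a copied URL programmatically, a quick sketch (the helper name is illustrative, not part of the plugin):

```python
import re

def extract_sheet_ids(url):
    """Return (spreadsheet_id, worksheet_id) parsed from a Google Sheets URL.

    worksheet_id is None when the #gid= fragment is absent, in which case
    the plugin falls back to the first worksheet.
    """
    match = re.search(r"/spreadsheets/d/([\w-]+)", url)
    if match is None:
        raise ValueError(f"Not a Google Sheets URL: {url!r}")
    spreadsheet_id = match.group(1)
    gid_match = re.search(r"[#?]gid=(\d+)", url)
    worksheet_id = gid_match.group(1) if gid_match else None
    return spreadsheet_id, worksheet_id

print(extract_sheet_ids(
    "https://docs.google.com/spreadsheets/d/"
    "1fWJgq5yuOdeN3YnkBZiTD0VhB1MLzBNomz0s9YwBREo/edit#gid=1261678709"
))
```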
To append to an existing worksheet, you should also:

- Use the `fields` feed option of `FEEDS` or the `FEED_EXPORT_FIELDS` Scrapy setting to explicitly indicate all fields to export, in the expected order.
- Set `item_export_kwargs.include_headers_line` to `False`, to not write the header row.

For example:

```python
{
    "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
        "format": "csv",
        "fields": ["field1", "field2"],
        "item_export_kwargs": {"include_headers_line": False},
    },
}
```
Define the `GOOGLE_CREDENTIALS` setting as a Python dictionary containing your service account credentials in JSON format:

```python
GOOGLE_CREDENTIALS = {
    "type": "service_account",
    "project_id": "myproject",
    "private_key_id": "…",
    "private_key": "…",
    "client_email": "…@email.iam.gserviceaccount.com",
    "client_id": "…",
    "auth_uri": "…",
    "token_uri": "…",
    "auth_provider_x509_cert_url": "…",
    "client_x509_cert_url": "…",
}
```
Make sure you give your service account write access to the target spreadsheet. You can do that by sharing the spreadsheet with the email of the service account (`client_email` in the dictionary above).
Running your spider now, locally or on Scrapy Cloud, will export your scraped data to the configured Google Sheets worksheet.
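Putting the pieces together, a `settings.py` for this export might look like the following sketch. The spreadsheet and worksheet IDs are placeholders, and loading the credentials from a `GOOGLE_CREDENTIALS_JSON` environment variable is an assumption (a common pattern to keep the private key out of source control), not something the plugin requires:

```python
# settings.py — combined sketch; <SPREADSHEET ID> and <WORKSHEET ID> are placeholders
import json
import os

FEED_STORAGES = {
    "gsheets": "scrapy_google_sheets_exporter.gsheets_exporter.GoogleSheetsFeedStorage",
}

FEEDS = {
    "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
        "format": "csv",
    },
}

# Loaded from an environment variable here (variable name is illustrative);
# a literal dictionary, as shown earlier, works too.
GOOGLE_CREDENTIALS = json.loads(os.environ.get("GOOGLE_CREDENTIALS_JSON", "{}"))
```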