Exporting to Google Sheets with Scrapy#

To configure a Scrapy project or spider to export scraped data to Google Sheets:

You need Python 3.8 or higher.

If you are using Scrapy Cloud, make sure you are using stack scrapy:1.7-py38 or higher. Using the latest stack (scrapy:2.11) is generally recommended.

Install scrapy-feedexporter-google-sheets:

pip install git+https://github.com/scrapy-plugins/scrapy-feedexporter-google-sheets

If you are using Scrapy Cloud, remember to add the following line to your requirements.txt file:

scrapy-feedexporter-google-sheets @ git+https://github.com/scrapy-plugins/scrapy-feedexporter-google-sheets

In your settings.py file, define FEED_STORAGES as follows:
settings.py#
```
FEED_STORAGES = {
    "gsheets": "scrapy_google_sheets_exporter.gsheets_exporter.GoogleSheetsFeedStorage",
}
```
If the setting already exists in your settings.py file, modify the existing setting to add the key-value pair above, instead of re-defining the setting.
Add a FEEDS setting to your project or spider, if not added yet.

The value of FEEDS must be a JSON object ({}).

If you have FEEDS already defined with key-value pairs, you can keep those if you want — FEEDS supports exporting data to multiple file storage service locations.

To add FEEDS to a project, define it in your Scrapy Cloud project settings or add it to your settings.py file:
settings.py#
```
FEEDS = {}
```
To add FEEDS to a spider, define it in your Scrapy Cloud spider-specific settings (open a spider in Scrapy Cloud and select the Settings tab) or add it to your spider code with the update_settings method or the custom_settings class variable:
spiders/myspider.py#
```
class MySpider:
    custom_settings = {
        "FEEDS": {},
    }
```
Add the following key-value pair to FEEDS:
```
{
    "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
        "format": "csv"
    }
}
```
Where:
- You can find the right values for <SPREADSHEET ID> and <WORKSHEET ID> in the URL when you are looking at the target worksheet, e.g: https://docs.google.com/spreadsheets/d/1fWJgq5yuOdeN3YnkBZiTD0VhB1MLzBNomz0s9YwBREo/edit#gid=1261678709.
  
  Note
  
  If /edit#gid=<WORKSHEET ID> is omitted, the first worksheet is used.
To append to an existing worksheet, you should also:
- Use the fields feed option of FEEDS or the FEED_EXPORT_FIELDS Scrapy setting to explicitly indicate all fields to export, in the expected order.
- Set item_export_kwargs.include_headers_line to False, to not write the header row.
For example:
```
{
    "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
        "format": "csv",
        "fields": ["field1", "field2"],
        "item_export_kwargs": {"include_headers_line": False}
    }
}
```

Define the GOOGLE_CREDENTIALS setting as a Python dictionary containing your service account credentials in JSON format:

settings.py#

GOOGLE_CREDENTIALS = {
    "type": "service_account",
    "project_id": "myproject",
    "private_key_id": "…",
    "private_key": "…",
    "client_email": "…@email.iam.gserviceaccount.com",
    "client_id": "…",
    "auth_uri": "…",
    "token_uri": "…",
    "auth_provider_x509_cert_url": "…",
    "client_x509_cert_url": "…"
}

Make sure you give your service account write access on the target spreadsheet. You can do that by sharing the spreadsheet with the email of the service account (client_email in the JSON above).

Running your spider now, locally or on Scrapy Cloud, will export your scraped data to the configured Google Sheets worksheet.