Exporting to Google Sheets with Scrapy#

To configure a Scrapy project or spider to export scraped data to Google Sheets:

  1. You need Python 3.8 or higher.

    If you are using Scrapy Cloud, make sure you are using stack scrapy:1.7-py38 or higher. Using the latest stack (scrapy:2.11) is generally recommended.

  2. Install scrapy-feedexporter-google-sheets:

    pip install git+https://github.com/scrapy-plugins/scrapy-feedexporter-google-sheets
    

    If you are using Scrapy Cloud, remember to add the following line to your requirements.txt file:

    scrapy-feedexporter-google-sheets @ git+https://github.com/scrapy-plugins/scrapy-feedexporter-google-sheets
    
  3. In your settings.py file, define FEED_STORAGES as follows:

    settings.py#
    FEED_STORAGES = {
        "gsheets": "scrapy_google_sheets_exporter.gsheets_exporter.GoogleSheetsFeedStorage",
    }
    

    If the setting already exists in your settings.py file, modify the existing setting to add the key-value pair above, instead of re-defining the setting.
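
For example, if your project already used the built-in FTP feed storage backend (the "ftp" entry below is purely illustrative), the merged setting would look like this:

```python
# settings.py — extend an existing FEED_STORAGES instead of replacing it;
# the "ftp" entry stands in for whatever entries you already have.
FEED_STORAGES = {
    "ftp": "scrapy.extensions.feedexport.FTPFeedStorage",  # pre-existing entry
    "gsheets": "scrapy_google_sheets_exporter.gsheets_exporter.GoogleSheetsFeedStorage",
}
```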

  4. Add a FEEDS setting to your project or spider, if you have not done so already.

    The value of FEEDS must be a dictionary, written as a JSON object ({}) when defined through the Scrapy Cloud settings UI.

    If you have FEEDS already defined with key-value pairs, you can keep those if you want — FEEDS supports exporting data to multiple file storage service locations.

    To add FEEDS to a project, define it in your Scrapy Cloud project settings or add it to your settings.py file:

    settings.py#
    FEEDS = {}
    

    To add FEEDS to a spider, define it in your Scrapy Cloud spider-specific settings (open a spider in Scrapy Cloud and select the Settings tab) or add it to your spider code with the update_settings method or the custom_settings class variable:

    spiders/myspider.py#
    from scrapy import Spider

    class MySpider(Spider):
        name = "myspider"

        custom_settings = {
            "FEEDS": {},
        }
    
  5. Add the following key-value pair to FEEDS:

    {
        "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
            "format": "csv"
        }
    }
    

    Where:

    • You can find the right values for <SPREADSHEET ID> and <WORKSHEET ID> in the URL of the target worksheet, e.g. https://docs.google.com/spreadsheets/d/1fWJgq5yuOdeN3YnkBZiTD0VhB1MLzBNomz0s9YwBREo/edit#gid=1261678709.

      Note

      If /edit#gid=<WORKSHEET ID> is omitted, the first worksheet is used.

    To append to an existing worksheet, you should also:

    • Use the fields feed option of FEEDS or the FEED_EXPORT_FIELDS Scrapy setting to explicitly indicate all fields to export, in the expected order.

    • Set item_export_kwargs.include_headers_line to False, so that the header row is not written again.

    For example:

    {
        "gsheets://docs.google.com/spreadsheets/d/<SPREADSHEET ID>/edit#gid=<WORKSHEET ID>": {
            "format": "csv",
            "fields": ["field1", "field2"],
            "item_export_kwargs": {"include_headers_line": False}
        }
    }
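
Putting steps 4 and 5 together, a minimal settings.py fragment (using the example worksheet URL from above) might look like this:

```python
# settings.py — a complete FEEDS entry targeting the example worksheet.
FEEDS = {
    "gsheets://docs.google.com/spreadsheets/d/1fWJgq5yuOdeN3YnkBZiTD0VhB1MLzBNomz0s9YwBREo/edit#gid=1261678709": {
        "format": "csv",
    },
}
```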
    
  6. Define the GOOGLE_CREDENTIALS setting as a Python dictionary with the contents of your service account JSON key file:

    settings.py#
    GOOGLE_CREDENTIALS = {
        "type": "service_account",
        "project_id": "myproject",
        "private_key_id": "…",
        "private_key": "…",
        "client_email": "…@email.iam.gserviceaccount.com",
        "client_id": "…",
        "auth_uri": "…",
        "token_uri": "…",
        "auth_provider_x509_cert_url": "…",
        "client_x509_cert_url": "…"
    }
    

    Make sure you give your service account write access on the target spreadsheet. You can do that by sharing the spreadsheet with the email of the service account (client_email in the JSON above).
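
To avoid committing the private key to version control, you can instead load the key file at run time. A sketch, assuming the path to the downloaded JSON key file is exposed through an environment variable (GOOGLE_APPLICATION_CREDENTIALS is used here by convention, not required by the plugin):

```python
# settings.py — load the service account key from a file at run time
# instead of hard-coding it. GOOGLE_APPLICATION_CREDENTIALS is an assumed
# environment variable pointing at the downloaded JSON key file.
import json
import os

_key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
GOOGLE_CREDENTIALS = json.load(open(_key_path)) if _key_path else {}
```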

Running your spider now, locally or on Scrapy Cloud, will export your scraped data to the configured Google Sheets worksheet.