Exporting to Google BigQuery with Scrapy

To configure a Scrapy project or spider to export scraped data to Google BigQuery:

  1. Make sure you have Python 3.7 or higher and Scrapy 2.4 or higher.

    If you are using Scrapy Cloud, make sure you are using stack scrapy:2.4 or higher. Using the latest stack (scrapy:2.11) is generally recommended.
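
    For example, a minimal scrapinghub.yml sketch that pins the stack (this assumes you deploy with shub; the project ID is a placeholder):

    scrapinghub.yml
    projects:
      default: 123456
    stacks:
      default: scrapy:2.11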

  2. Install scrapy-bigquery:

    pip install scrapy-bigquery
    

    If you are using Scrapy Cloud, remember to add the following line to your requirements.txt file:

    scrapy-bigquery
    
  3. Define the BIGQUERY_DATASET and BIGQUERY_TABLE Scrapy settings to point to the target table. For example:

    settings.py
    BIGQUERY_DATASET = "my_dataset"
    BIGQUERY_TABLE = "my_table"
    

    Additional settings are available; see the scrapy-bigquery documentation for the full list.

    Tip

    To add Scrapy settings to a project, define them in your Scrapy Cloud project settings or add them to your settings.py file.

    settings.py
    MY_SETTING = ...
    

    To add settings to a spider, define them in your Scrapy Cloud spider-specific settings (open a spider in Scrapy Cloud and select the Settings tab), or add them to your spider code with the update_settings method or the custom_settings class variable:

    spiders/myspider.py
    import scrapy

    class MySpider(scrapy.Spider):
        name = "myspider"

        custom_settings = {
            "MY_SETTING": ...,
        }
    
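    Alternatively, a minimal sketch using update_settings, a classmethod that Scrapy calls to apply spider-specific settings (the setting name and value are placeholders):

    spiders/myspider.py
    import scrapy

    class MySpider(scrapy.Spider):
        name = "myspider"

        @classmethod
        def update_settings(cls, settings):
            # Keep the default handling of custom_settings.
            super().update_settings(settings)
            # Then set or override individual settings at spider priority.
            settings.set("MY_SETTING", ..., priority="spider")
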
  4. Define the BIGQUERY_SERVICE_ACCOUNT setting as a string with your service account credentials in base64-encoded JSON format:

    settings.py
    BIGQUERY_SERVICE_ACCOUNT = "eyJ0eX=="
    

    You can use the following command to generate the required value from your service account JSON file (with GNU coreutils, pass -w 0 to base64 so the output stays on a single line):

    cat service-account.json | jq . -c | base64
    
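    If you prefer not to rely on jq and base64, a small Python sketch (the script name is arbitrary) produces the same single-line value, assuming the key file is named service-account.json:

    encode_service_account.py
    import base64
    import json

    # Compact the JSON, then base64-encode it as a single line.
    with open("service-account.json") as f:
        compact = json.dumps(json.load(f), separators=(",", ":"))
    print(base64.b64encode(compact.encode("utf-8")).decode("ascii"))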

    Make sure you give your service account write access to the target table. You can do that by sharing the table with the service account's email address (client_email in the service account JSON file).
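
    To find that email address, you can read it from the same key file, for example:

    import json

    # Print the service account email to share the table with.
    with open("service-account.json") as f:
        print(json.load(f)["client_email"])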

Running your spider now, locally or on Scrapy Cloud, will export your scraped data to the configured Google BigQuery table.
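
Putting it together, here is a minimal end-to-end sketch. It assumes the quotes.toscrape.com demo site, and the BigQuery identifiers and credentials are placeholders to replace with your own:

    spiders/quotes.py
    import scrapy

    class QuotesSpider(scrapy.Spider):
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com"]

        custom_settings = {
            # Placeholders: point these at your own dataset and table.
            "BIGQUERY_DATASET": "my_dataset",
            "BIGQUERY_TABLE": "my_table",
            # Base64-encoded service account JSON (truncated placeholder).
            "BIGQUERY_SERVICE_ACCOUNT": "eyJ0eX==",
        }

        def parse(self, response):
            # Each yielded item becomes a row in the target table.
            for quote in response.css("div.quote"):
                yield {
                    "text": quote.css("span.text::text").get(),
                    "author": quote.css("small.author::text").get(),
                }

Run it with scrapy crawl quotes, and the scraped rows should appear in the configured table.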