# Exporting to Google BigQuery with Scrapy
To configure a Scrapy project or spider to export scraped data to Google BigQuery, follow the steps below.
You need Python 3.7 or higher and Scrapy 2.4 or higher.
If you are using Scrapy Cloud, make sure you are using stack `scrapy:2.4` or higher. Using the latest stack (`scrapy:2.11`) is generally recommended.

Install scrapy-bigquery:

```shell
pip install scrapy-bigquery
```
If you are using Scrapy Cloud, remember to add the following line to your `requirements.txt` file:

```
scrapy-bigquery
```
Define the `BIGQUERY_DATASET` and `BIGQUERY_TABLE` Scrapy settings to point to the target table. For example:

```python
BIGQUERY_DATASET = "my-dataset"
BIGQUERY_TABLE = "my-table"
```
Additional settings are available; see the scrapy-bigquery documentation for the full list.
Tip

To add Scrapy settings to a project, define them in your Scrapy Cloud project settings or add them to your `settings.py` file:

```python
MY_SETTING = ...
```

To add settings to a spider, define them in your Scrapy Cloud spider-specific settings (open a spider in Scrapy Cloud and select the Settings tab), or add them to your spider code with the `update_settings` method or the `custom_settings` class variable:

```python
class MySpider:
    custom_settings = {
        "MY_SETTING": ...,
    }
```
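For example, to scope the export to a spider-specific table, you could set the BigQuery settings through `custom_settings` (a minimal sketch; the spider name and the dataset and table names are placeholders):

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    # Placeholder dataset/table names; replace with your own.
    custom_settings = {
        "BIGQUERY_DATASET": "my-dataset",
        "BIGQUERY_TABLE": "quotes",
    }
```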
Define the `BIGQUERY_SERVICE_ACCOUNT` setting as a string with your service account credentials in base64-encoded JSON format:

```python
BIGQUERY_SERVICE_ACCOUNT = "eyJ0eX=="
```
You can use the following command to generate the required value from your service account JSON file:

```shell
cat service-account.json | jq . -c | base64
```
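If you would rather not depend on jq, a short Python sketch can produce the same value (assuming your credentials file is named `service-account.json`):

```python
import base64
import json

# Re-serialize the JSON compactly (the equivalent of `jq . -c`),
# then base64-encode the result for use in BIGQUERY_SERVICE_ACCOUNT.
with open("service-account.json") as f:
    compact = json.dumps(json.load(f), separators=(",", ":"))

print(base64.b64encode(compact.encode()).decode())
```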
Make sure you give your service account write access to the target table. You can do that by sharing the table with the email address of the service account (`client_email` in the service account JSON).
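Alternatively, you can grant access programmatically. The following sketch uses the google-cloud-bigquery client (the project, dataset, table, and service account email are placeholders, and `roles/bigquery.dataEditor` is one predefined role that includes table write access):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder identifiers; replace with your own project, dataset, and table.
table = client.get_table("my-project.my-dataset.my-table")

# Add the service account (client_email from the JSON file) as a data editor.
policy = client.get_iam_policy(table)
policy.bindings.append(
    {
        "role": "roles/bigquery.dataEditor",
        "members": {"serviceAccount:my-sa@my-project.iam.gserviceaccount.com"},
    }
)
client.set_iam_policy(table, policy)
```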
Running your spider now, locally or on Scrapy Cloud, will export your scraped data to the configured Google BigQuery table.
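For instance, with the settings above in place, every item yielded by a minimal spider like the following sketch (the target site and CSS selectors are hypothetical examples) ends up as a row in the configured table:

```python
import scrapy


class BooksSpider(scrapy.Spider):
    name = "books"
    start_urls = ["http://books.toscrape.com/"]  # Example sandbox site

    def parse(self, response):
        # Each yielded dict is exported as a row of the BigQuery table.
        for book in response.css("article.product_pod"):
            yield {
                "title": book.css("h3 a::attr(title)").get(),
                "price": book.css("p.price_color::text").get(),
            }
```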