Enable Zyte API to avoid bans

Now that you have run your project in Scrapy Cloud, it is time to improve the project itself, starting with handling website bans.

Your target domain in this tutorial, toscrape.com, does not ban traffic. Many other websites do, however, so sooner or later you will run into bans.

You will now configure your web scraping code to use Zyte API to avoid bans on any website:

  1. Sign up for Zyte API. You get $5 free for a month, and you should only need a fraction of that to complete this tutorial.

  2. Install the latest version of scrapy-zyte-api:

    pip install --upgrade scrapy-zyte-api
    

    Also add the following line to your requirements.txt file, so that your project continues to work when running on Scrapy Cloud:

    scrapy-zyte-api
    
  3. Configure scrapy-zyte-api in transparent mode by adding the following code at the end of tutorial/settings.py, replacing YOUR_API_KEY with your API key:

    DOWNLOAD_HANDLERS = {
        "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
        "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    }
    DOWNLOADER_MIDDLEWARES = {
        "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 1000,
    }
    SPIDER_MIDDLEWARES = {
        "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
    ZYTE_API_KEY = "YOUR_API_KEY"
    ZYTE_API_TRANSPARENT_MODE = True
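
Pasting the key into settings.py is fine for this tutorial, but you may prefer to keep it out of version control. The following is a minimal sketch, assuming you export the key in an environment variable named ZYTE_API_KEY. The helper name read_api_key is hypothetical and not part of scrapy-zyte-api.

```python
import os


def read_api_key(env=os.environ, name="ZYTE_API_KEY"):
    """Return the API key found in `env`, or raise a clear error.

    Hypothetical helper: `env` is any mapping, which makes the function
    easy to try out without touching the real environment.
    """
    key = env.get(name, "").strip()
    # Treat the tutorial placeholder the same as a missing key.
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(
            f"Set the {name} environment variable to your Zyte API key"
        )
    return key


# In tutorial/settings.py you could then write, instead of the literal key:
# ZYTE_API_KEY = read_api_key()
```

This only works where the variable is defined, so a job running on Scrapy Cloud would need the key supplied some other way, for example through its project settings.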
    

Run your code again:

scrapy crawl books_toscrape_com -O books.csv

It works the same as before, except that requests are now sent through Zyte API, which avoids bans cost-efficiently.
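
Because the output is the same either way, a typo in settings.py can go unnoticed. The check below is a rough sketch and not part of scrapy-zyte-api: the function name missing_zyte_settings is hypothetical, and the expected values are simply copied from the settings listed in step 3.

```python
# Expected values, copied from step 3 of this tutorial.
EXPECTED = {
    "DOWNLOAD_HANDLERS": {
        "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
        "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    },
    "DOWNLOADER_MIDDLEWARES": {
        "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 1000,
    },
    "SPIDER_MIDDLEWARES": {
        "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
    },
    "REQUEST_FINGERPRINTER_CLASS": "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter",
    "ZYTE_API_TRANSPARENT_MODE": True,
}


def missing_zyte_settings(settings):
    """Return the names of settings that differ from the tutorial values.

    `settings` is any mapping of setting names to values, such as
    vars() of the settings module. Extra entries are ignored.
    """
    problems = []
    for name, expected in EXPECTED.items():
        actual = settings.get(name)
        if isinstance(expected, dict):
            # Only require the tutorial's entries; other entries are fine.
            actual = actual or {}
            for key, value in expected.items():
                if actual.get(key) != value:
                    problems.append(f"{name}[{key!r}]")
        elif actual != expected:
            problems.append(name)
    # The key must be set and must not be the tutorial placeholder.
    if settings.get("ZYTE_API_KEY") in (None, "", "YOUR_API_KEY"):
        problems.append("ZYTE_API_KEY")
    return problems
```

For example, from the project root you could run `import tutorial.settings` in a Python shell and call `missing_zyte_settings(vars(tutorial.settings))`. An empty list means every value matches the ones given in this tutorial.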

If you ever find a website for which Zyte API does not work as expected (e.g. gives you a ban response or too many errors), please open a support ticket to help us fix it as soon as possible.

Tip

If you get an SSL error, install the Zyte CA certificate on your system and try again.

Continue to the next chapter to learn about browser automation.