Warning

Zyte API is replacing Smart Proxy Manager. It is no longer possible to sign up to Smart Proxy Manager. If you are an existing Smart Proxy Manager user, see Migrating from Smart Proxy Manager to Zyte API.

Fetch API (replaced by Zyte API)

Warning

Fetch API functionality is now offered under Zyte API; see Get started with Zyte API. This documentation is kept as a reference for existing Fetch API users. New users should sign up for Zyte API.

Warning

To use the Fetch API you will need a Smart Proxy Manager API key with Browser Execution functionality enabled, even if you don’t use the render and screenshot parameters. Otherwise, you will get a 401 Unauthorized response.

The Fetch API allows you to download web pages using an HTTP API, instead of a proxy API. It provides server-side browser execution capabilities, and better browser emulation than requests processed through the standard proxy API.

Authentication is done through standard HTTP auth, using your Smart Proxy Manager API key as the user name and an empty password.

Here is an example of a working request (replace <API_KEY> with your API key):

curl -u <API_KEY>: http://fetch.crawlera.com:8011/fetch/v2 -d '{"url": "https://toscrape.com/"}' -H 'Content-Type: application/json'
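The same request can also be prepared from Python. The following is a minimal sketch using only the standard library. The helper name build_fetch_request is hypothetical and not part of any Zyte library, and the request is only constructed here, not sent.

```python
import base64
import json
import urllib.request

# Endpoint taken from the curl example above.
FETCH_ENDPOINT = "http://fetch.crawlera.com:8011/fetch/v2"


def build_fetch_request(api_key, url):
    """Build (but do not send) a Fetch API request.

    Hypothetical helper for illustration only.
    """
    # HTTP Basic auth: the API key is the user name and the password is
    # empty, which is what `curl -u <API_KEY>:` sends.
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    body = json.dumps({"url": url}).encode("utf-8")
    return urllib.request.Request(
        FETCH_ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    request = build_fetch_request("<API_KEY>", "https://toscrape.com/")
    print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the actual call, which requires a valid API key.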

The examples in this documentation are provided as commands to execute in a terminal. You will need the curl and jq command line tools. curl often comes installed with your operating system, while jq needs to be downloaded.

You can download jq at: https://stedolan.github.io/jq/download/

Request Endpoint & Parameters

  • Endpoint: http://fetch.crawlera.com:8011/fetch/v2

  • Method: POST

  • Parameter values should be URL encoded.

| Parameter | Required | Description | Example | Default |
|---|---|---|---|---|
| url | yes | URL to fetch | https://toscrape.com | |
| region | no | The region to route the request through, specified as a country code. If auto or omitted, Smart Proxy Manager picks the best region to route the request through based on the target website. | es | auto |
| render | no | Pass true to render the URL in a browser. | true | false |
| screenshot | no | Pass true to return the screenshot field in the response, with a screenshot of the page. Implies render=true. | true | false |
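As an illustration of how the request parameters fit together, here is a small sketch that assembles the JSON body. The function build_payload is a hypothetical helper. Its defaults follow the parameters described above, and setting render explicitly when screenshot is requested is a choice made here for clarity, since the API already implies it.

```python
import json


def build_payload(url, region="auto", render=False, screenshot=False):
    """Return the JSON body for a Fetch API request as a string.

    Hypothetical helper; parameter names and defaults follow the
    documented request parameters.
    """
    if not url:
        raise ValueError("url is required")
    payload = {"url": url}
    # Only send parameters that differ from their documented defaults.
    if region != "auto":
        payload["region"] = region
    if render or screenshot:
        # screenshot implies render, so render is spelled out in both cases.
        payload["render"] = True
    if screenshot:
        payload["screenshot"] = True
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_payload("https://toscrape.com", region="es", screenshot=True))
```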

Response Format

The status code of the Fetch API response is always 200 (regardless of the response from the target website) unless there is a problem with the Fetch API itself.

The response size limit is 10 MB.

The API response is a UTF-8 encoded JSON object with the following fields:

| Name | Type | Description |
|---|---|---|
| url | String | The URL of the page fetched |
| body | String | The body of the response, encoded using body_encoding. |
| body_encoding | String | The encoding used for the body of the response. Either plain or base64. |
| headers | Object | The HTTP headers of the response. |
| original_status | String | The HTTP status of the response received from the website |
| crawlera_status | String | Smart Proxy Manager status, one of: success (successful request; counts towards monthly quota), ban (request was banned after trying multiple proxies), fail (another error, not a ban, prevented fulfilling the request; see crawlera_error). |
| screenshot | String | A screenshot of the page, encoded in base64 |

Example Fetch API response:

{
  "url": "https://toscrape.com",
  "screenshot": "",
  "original_status": 200,
  "headers": {
    "server": "nginx/1.14.0 (Ubuntu)",
    "date": "Mon, 25 May 2020 17:40:16 GMT",
    "content-type": "text/html",
    "last-modified": "Wed, 29 Jun 2016 21:51:37 GMT",
    "x-upstream": "toscrape-sites-master_web",
    "transfer-encoding": "chunked"
  },
  "crawlera_status": "success",
  "body_encoding": "plain",
  "body": "...HTML of the response goes here..."
}
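Because the HTTP status of the Fetch API response is normally 200, the outcome of a request has to be read from the JSON fields. The sketch below shows one way to do that for a response shaped like the example above. The function decode_body is a hypothetical helper, and converting a plain body to bytes as UTF-8 is an assumption made for this example.

```python
import base64


def decode_body(response):
    """Return the page body as bytes, honoring body_encoding.

    Hypothetical helper; `response` is the parsed JSON object returned
    by the Fetch API.
    """
    status = response["crawlera_status"]
    if status != "success":
        # "ban" or "fail": the body cannot be relied on.
        raise RuntimeError(f"request not fulfilled: {status}")
    body = response["body"]
    if response["body_encoding"] == "base64":
        return base64.b64decode(body)
    # "plain": the body is already text. Encoding it as UTF-8 is an
    # assumption of this sketch.
    return body.encode("utf-8")


if __name__ == "__main__":
    sample = {
        "crawlera_status": "success",
        "body_encoding": "plain",
        "body": "<html></html>",
    }
    print(decode_body(sample))
```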

Use Cases

Fetch the HTML of a page rendered in a browser

To run this example you will need:

  • a Smart Proxy Manager API key with Browser Execution enabled

  • the curl and jq command line utilities

Example:

curl -u <API_KEY>: http://fetch.crawlera.com:8011/fetch/v2/ -d '{"url": "https://toscrape.com/", "render": true}' -H 'Content-Type: application/json' | jq '.body' -r > page.html

Fetching a screenshot

To run this example you will need:

  • a Smart Proxy Manager API key with Browser Execution enabled

  • the curl, jq, and base64 command line utilities

Example:

curl -u <API_KEY>: http://fetch.zyte.com:8011/fetch/v2/ -d '{"url": "https://toscrape.com/", "render": true, "screenshot": true}' -H 'Content-Type: application/json' | jq '.screenshot' -r | base64 -d > image.jpg
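The jq and base64 steps of the pipeline above can also be done in Python once the JSON response has been parsed. This is a minimal sketch. The function save_screenshot is a hypothetical helper, and the output file name is the caller's choice.

```python
import base64
from pathlib import Path


def save_screenshot(response, path):
    """Decode the base64 screenshot field and write it to `path`.

    Hypothetical helper; returns the number of bytes written.
    """
    encoded = response.get("screenshot")
    if not encoded:
        # The field is empty unless a screenshot was requested.
        raise ValueError("response contains no screenshot")
    data = base64.b64decode(encoded)
    Path(path).write_bytes(data)
    return len(data)
```

A parsed response obtained with "screenshot": true could then be saved with save_screenshot(response, "image.jpg").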

Scrapy Middleware for Fetch API

There is an official Scrapy downloader middleware to download pages using Fetch API.

https://github.com/scrapy-plugins/scrapy-crawlera-fetch

Installation and usage instructions can be found in the README of the project.