Warning
Zyte API is replacing Smart Proxy Manager. It is no longer possible to sign up to Smart Proxy Manager. If you are an existing Smart Proxy Manager user, see Migrating from Smart Proxy Manager to Zyte API.
Fetch API (replaced by Zyte API)
Warning
Fetch API functionality is now offered under Zyte API; see Get started with Zyte API. This documentation is kept as a reference for existing Fetch API users. New users should sign up to Zyte API.
Warning
To use the Fetch API you need a Smart Proxy Manager API key with Browser Execution functionality enabled, even if you don't use the render and screenshot parameters. Otherwise you will get a 401 Unauthorized response.
The Fetch API allows you to download web pages using an HTTP API, instead of a proxy API. It provides server-side browser execution capabilities, and better browser emulation than requests processed through the standard proxy API.
Authentication is done through standard HTTP auth, using your Smart Proxy Manager API key as the user name and an empty password.
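As a minimal sketch of what this looks like on the wire, the Python snippet below builds the `Authorization` header that HTTP Basic auth produces when the API key is the user name and the password is empty. The function name `basic_auth_header` is illustrative and not part of the Fetch API.

```python
import base64


def basic_auth_header(api_key: str) -> dict:
    """Build a Basic auth header with the API key as user name and an empty password."""
    # Basic auth encodes "username:password". The password is empty here,
    # so the credentials string is the key followed by a single colon.
    credentials = f"{api_key}:".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

This is the same header that `curl -u <API_KEY>:` sends; most HTTP clients can also build it for you from a user name and password pair.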
Here is an example of a working request (replace <API_KEY> with your API key):
curl -u <API_KEY>: http://fetch.crawlera.com:8011/fetch/v2 -d '{"url": "https://toscrape.com/"}' -H 'Content-Type: application/json'
The examples in this documentation are provided as commands to execute in a terminal. You will need the curl and jq command line tools. curl often comes installed with your operating system, while jq needs to be downloaded.
You can download jq at: https://stedolan.github.io/jq/download/
Request Endpoint & Parameters
Endpoint: http://fetch.crawlera.com:8011/fetch/v2
Method: POST
Parameter values should be URL encoded.
Parameter | Required | Description | Example | Default |
---|---|---|---|---|
url | yes | URL to fetch | | |
| no | The region to route the request through, specified as a country code. If | | |
| no | Pass | | |
| no | Pass | | |
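The following Python sketch shows one way to assemble the POST request described above using only the standard library. It sets only the `url`, `render` and `screenshot` keys that appear in the example requests on this page. The function name and its arguments are assumptions made for illustration, and the request is built but not sent.

```python
import base64
import json
import urllib.request

# Endpoint as documented on this page.
FETCH_ENDPOINT = "http://fetch.crawlera.com:8011/fetch/v2"


def build_fetch_request(api_key, url, render=False, screenshot=False):
    """Build (but do not send) a Fetch API POST request with a JSON body."""
    payload = {"url": url}
    if render:
        payload["render"] = True
    if screenshot:
        payload["screenshot"] = True

    # API key as user name, empty password (HTTP Basic auth).
    token = base64.b64encode(f"{api_key}:".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    return urllib.request.Request(
        FETCH_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and read the JSON response body.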
Response Format
The status code of the Fetch API response is always 200 (regardless of the response from the target website) unless there is a problem with the Fetch API itself.
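Because of this, a client has to look in two places to know what happened: the HTTP status of the Fetch API response, and the `original_status` field inside the JSON body. A rough Python sketch of that check follows; the function name and the returned labels are hypothetical.

```python
def summarize_status(api_http_status, payload):
    """Return (source, status) describing where the reported status came from."""
    if api_http_status != 200:
        # A non-200 status means the Fetch API itself reported a problem,
        # for example 401 when the API key is not accepted.
        return ("fetch_api", api_http_status)
    # With a 200 from the Fetch API, the target website's status is in the body.
    # int() accepts both 200 and "200".
    return ("website", int(payload["original_status"]))
```

For example, a call with `api_http_status=200` and a payload whose `original_status` is 404 reports that the website, not the Fetch API, returned the error.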
The response size limit is 10 MB.
The API response is a UTF-8 encoded JSON object with the following fields:
Name | Type | Description |
---|---|---|
url | String | The URL of the page fetched |
body | String | The body of the response, encoded using |
body_encoding | String | The encoding used for the body of the response. Either |
headers | Object | The HTTP headers of the response. |
original_status | String | The HTTP status of the response received from the website |
crawlera_status | String | Smart Proxy Manager status, one of: |
screenshot | String | A screenshot of the page, encoded in |
Example Fetch API response:
{
  "url": "https://toscrape.com",
  "screenshot": "",
  "original_status": 200,
  "headers": {
    "server": "nginx/1.14.0 (Ubuntu)",
    "date": "Mon, 25 May 2020 17:40:16 GMT",
    "content-type": "text/html",
    "last-modified": "Wed, 29 Jun 2016 21:51:37 GMT",
    "x-upstream": "toscrape-sites-master_web",
    "transfer-encoding": "chunked"
  },
  "crawlera_status": "success",
  "body_encoding": "plain",
  "body": "...HTML of the response goes here..."
}
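A response like the one above could be handled in Python roughly as follows. This is a sketch under two assumptions that this page does not spell out: that any `crawlera_status` other than `"success"` should be treated as a failure, and that only the `"plain"` body encoding shown in the example needs handling. The function name is illustrative.

```python
import json


def extract_html(raw_response: str) -> str:
    """Return the page body from a Fetch API JSON response."""
    data = json.loads(raw_response)

    # Assumption: anything other than "success" is treated as a failure.
    if data.get("crawlera_status") != "success":
        raise ValueError(f"unexpected crawlera_status: {data.get('crawlera_status')!r}")

    # Only the "plain" encoding from the example is handled here;
    # other encodings are rejected rather than guessed at.
    if data.get("body_encoding") != "plain":
        raise ValueError(f"unhandled body_encoding: {data.get('body_encoding')!r}")

    return data["body"]
```

Raising on unknown values keeps the sketch from silently returning a body it cannot decode.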
Use Cases
Fetch the HTML of a page rendered in a browser
To run this example you will need:
a Smart Proxy Manager API key with Browser Execution enabled
the curl and jq command line utilities
Example:
curl -u <API_KEY>: http://fetch.crawlera.com:8011/fetch/v2/ -d '{"url": "https://toscrape.com/", "render": true}' -H 'Content-Type: application/json' | jq '.body' -r > page.html
Fetching a screenshot
To run this example you will need:
a Smart Proxy Manager API key with Browser Execution enabled
the curl, jq and base64 command line utilities
Example:
curl -u <API_KEY>: http://fetch.zyte.com:8011/fetch/v2/ -d '{"url": "https://toscrape.com/", "render": true, "screenshot": true}' -H 'Content-Type: application/json' | jq '.screenshot' -r | base64 -d > image.jpg
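The `jq` and `base64 -d` steps of this pipeline can also be done in Python. The sketch below takes the JSON text returned by the Fetch API, reads the `screenshot` field and decodes it from Base64, as the command above does. The function names and the output path are illustrative.

```python
import base64
import json


def decode_screenshot(raw_response: str) -> bytes:
    """Return the decoded screenshot bytes from a Fetch API JSON response."""
    data = json.loads(raw_response)
    # The screenshot field holds Base64 text; decode it to raw image bytes.
    return base64.b64decode(data["screenshot"])


def save_screenshot(raw_response: str, path: str) -> int:
    """Write the decoded screenshot to `path` and return the number of bytes written."""
    image = decode_screenshot(raw_response)
    with open(path, "wb") as handle:
        handle.write(image)
    return len(image)
```

An empty `screenshot` value, as in the example response earlier on this page, decodes to zero bytes, so callers may want to check the returned length before using the file.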
Scrapy Middleware for Fetch API
There is an official Scrapy downloader middleware for downloading pages through the Fetch API:
https://github.com/scrapy-plugins/scrapy-crawlera-fetch
Installation and usage instructions can be found in the project's README.