Zyte API proxy mode#
To use Zyte API as a proxy, use the api.zyte.com:8011 endpoint, with your API key and proxy headers:
using System;
using System.IO;
using System.Net;
using System.Text;
var proxy = new WebProxy("http://api.zyte.com:8011", true);
proxy.Credentials = new NetworkCredential("YOUR_API_KEY", "");
var request = (HttpWebRequest)WebRequest.Create("https://toscrape.com");
request.Proxy = proxy;
request.PreAuthenticate = true;
request.AllowAutoRedirect = false;
var response = (HttpWebResponse)request.GetResponse();
var stream = response.GetResponseStream();
var reader = new StreamReader(stream);
var httpResponseBody = reader.ReadToEnd();
reader.Close();
response.Close();
Console.WriteLine(httpResponseBody);
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
https://toscrape.com
const axios = require('axios')
axios
  .get(
    'https://toscrape.com',
    {
      proxy: {
        protocol: 'http',
        host: 'api.zyte.com',
        port: 8011,
        auth: {
          username: 'YOUR_API_KEY',
          password: ''
        }
      }
    }
  )
  .then((response) => {
    const httpResponseBody = response.data
    console.log(httpResponseBody)
  })
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('GET', 'https://toscrape.com', [
    'proxy' => 'http://YOUR_API_KEY:@api.zyte.com:8011',
]);
$http_response_body = (string) $response->getBody();
fwrite(STDOUT, $http_response_body);
Note
You need to install and configure our CA certificate for the requests library.
import requests
response = requests.get(
    "https://toscrape.com",
    proxies={
        scheme: "http://YOUR_API_KEY:@api.zyte.com:8011" for scheme in ("http", "https")
    },
)
http_response_body: bytes = response.content
print(http_response_body.decode())
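The note above does not say how the certificate reaches requests. One possible way, sketched below, is the verify argument of requests, which accepts the path to a CA bundle file. The file name zyte-ca.crt and the helper names proxy_request_kwargs and fetch are hypothetical; use the path where you saved the certificate.

```python
def proxy_request_kwargs(api_key: str, ca_bundle_path: str) -> dict:
    """Build keyword arguments for requests.get() in proxy mode.

    ca_bundle_path is a hypothetical local path to the downloaded CA certificate.
    """
    proxy_url = f"http://{api_key}:@api.zyte.com:8011"
    return {
        # Route both plain HTTP and HTTPS target URLs through the proxy endpoint.
        "proxies": {"http": proxy_url, "https": proxy_url},
        # Ask requests to validate TLS against the given CA bundle file.
        "verify": ca_bundle_path,
    }


def fetch(url: str, api_key: str, ca_bundle_path: str) -> bytes:
    """Send a GET request through the proxy and return the raw response body."""
    import requests  # imported here so the helper above works without requests installed

    response = requests.get(url, **proxy_request_kwargs(api_key, ca_bundle_path))
    return response.content
```

For example, fetch("https://toscrape.com", "YOUR_API_KEY", "zyte-ca.crt") would send the same request as the snippet above, with the certificate path passed explicitly.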
When using scrapy-zyte-smartproxy, set the ZYTE_SMARTPROXY_URL setting to "http://api.zyte.com:8011" and the ZYTE_SMARTPROXY_APIKEY setting to your API key for Zyte API.
Then you can continue using Scrapy as usual and all requests will be proxied through Zyte API automatically.
from scrapy import Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"
    start_urls = ["https://toscrape.com"]

    def parse(self, response):
        print(response.text)
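As a rough sketch, the two settings described above might look like this in the settings.py file of a Scrapy project. The DOWNLOADER_MIDDLEWARES entry, its priority value, and ZYTE_SMARTPROXY_ENABLED are assumptions based on the usual scrapy-zyte-smartproxy setup, not something this page states; check that plugin's documentation for the exact values.

```python
# settings.py (sketch)

# Assumption: the plugin is enabled through its downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610,
}
ZYTE_SMARTPROXY_ENABLED = True  # assumption, see the lead-in above

# The two settings named on this page.
ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"
ZYTE_SMARTPROXY_APIKEY = "YOUR_API_KEY"
```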
Limitations#
The proxy mode makes it easier to migrate existing code that uses a proxy service.
However, the proxy mode has some limitations when compared to the HTTP API:
You cannot use browser or automatic extraction requests.
The proxy mode is not optimized for use in combination with browser automation tools.
Tip
Use Zyte API’s browser automation features instead. See Migrating from browser automation to Zyte API.
You cannot request a server-managed session through a session context. However, Zyte-Session-ID enables client-managed sessions.
You can only set cookies for the domain of the target URL. You cannot manually set cookies for additional domains that may be reached through redirection.
You cannot set echoData request metadata.
Request headers#
The following headers allow changing how a request is sent through Zyte API in proxy mode.
Zyte-Client#
May be used to report to Zyte the software being used to access Zyte API.
It should be formatted with the syntax of the User-Agent header, e.g. curl/1.2.3.
Zyte-Device#
Sets device emulation.
Zyte-Geolocation#
Sets a geolocation.
Zyte-IPType#
Sets ipType.
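The Zyte-Client, Zyte-Device, Zyte-Geolocation and Zyte-IPType headers are sent like any other request header. The sketch below builds a header dictionary that can be passed to requests.get() through its headers argument, next to the proxies argument shown earlier. The helper name zyte_proxy_headers is hypothetical, and the example values ("mobile", "US") are assumptions meant to mirror the values accepted by the corresponding HTTP API fields; this page does not list them.

```python
from typing import Optional


def zyte_proxy_headers(
    client: Optional[str] = None,
    device: Optional[str] = None,
    geolocation: Optional[str] = None,
    ip_type: Optional[str] = None,
) -> dict:
    """Return only the Zyte proxy-mode headers that were given a value."""
    candidates = {
        "Zyte-Client": client,
        "Zyte-Device": device,
        "Zyte-Geolocation": geolocation,
        "Zyte-IPType": ip_type,
    }
    return {name: value for name, value in candidates.items() if value is not None}


# Example values are assumptions, not taken from this page.
headers = zyte_proxy_headers(device="mobile", geolocation="US")
```

Leaving an argument out keeps the matching header out of the request, so Zyte API falls back to its default behavior for that option.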
Zyte-JobId#
Sets the ID of the Scrapy Cloud job that is sending the request.
scrapy-zyte-smartproxy sets this header automatically when used from a Scrapy Cloud job.
Zyte-Override-Headers#
Zyte API automatically sends some request headers for ban avoidance.
Custom headers from your request will override most automatic headers, but not these:
Accept
Accept-Encoding
User-Agent
To override any of these 3 headers, set Zyte-Override-Headers to a comma-separated list of names of headers to override, e.g. Zyte-Override-Headers: Accept,Accept-Encoding.
Warning
Overriding headers can break Zyte API ban avoidance.
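To illustrate the rule above, the following sketch adds a Zyte-Override-Headers entry only when your custom headers include one of the 3 protected names. The helper name with_override is hypothetical, and matching header names case-insensitively is an assumption based on how HTTP header names generally work.

```python
# The 3 automatic headers that custom headers do not override by default.
PROTECTED_HEADERS = ("Accept", "Accept-Encoding", "User-Agent")


def with_override(custom_headers: dict) -> dict:
    """Return a copy of custom_headers, plus Zyte-Override-Headers if needed."""
    canonical = {name.lower(): name for name in PROTECTED_HEADERS}
    to_override = [
        canonical[name.lower()]
        for name in custom_headers
        if name.lower() in canonical
    ]
    result = dict(custom_headers)
    if to_override:
        # Comma-separated list of the protected headers being overridden.
        result["Zyte-Override-Headers"] = ",".join(to_override)
    return result


headers = with_override({"Accept": "text/html", "Referer": "https://example.com"})
```

Here headers carries Zyte-Override-Headers: Accept, because Accept is protected, while Referer needs no override entry.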
Zyte-Session-ID#
Sets session.id for a client-managed session.
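A client-managed session means that your code picks the session ID and reuses it across requests. The sketch below generates one ID and keeps it in a header dictionary. It assumes that a random UUID is an acceptable session ID, which this page does not state, and the helper name new_session_headers is hypothetical.

```python
import uuid


def new_session_headers() -> dict:
    """Create headers for a new client-managed session (assumes a UUID is a valid ID)."""
    return {"Zyte-Session-ID": str(uuid.uuid4())}


# Create the headers once, then send them with every request of the same session.
session_headers = new_session_headers()
```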
Invalid request headers#
The following headers are not allowed, and any request with one or more of them will result in an HTTP 400 response:
Client-IP
Cluster-Client-IP
Forwarded-For
True-Client-IP
Via
X-Client-IP
X-Forwarded
X-Forwarded-For
X-Forwarded-Host
X-Host
X-Original-URL
X-Originating-IP
X-ProxyUser-IO
X-ProxyUser-IP
X-Remote-Addr
X-Remote-IP
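Because a single disallowed header causes an HTTP 400 response, it can help to check outgoing headers before sending a request. The following is a minimal sketch of such a check using the list above. The helper names find_forbidden and strip_forbidden are hypothetical, and comparing names case-insensitively is an assumption.

```python
# Header names copied from the list above, lowercased for comparison.
FORBIDDEN_HEADERS = frozenset(
    name.lower()
    for name in (
        "Client-IP", "Cluster-Client-IP", "Forwarded-For", "True-Client-IP",
        "Via", "X-Client-IP", "X-Forwarded", "X-Forwarded-For",
        "X-Forwarded-Host", "X-Host", "X-Original-URL", "X-Originating-IP",
        "X-ProxyUser-IO", "X-ProxyUser-IP", "X-Remote-Addr", "X-Remote-IP",
    )
)


def find_forbidden(headers: dict) -> list:
    """Return the sorted names of headers that proxy mode would reject."""
    return sorted(name for name in headers if name.lower() in FORBIDDEN_HEADERS)


def strip_forbidden(headers: dict) -> dict:
    """Return a copy of headers without the disallowed ones."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in FORBIDDEN_HEADERS
    }
```

Whether to drop such headers silently or to raise an error is a design choice. Raising is safer when a header was set on purpose and its removal would change the meaning of the request.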
Response headers#
Responses include some headers injected by Zyte API.
Note that the response body of unsuccessful responses is always the actual JSON response from the HTTP API that provides error details.
Zyte-Error#
The presence of this header indicates that the response was unsuccessful.
Its value should be ignored and not relied upon, as it is an internal error ID subject to change at any time.
Zyte-Error-Title#
A short summary of the problem type. It is written in English for engineers, is usually not suited for non-technical stakeholders, and is not localized.
It matches the title JSON field of the error response.
Zyte-Error-Type#
A URI reference that uniquely identifies the problem type, only in the context of the provided API.
Contrary to the specification in RFC 7807, it is neither recommended to be dereferenceable and point to human-readable documentation, nor globally unique for the problem type.
It matches the type JSON field of the error response.
Zyte-Request-ID#
A unique identifier of the request.
When reporting an issue about the outcome of a request to our Support team, please include the value of this response header when possible.
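The response headers described in this section can be read as in the sketch below. It treats the presence of Zyte-Error as the failure signal, ignores its value, reads the type and title fields from the JSON body, and keeps Zyte-Request-ID for support requests. The function name describe_zyte_response and the sample values in the usage lines are hypothetical.

```python
import json


def describe_zyte_response(headers: dict, body: bytes) -> dict:
    """Summarize a proxy-mode response from its headers and raw body."""
    lowered = {name.lower(): value for name, value in headers.items()}
    info = {
        # Only the presence of Zyte-Error matters; its value is an internal ID.
        "ok": "zyte-error" not in lowered,
        "request_id": lowered.get("zyte-request-id"),
    }
    if not info["ok"]:
        # Unsuccessful responses carry the JSON error response of the HTTP API.
        error = json.loads(body)
        info["type"] = error.get("type")
        info["title"] = error.get("title")
    return info


# Illustrative values only; real header and body contents will differ.
sample = describe_zyte_response(
    {"Zyte-Error": "some-internal-id", "Zyte-Request-ID": "example-request-id"},
    b'{"type": "example-type", "title": "Example title"}',
)
```

With requests, the arguments would be response.headers and response.content.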
HTTPS proxy#
Tip
The main endpoint works for both HTTP and HTTPS URLs, so you do not need an HTTPS proxy interface to access HTTPS URLs.
You can use the api.zyte.com:8014 endpoint for an HTTPS proxy interface, provided your tech stack supports HTTPS proxies and you have installed our CA certificate:
curl \
--proxy https://api.zyte.com:8014 \
--proxy-user YOUR_API_KEY: \
--compressed \
https://toscrape.com
const HttpsProxyAgent = require('https-proxy-agent')
const httpsAgent = new HttpsProxyAgent.HttpsProxyAgent('https://YOUR_API_KEY:@api.zyte.com:8014')
const axiosDefaultConfig = { httpsAgent }
const axios = require('axios').create(axiosDefaultConfig)
axios
  .get('https://toscrape.com')
  .then((response) => {
    const httpResponseBody = response.data
    console.log(httpResponseBody)
  })
import requests
response = requests.get(
    "https://toscrape.com",
    proxies={
        scheme: "https://YOUR_API_KEY:@api.zyte.com:8014"
        for scheme in ("http", "https")
    },
)
http_response_body: bytes = response.content
print(http_response_body.decode())