Migrating from Smart Proxy Manager to Zyte API#

Learn how to migrate from Smart Proxy Manager to Zyte API.

Key differences#

The following table summarizes the feature differences between the two products:

Feature              Smart Proxy Manager  Zyte API
-------------------  -------------------  ---------
API                  Proxy                HTTP
Ban avoidance        Good                 Great
Residential proxies  Add-on               Automatic
Cookie handling      Advanced             Limited
Session management   Advanced             Limited
Smart geolocation    No                   Yes
Browser HTML         No                   Yes
Screenshots          No                   Yes
Browser actions      No                   Yes
User throttling      Concurrency-based    RPS-based

See also Parameter mapping below for some additional, lower-level differences.

Ban avoidance#

Smart Proxy Manager does a good job of avoiding bans through proxy rotation, ban detection, retry algorithms, and browser mimicking through browser profiles.

Zyte API improves on this by using an actual browser when that is required to prevent bans on a particular website.

Zyte API also supports simulating human interaction.

Residential proxies#

Zyte API supports both data center and residential IP addresses, and automatically chooses the right type of IP address as needed.

Session management#

Smart Proxy Manager supports advanced session management: you can create and reuse sessions that retain the same IP address and cookies.

Zyte API does not support session management at the moment, although it is a planned feature. However, in some scenarios, using browser actions can make your code more future-proof and less likely to cause a ban, and can replace what would otherwise be multiple requests on your end with a single request.
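As a rough sketch of the single-request approach, the following builds a Zyte API request body that loads a page in a browser and runs a short sequence of actions before returning the HTML. The `browserHtml` and `actions` field names, the `click` and `waitForSelector` action names, and the selector object format are assumptions to verify against the Zyte API reference. The CSS selectors are hypothetical placeholders.

```python
import json


def css(value: str) -> dict:
    """Build a CSS selector object (format assumed, verify in the reference)."""
    return {"type": "css", "value": value}


def build_browser_request(url: str, actions: list) -> dict:
    """Return a request body that renders the page and runs the actions."""
    return {"url": url, "browserHtml": True, "actions": actions}


# Hypothetical selectors: click a button, then wait for new content.
payload = build_browser_request(
    "https://toscrape.com",
    [
        {"action": "click", "selector": css("button.load-more")},
        {"action": "waitForSelector", "selector": css("div.results")},
    ],
)
print(json.dumps(payload, indent=2))
```

With Smart Proxy Manager, the same flow would typically need one request per step within a shared session.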

Geolocation#

Both products let you choose which country of origin to use for a request.

However, with Zyte API you usually do not need to manually choose which country of origin to use for each request, because Zyte API automatically chooses the best country of origin based on the target website.

Smart Proxy Manager does support a richer list of countries of origin that you can set manually. However, if you let Zyte API choose the right country of origin, it can use additional countries not available for manual override.

For more information, see Set a country of origin.
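A minimal sketch of a manual override, assuming the request field is named `geolocation` and takes a two-letter country code (check Set a country of origin for the exact name and the supported values). The `build_request` helper is hypothetical and only builds the request body. Leaving the country out lets Zyte API choose it.

```python
from typing import Optional


def build_request(url: str, country: Optional[str] = None) -> dict:
    """Build a request body, setting a country of origin only on demand."""
    payload = {"url": url, "httpResponseBody": True}
    if country is not None:
        # Assumed field name; omit it to let Zyte API pick the country.
        payload["geolocation"] = country
    return payload


automatic = build_request("https://toscrape.com")
manual = build_request("https://toscrape.com", country="US")
```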

Authentication#

You cannot use your Smart Proxy Manager API key for Zyte API. You need to get a separate API key to use Zyte API.
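The Zyte API examples on this page send the key as the user name of HTTP Basic authentication, with an empty password. The sketch below shows how that header value is formed. The `build_auth_header` name is illustrative, and most HTTP clients build this header for you.

```python
from base64 import b64encode


def build_auth_header(api_key: str) -> str:
    """Return a Basic Authorization header value for the given API key.

    The key is the user name and the password is empty, which matches
    the ``--user YOUR_API_KEY:`` form used with curl.
    """
    credentials = f"{api_key}:".encode("ascii")
    return "Basic " + b64encode(credentials).decode("ascii")


# Placeholder key; replace it with your own Zyte API key.
header_value = build_auth_header("YOUR_API_KEY")
```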

API migration#

The main challenge is switching from a proxy API to an HTTP API.

Because Zyte API has a wider range of features and can hence provide a richer output, you need JSON parsing, and in some cases base64-decoding, to get your data.
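The two extra steps can be isolated in a small helper. The sketch below runs offline against a locally built sample response rather than a real API call, and the `extract_body` name is illustrative.

```python
import json
from base64 import b64decode, b64encode


def extract_body(api_response_text: str) -> bytes:
    """Parse a Zyte API JSON response and return the decoded body bytes."""
    data = json.loads(api_response_text)
    return b64decode(data["httpResponseBody"])


# Simulated API response, built locally so that the sketch runs offline.
sample_response = json.dumps(
    {
        "url": "https://toscrape.com",
        "httpResponseBody": b64encode(b"<html>Hello</html>").decode("ascii"),
    }
)
body = extract_body(sample_response)
```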

For example, this is a basic request using Smart Proxy Manager:

C#:

using System.IO;
using System.Net;

var proxy = new WebProxy("http://proxy.zyte.com:8011", true);
proxy.Credentials = new NetworkCredential("YOUR_API_KEY", "");

var request = (HttpWebRequest)WebRequest.Create("https://toscrape.com");
request.Proxy = proxy;
request.PreAuthenticate = true;
request.AllowAutoRedirect = false;
request.ServerCertificateValidationCallback += (sender, certificate, chain, sslPolicyErrors) => true;

var response = (HttpWebResponse)request.GetResponse();
var stream = response.GetResponseStream();
var reader = new StreamReader(stream);
var httpResponseBody = reader.ReadToEnd();
reader.Close();
response.Close();

curl:

curl \
    --proxy proxy.zyte.com:8011 \
    --proxy-user YOUR_API_KEY: \
    https://toscrape.com

JavaScript:

const axios = require('axios')

axios
  .get(
    'https://toscrape.com',
    {
      proxy: {
        host: 'proxy.zyte.com',
        port: 8011,
        auth: {
          username: 'YOUR_API_KEY',
          password: ''
        }
      }
    }
  )
  .then((response) => {
    const httpResponseBody = response.data
  })

PHP:

<?php

$client = new GuzzleHttp\Client();
$response = $client->request('GET', 'https://toscrape.com', [
    'proxy' => 'http://YOUR_API_KEY:@proxy.zyte.com:8011',
]);
$http_response_body = (string) $response->getBody();

Python:

import requests

response = requests.get(
    "https://toscrape.com",
    proxies={
        scheme: "http://YOUR_API_KEY:@proxy.zyte.com:8011/"
        for scheme in ("http", "https")
    },
    verify="/path/to/zyte-smartproxy-ca.crt",
)
http_response_body: bytes = response.content

Scrapy (proxy configured in project settings):

from scrapy import Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"
    start_urls = ["https://toscrape.com"]

    def parse(self, response):
        http_response_body: bytes = response.body

And this is an equivalent request using Zyte API:

C#:

using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

HttpClientHandler handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);

var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);

client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");

var input = new Dictionary<string, object>(){
    {"url", "https://toscrape.com"},
    {"httpResponseBody", true}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");

HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();

var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);

curl:

input.json#
{"url": "https://toscrape.com", "httpResponseBody": true}
curl \
    --user YOUR_API_KEY: \
    --header 'Content-Type: application/json' \
    --data @input.json \
    --compressed \
    https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
> output.html

Command-line client (zyte-api):

input.jsonl#
{"url": "https://toscrape.com", "httpResponseBody": true}
zyte-api input.jsonl 2> /dev/null \
| jq --raw-output .httpResponseBody \
| base64 --decode \
> output.html

Java:

import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.CloseableHttpResponse;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;

class Example {
  private static final String API_KEY = "YOUR_API_KEY";

  public static void main(final String[] args)
      throws InterruptedException, IOException, ParseException {
    Map<String, Object> parameters =
        ImmutableMap.of("url", "https://toscrape.com", "httpResponseBody", true);
    String requestBody = new Gson().toJson(parameters);

    HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
    request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
    request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
    request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
    request.setEntity(new StringEntity(requestBody));

    try (CloseableHttpClient client = HttpClients.createDefault()) {
      try (CloseableHttpResponse response = client.execute(request)) {
        HttpEntity entity = response.getEntity();
        String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
        JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
        String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
        byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
        String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
      }
    }
  }

  private static String buildAuthHeader() {
    String auth = API_KEY + ":";
    String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
    return "Basic " + encodedAuth;
  }
}

JavaScript:

const axios = require('axios')

axios.post(
    'https://api.zyte.com/v1/extract',
    {
      url: 'https://toscrape.com',
      httpResponseBody: true
    },
    {
      auth: { username: 'YOUR_API_KEY' },
      headers: { 'Accept-Encoding': 'gzip, deflate' }
    }
  ).then((response) => {
    const httpResponseBody = Buffer.from(
      response.data.httpResponseBody,
      'base64'
    )
  })

PHP:

<?php

$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
    'auth' => ['YOUR_API_KEY', ''],
    'headers' => ['Accept-Encoding' => 'gzip'],
    'json' => [
        'url' => 'https://toscrape.com',
        'httpResponseBody' => true,
    ],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);

Python (requests):

from base64 import b64decode

import requests

api_response = requests.post(
    'https://api.zyte.com/v1/extract',
    auth=('YOUR_API_KEY', ''),
    json={
        'url': 'https://toscrape.com',
        'httpResponseBody': True,
    },
)
http_response_body: bytes = b64decode(
    api_response.json()['httpResponseBody']
)

Python (zyte_api client):

import asyncio
from base64 import b64decode

from zyte_api.aio.client import AsyncClient

async def main():
    client = AsyncClient()
    api_response = await client.request_raw(
        {
            'url': 'https://toscrape.com',
            'httpResponseBody': True,
        }
    )
    http_response_body: bytes = b64decode(
        api_response['httpResponseBody']
    )

asyncio.run(main())

When you use scrapy-zyte-api in transparent mode and target a text resource (e.g. HTML, JSON), regular Scrapy requests work out of the box:

from scrapy import Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"
    start_urls = ["https://toscrape.com"]

    def parse(self, response):
        http_response_text: str = response.text

While regular Scrapy requests also work for binary responses at the moment, they may stop working in future versions of scrapy-zyte-api, so passing httpResponseBody is recommended when targeting binary resources:

from scrapy import Request, Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"

    def start_requests(self):
        yield Request(
            "https://toscrape.com",
            meta={
                "zyte_api_automap": {
                    "httpResponseBody": True,
                },
            },
        )

    def parse(self, response):
        http_response_body: bytes = response.body

See Zyte API usage for richer Zyte API examples, covering more scenarios and features.

There is no easy way to use Zyte API to drive requests from browser automation tools. If you are using Smart Proxy Manager as a proxy for a browser automation tool, consider using Zyte API for your browser automation needs instead. See Migrating from browser automation to Zyte API.

Parameter mapping#

The following table shows a mapping of Smart Proxy Manager request headers and their corresponding Zyte API parameters:

Replacing X-Crawlera-Profile and X-Crawlera-Profile-Pass#

The behavior of Zyte API is a middle ground between the desktop and pass values of X-Crawlera-Profile. Browser-specific headers are always sent, unlike with pass, which disables them altogether. However, you can override them, unlike with desktop, which forces them unless you use X-Crawlera-Profile-Pass. The exception is the User-Agent header, which cannot be overridden. See Set request headers for more information.

Zyte API offers no alternative to the mobile value of X-Crawlera-Profile.
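The header override behavior described above can be sketched as follows, assuming the request field for HTTP requests is `customHttpRequestHeaders` and takes a list of name/value objects (see Set request headers for the authoritative format). The `to_custom_headers` helper is hypothetical. It drops User-Agent up front, since that header cannot be overridden.

```python
def to_custom_headers(headers: dict) -> list:
    """Convert a header mapping into a list of name/value objects.

    User-Agent is skipped because Zyte API does not allow overriding it.
    """
    return [
        {"name": name, "value": value}
        for name, value in headers.items()
        if name.lower() != "user-agent"
    ]


payload = {
    "url": "https://toscrape.com",
    "httpResponseBody": True,
    # Assumed field name; verify it against Set request headers.
    "customHttpRequestHeaders": to_custom_headers(
        {"Accept-Language": "fr", "User-Agent": "ignored"}
    ),
}
```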