Migrating from ScrapingBee to Zyte API

Learn how to migrate from ScrapingBee to Zyte API.

Feature comparison

The following table summarizes the feature differences between both products:

Feature	ScrapingBee	Zyte API
Client software	Python, NodeJS	Python, Scrapy
Pricing	Fixed plans	Pay as you go; monthly commitment over $100
Ban avoidance	Manual, may increase costs	Automatic, no extra costs
Automatic extraction	Google SERP, custom LLM prompts	Standard schemas including Google SERP, custom LLM prompts, supports crawling
Geolocation	243 countries, no data center support	249 countries, data center support
Sessions	Client-managed only (5m)	Client-managed (15m) and server-managed
Actions	Basic only (9)	Basic (15), advanced, website-specific and custom
Screenshots	Yes, can target an element	Yes, cannot target an element
Body size limit	2 MB	10 MB
Custom headers	Yes	Only in HTTP requests, limited to `Referer` in browser requests, cannot disable ban-avoidance headers
Ad blocking	Yes	No
Resource blocking	Yes	No
Custom proxies	Yes	No
Server-side CSS/XPath selectors	Yes	No
Rate limiting	Concurrency-based	RPM-based
Usage API	Yes, up to 6 requests per second	Yes, up to 20 requests per second

Pricing

ScrapingBee offers 4 plans with a fixed price per month, each with a fixed number of “credits” per month that you have to spend on that month or lose.

With Zyte API, you pay only for what you use, up to a $100 monthly spending limit. If you need a higher spending limit, you must commit to paying half of that limit every month as a monthly commitment, which you do not get back if you spend less during that month.

With ScrapingBee, HTTP requests cost 1 credit each, while browser requests cost 5 credits each. If you need to use residential IPs (“premium proxies”) to avoid bans, costs rise to 10 credits per HTTP request (10×) and 25 credits per browser request (5×). For scenarios where residential IPs do not avoid bans either, ScrapingBee offers special “stealth” proxies for browser requests at 75 credits per request (15×). ScrapingBee also charges 20 credits per request when targeting Google domains.

With Zyte API, request cost depends not only on the type of request (HTTP or browser), but also on the tier of the target website. The tier covers the cost of any technology that Zyte API may use to get you a ban-free response, including browser rendering and residential IPs. There is no extra cost for Google domains, not even for automatic extraction of SERP (serp).

Unless you never use premium or stealth proxies, you target mostly high-tier websites, and the number of credits that you need per month is close to the number included in one of ScrapingBee’s plans, Zyte API tends to be the cheaper choice.

For example, the $49 ScrapingBee plan includes 150k credits, i.e. 150k HTTP requests. For tier 1-2 websites (i.e. most websites), Zyte API is cheaper. And Zyte API can also be cheaper for higher-tier websites if you need fewer than 150k requests: 114k requests for tier 3, 70k requests for tier 4, and 39k requests for tier 5.
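To make the credit arithmetic concrete, the following minimal sketch computes how many requests of each kind a credit budget covers, using only the per-request credit costs quoted in the paragraphs above. The dictionary keys and the function name are made up for this illustration; they are not ScrapingBee parameters.

```python
# Credits per request, as quoted in the pricing paragraphs above.
# The keys are labels invented for this sketch, not ScrapingBee parameter names.
CREDITS_PER_REQUEST = {
    "http": 1,
    "browser": 5,
    "premium_http": 10,
    "premium_browser": 25,
    "stealth_browser": 75,
    "google": 20,
}


def requests_per_plan(plan_credits: int) -> dict[str, int]:
    """Return how many whole requests of each kind a credit budget covers."""
    return {
        kind: plan_credits // cost
        for kind, cost in CREDITS_PER_REQUEST.items()
    }


if __name__ == "__main__":
    # 150k credits is the budget of the $49 plan mentioned above.
    for kind, count in requests_per_plan(150_000).items():
        print(f"{kind}: {count:,} requests")
```

Under these numbers, a plan that covers 150k plain HTTP requests covers only 15k HTTP requests through premium proxies, and 2k browser requests through stealth proxies.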

Ban handling

ScrapingBee makes it your responsibility to choose the right technologies (browser rendering, residential IPs, “stealth IPs”) to avoid bans, with the corresponding cost increase.

Zyte API transparently chooses the leanest technology possible, at no extra cost, and automatically adapts to website changes.

Automatic extraction

ScrapingBee supports automatic extraction through user-defined LLM prompts.

Zyte API provides automatic extraction for supported data types, and supports user-defined LLM prompts to extract additional fields. It also supports automatic crawling.

Both ScrapingBee and Zyte API support Google SERP extraction (serp).

Rate limiting

ScrapingBee limits the number of concurrent requests that you can send, starting at 5 with the most basic plan.

Zyte API limits the number of requests per minute (RPM) that you can send. It is 750 by default for all Zyte API keys, but you can request a higher limit.

Advanced features like browser rendering or automatic extraction usually increase response times. With RPM-based rate limiting and unlimited concurrency, you can maintain your throughput regardless of which features you use. Concurrency-based limits, on the other hand, slow down your crawls as you use features that make requests slower.

For example, assuming an HTTP request takes 2 seconds and a browser request takes 20 seconds, switching from HTTP requests to browser requests with ScrapingBee would make your crawl 10 times slower, while Zyte API would allow you to maintain a similar crawl speed by using more concurrent requests to make up for the response time increase.
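The arithmetic behind that example can be sketched as follows. The function names are made up for this illustration; the numbers are the 5-request concurrency limit, the 750 RPM limit, and the 2-second and 20-second response times mentioned above.

```python
def rpm_with_concurrency_limit(concurrency: int, seconds_per_request: float) -> float:
    """Requests per minute achievable when only `concurrency` requests may run at once."""
    return concurrency * 60 / seconds_per_request


def concurrency_for_rpm(rpm: float, seconds_per_request: float) -> float:
    """Concurrent requests needed to sustain `rpm` at a given response time."""
    return rpm * seconds_per_request / 60


if __name__ == "__main__":
    # Concurrency-based limit of 5: throughput drops with slower requests.
    print(rpm_with_concurrency_limit(5, 2))   # HTTP requests
    print(rpm_with_concurrency_limit(5, 20))  # browser requests

    # RPM-based limit of 750: throughput stays, concurrency grows.
    print(concurrency_for_rpm(750, 2))
    print(concurrency_for_rpm(750, 20))
```

With a concurrency limit of 5, throughput falls from 150 to 15 requests per minute when response time goes from 2 to 20 seconds. With a 750 RPM limit, the same change only means going from 25 to 250 concurrent requests.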

Migrating

The main differences between the HTTP APIs of ScrapingBee and Zyte API are how request parameters are defined and how the response is encoded.

In ScrapingBee, you send a GET request, and you specify parameters in the URL query string, URL-encoded, e.g.

curl "https://app.scrapingbee.com/api/v1/?api_key=YOUR_SCRAPINGBEE_API_KEY&url=https%3A%2F%2Ftoscrape.com"

The API response body comes straight from the target website:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
        <title>Scraping Sandbox</title>
        …

HTTP response headers and cookies from the target website are also received as regular headers and cookies, but prefixed with Spb-:

Spb-Content-Encoding: br
Spb-Content-Type: text/html
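If your existing code reads target website headers from ScrapingBee responses, it probably strips that prefix somewhere. A minimal sketch of that step, assuming the response headers are available as a plain dictionary (the function name is hypothetical):

```python
PREFIX = "spb-"


def target_headers(response_headers: dict[str, str]) -> dict[str, str]:
    """Keep only the headers that carry the Spb- prefix, with the prefix removed."""
    return {
        name[len(PREFIX):]: value
        for name, value in response_headers.items()
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower().startswith(PREFIX)
    }


if __name__ == "__main__":
    headers = {
        "Spb-Content-Encoding": "br",
        "Spb-Content-Type": "text/html",
        "Date": "not a target website header",
    }
    print(target_headers(headers))
```

This is the part of your code that changes shape when migrating, because Zyte API returns target website headers inside the JSON response instead.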

In Zyte API, you send a POST request, and you specify parameters in the request body as JSON, e.g.

Tip

Same as ScrapingBee, Zyte API offers a proxy mode that you can use instead of the HTTP API if it makes things simpler.

curl \
    --user YOUR_ZYTE_API_KEY: \
    --header 'Content-Type: application/json' \
    --data '{"url": "https://toscrape.com", "httpResponseBody": true, "httpResponseHeaders": true}' \
    --compressed \
    https://api.zyte.com/v1/extract
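For reference, a rough Python equivalent of the curl command above, using only the standard library. This is a sketch, not an official client: the function names are made up here, and the basic authentication header is built by hand from the API key and an empty password, which is what curl does for --user YOUR_ZYTE_API_KEY:.

```python
import base64
import json
import urllib.request

ZYTE_API_URL = "https://api.zyte.com/v1/extract"


def build_zyte_request(api_key: str, url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request like the curl command above."""
    # Basic authentication: API key as the user name, empty password.
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    body = json.dumps(
        {"url": url, "httpResponseBody": True, "httpResponseHeaders": True}
    ).encode()
    return urllib.request.Request(
        ZYTE_API_URL,
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch(request: urllib.request.Request) -> dict:
    """Send the request and parse the JSON response (requires network access)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling fetch(build_zyte_request("YOUR_ZYTE_API_KEY", "https://toscrape.com")) sends the request and returns the parsed JSON object.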

The API response is a JSON object with all the response data from the target website:

{
    "url": "https://toscrape.com/",
    "statusCode": 200,
    "httpResponseBody": "PCFET0NUWVBFIGh0bWw+CjxodG1sIGxhbmc9ImVuIj4KICAgIDx…",
    "httpResponseHeaders": [
        {
            "name": "content-type",
            "value": "text/html"
        },
        {
            "name": "content-encoding",
            "value": "br"
        }
    ]
}

Note

httpResponseBody is base64-encoded to support binary responses, like images or PDF files.
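A minimal sketch of reading such a response in Python follows. The helper names are made up for this illustration, and the sample body is only the first characters of the truncated example value above, which happen to decode to the doctype line.

```python
import base64


def decode_body(api_response: dict) -> bytes:
    """Return the raw response body bytes from a Zyte API JSON response."""
    return base64.b64decode(api_response["httpResponseBody"])


def headers_to_dict(api_response: dict) -> dict[str, str]:
    """Turn the list of name/value objects into a plain dictionary."""
    return {
        header["name"]: header["value"]
        for header in api_response.get("httpResponseHeaders", [])
    }


if __name__ == "__main__":
    sample = {
        # Only the beginning of the example body above, not a full page.
        "httpResponseBody": "PCFET0NUWVBFIGh0bWw+",
        "httpResponseHeaders": [
            {"name": "content-type", "value": "text/html"},
            {"name": "content-encoding", "value": "br"},
        ],
    }
    print(decode_body(sample))
    print(headers_to_dict(sample))
```

decode_body returns bytes rather than text, so binary responses pass through untouched; decode to a string yourself when you know the response is text.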

Once you understand how to migrate a simple request like the one above, you can migrate any other request the same way, replacing ScrapingBee parameters with Zyte API counterparts.

Parameter mapping

ScrapingBee	Zyte API
(default)	httpResponseBody, httpResponseHeaders
`api_key`	Use basic authentication
`url`	url
`render_js`	browserHtml
`js_scenario`	See below
`wait`	`waitForTimeout` action (see below)
`wait_for`	`waitForSelector` action (see below)
`wait_browser`	`waitForNavigation` action (see below)
`block_ads`	Not supported
`block_resources`	Not supported
`window_width`	viewport
`window_height`	viewport
`premium_proxy`	ipType=residential (not required to avoid bans)
`country_code`	geolocation (does not require ipType=residential)
`stealth_proxy`	N/A, ban avoidance is a transparent feature
`own_proxy`	Not supported
`forward_headers`	customHttpRequestHeaders, requestHeaders
`forward_headers_pure`	Not supported
`ai_query`	customAttributes
`ai_selector`	Not supported
`ai_extract_rules`	customAttributes
`extract_rules`	Not supported
`screenshot`	screenshot
`screenshot_selector`	Not supported
`screenshot_full_page`	screenshotOptions.fullPage=true
`json_response`	See Network capture
`return_page_source`	Not supported (use httpResponseBody if you are only using browser rendering to avoid bans)
`scraping_config`	Not supported
`session_id`	session.id (must be UUID4)
`timeout`	Not supported
`cookies`	requestCookies
`device`	device
`custom_google`	N/A
`transparent_status_code`	N/A, Zyte API returns the response or not based on whether or not it is a ban, not based on the status code
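As an illustration of how the table can drive a migration, the following sketch translates a small subset of ScrapingBee parameters into a Zyte API request body and rejects the ones marked as not supported. It rests on several assumptions: the function and constant names are hypothetical, parameters are passed as Python values rather than query-string text, a missing render_js is treated as a plain HTTP request per the (default) row, and upper-casing the country code is a guess about the format that geolocation expects, so check it against the Zyte API reference.

```python
# Parameters that the table above lists as "Not supported".
UNSUPPORTED = {
    "block_ads",
    "block_resources",
    "own_proxy",
    "forward_headers_pure",
    "ai_selector",
    "extract_rules",
    "screenshot_selector",
    "scraping_config",
    "timeout",
}


def to_zyte_request(params: dict) -> dict:
    """Translate a subset of ScrapingBee parameters into a Zyte API request body."""
    unsupported = sorted(UNSUPPORTED & params.keys())
    if unsupported:
        raise ValueError(f"No Zyte API counterpart for: {', '.join(unsupported)}")

    request = {"url": params["url"]}
    if params.get("render_js"):
        request["browserHtml"] = True
    else:
        # The "(default)" row of the table.
        request["httpResponseBody"] = True
        request["httpResponseHeaders"] = True
    if params.get("premium_proxy"):
        # Not required to avoid bans; only kept if you need residential IPs.
        request["ipType"] = "residential"
    if "country_code" in params:
        # Assumption: geolocation expects an upper-case country code.
        request["geolocation"] = params["country_code"].upper()
    return request


if __name__ == "__main__":
    print(to_zyte_request({"url": "https://toscrape.com", "render_js": True}))
```

api_key is deliberately left out of the request body, since Zyte API takes the key through basic authentication instead.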

Action mapping

ScrapingBee allows defining a sequence of browser actions through the "instructions" JSON array of the js_scenario parameter. For example:

{
    "instructions": [
        {"click": "#buttonId"}
    ]
}

URL-encoded, this becomes:

js_scenario=%7B%22instructions%22%3A+%5B%7B%22click%22%3A+%22%23buttonId%22%7D%5D%7D
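That encoded value can be reproduced with the Python standard library, which is handy for checking what your current code sends. A minimal sketch (the function name is made up for this illustration):

```python
import json
from urllib.parse import quote_plus, unquote_plus


def encode_js_scenario(scenario: dict) -> str:
    """Serialize a js_scenario object to JSON and URL-encode it."""
    return quote_plus(json.dumps(scenario))


if __name__ == "__main__":
    scenario = {"instructions": [{"click": "#buttonId"}]}
    encoded = encode_js_scenario(scenario)
    print(f"js_scenario={encoded}")
    # Decoding gives back the original JSON object.
    print(json.loads(unquote_plus(encoded)))
```

With Zyte API this encoding step disappears, since actions are sent as regular JSON in the request body.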

The Zyte API equivalent is the actions field. The following is a matching example:

{
    "actions": [
        {
            "action": "click",
            "selector": {
                "type": "css",
                "value": "#buttonId"
            }
        }
    ]
}

These are ScrapingBee actions and their Zyte API counterparts:
click: click
evaluate: evaluate
fill: type
infinite_scroll: scrollBottom
scroll_x: scrollTo
scroll_y: scrollTo
wait: waitForTimeout
wait_for: waitForSelector
wait_for_and_click: waitForSelector, click
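The following sketch shows how part of that mapping could be automated. It only covers click, wait_for and wait_for_and_click, and it assumes that ScrapingBee selectors are CSS selectors and that waitForSelector takes the same selector object as the click example above; the function names are hypothetical. Other actions are left out because their Zyte API fields differ, so look them up in the Zyte API reference rather than guessing.

```python
def css(value: str) -> dict:
    """Build a selector object in the shape of the click example above."""
    return {"type": "css", "value": value}


def to_zyte_actions(js_scenario: dict) -> list[dict]:
    """Convert a subset of ScrapingBee instructions into Zyte API actions."""
    actions = []
    for instruction in js_scenario["instructions"]:
        # Each instruction is a single-key object, e.g. {"click": "#buttonId"}.
        ((name, argument),) = instruction.items()
        if name == "click":
            actions.append({"action": "click", "selector": css(argument)})
        elif name == "wait_for":
            # Assumption: same selector shape as click.
            actions.append({"action": "waitForSelector", "selector": css(argument)})
        elif name == "wait_for_and_click":
            # One ScrapingBee instruction becomes two Zyte API actions.
            actions.append({"action": "waitForSelector", "selector": css(argument)})
            actions.append({"action": "click", "selector": css(argument)})
        else:
            raise NotImplementedError(f"Not covered by this sketch: {name}")
    return actions


if __name__ == "__main__":
    print(to_zyte_actions({"instructions": [{"wait_for_and_click": "#buttonId"}]}))
```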

The following Zyte API actions are not supported by ScrapingBee:
doubleClick
goto
hide
hover
keyPress
reload
searchKeyword
select
setLocation
waitForRequest
waitForResponse

Zyte API also supports custom actions.