Zyte API error handling#
While using Zyte API, you may get the following types of responses:
Successful responses#
Zyte API sends a successful response, i.e. a response with an HTTP status code of 200, when that response provides the requested data, ban-free.
A Zyte API response is considered successful even in the following scenarios:
The response from the target website is a bad response for a reason other than a ban.
Some browser actions have failed.
The webpage content does not match the specified automatic extraction property.
The webpage content does not match what you get with an HTTP client program or library like curl.
Bad website responses#
When a website sends a response with an HTTP status code other than 200, and that response is not the result of a ban, Zyte API sends that response to you.
For example, if you send a request to https://toscrape.com/not-found, you get a successful response from Zyte API, where the value of the statusCode response field is 404.
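As a minimal sketch of how client code might tell a bad website response from a good one, the function below inspects an already-parsed Zyte API response body. Only the statusCode field comes from the text above; the function name `website_status_is_ok` and the trimmed-down sample dictionaries are hypothetical and exist only for illustration.

```python
from typing import Any, Mapping


def website_status_is_ok(api_response: Mapping[str, Any]) -> bool:
    """Return True if the target website answered with HTTP 200.

    `api_response` is the parsed JSON body of a successful Zyte API
    response. Its `statusCode` field holds the status code that the
    target website sent, which can be a non-200 value such as 404.
    """
    return api_response.get("statusCode") == 200


# Hypothetical, trimmed-down response bodies, for illustration only.
found = {"url": "https://toscrape.com/", "statusCode": 200}
not_found = {"url": "https://toscrape.com/not-found", "statusCode": 404}

print(website_status_is_ok(found))
print(website_status_is_ok(not_found))
```

Both dictionaries stand for successful Zyte API responses; only the second one carries a bad response from the website.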
Browser action failures#
Browser action failures, e.g. timeouts, or bad responses received during action execution, e.g. after clicking a button, do not cause Zyte API to send an unsuccessful response.
Zyte API returns your requested output (e.g. browser HTML, screenshot) as it was after all actions were executed or the time allowed for running actions ran out. The Zyte API response also includes an actions field with details about the outcome of each action.
Automatic extraction mismatches#
Mismatches between the webpage content and the specified automatic extraction request field do not cause Zyte API to send an unsuccessful response.
Zyte API returns your requested output, including the metadata.probability
field that indicates the probability that the specified automatic extraction
property matches the webpage content.
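A minimal sketch of how you might act on that probability follows. It assumes that the extracted item is a dictionary with a nested `metadata` object holding `probability`; the function name `matches_requested_type` and the 0.5 threshold are hypothetical choices for illustration, not values recommended by Zyte.

```python
from typing import Any, Mapping


def matches_requested_type(item: Mapping[str, Any], threshold: float = 0.5) -> bool:
    """Return True if the item's match probability reaches `threshold`.

    Assumption: `item` is the extracted data object, with the
    probability stored under item["metadata"]["probability"].
    A missing probability is treated as a mismatch.
    """
    probability = item.get("metadata", {}).get("probability")
    if probability is None:
        return False
    return probability >= threshold


# Hypothetical extracted items, for illustration only.
likely_match = {"name": "A Light in the Attic", "metadata": {"probability": 0.97}}
unlikely_match = {"metadata": {"probability": 0.08}}

print(matches_requested_type(likely_match))
print(matches_requested_type(unlikely_match))
```

The right threshold depends on how costly a false match is for your use case.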
HTTP client mismatches#
Some websites might return a different response to a browser than they send to a different type of HTTP client, like curl.
Zyte API aims to provide the same responses that a browser would get, i.e. those responses you can see in the browser developer tools for network monitoring.
If you specifically want the same response that a specific non-browser HTTP client gets, you can try setting the User-Agent header accordingly. However, some websites can tell browsers from non-browser HTTP clients, and since Zyte API aims to behave like a browser, getting the same response as a non-browser HTTP client might not be possible.
Rate-limiting responses#
Note
You are not charged for rate-limiting responses.
Tip
scrapy-zyte-api and python-zyte-api handle rate limiting automatically.
Zyte API may send a response with an HTTP status code of 429 or 503 for rate-limiting purposes.
The right way to handle any rate-limiting response is to retry its request as many times as needed until you get a non-rate-limiting response.
Rate-limiting responses are sent in the following scenarios:
You have exceeded your API key rate limit.
{"status": 429, "type": "/limits/over-user-limit"}
Zyte API keys are allowed, by default, a maximum of 500 requests per minute (RPM). You may request a higher limit for free.
There is no concurrency limit. In practice, concurrency is limited by your RPM limit and your response time. Given a 500 RPM limit, 5 s response times allow ~42 concurrent requests (500*5/60), while 200 ms response times allow ~2 concurrent requests (500*0.2/60).
When you make efficient use of Zyte API, getting a small percentage of rate-limiting responses due to exceeding your API key rate limit is expected and normal.
The global rate limit for the target website has been exceeded.
{"status": 429, "type": "/limits/over-domain-limit"}
Each website has a different, website-specific rate limit. These limits exist to avoid causing issues on websites. If this is an issue for you, reach out to us.
You have exceeded your account rate limit for the target website.
{"status": 429, "type": "/limits/over-org-domain-limit"}
Each Zyte API account also has its own rate limit for each website, so that no single account can hog traffic to any given website. If this is an issue for you, reach out to us.
Zyte API automatic extraction is overloaded.
{"status": 503, "type": "/extractor/over-global-limit"}
Zyte API is overloaded.
{"status": 503, "type": "/limits/over-global-limit"}
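The concurrency estimate given above for the API key rate limit is the RPM limit multiplied by the response time in seconds and divided by 60. The short sketch below reproduces both figures from the text; the function name `max_concurrency` is hypothetical.

```python
def max_concurrency(rpm_limit: float, response_time_s: float) -> float:
    """Estimate how many requests can be in flight at once.

    With `rpm_limit` requests allowed per minute and each request
    taking `response_time_s` seconds, roughly
    rpm_limit * response_time_s / 60 requests overlap at any time.
    """
    return rpm_limit * response_time_s / 60


# The two examples from the text, with the default 500 RPM limit.
print(round(max_concurrency(500, 5)))    # slow responses (5 s)
print(round(max_concurrency(500, 0.2)))  # fast responses (200 ms)
```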
Unsuccessful responses#
Note
You are not charged for unsuccessful responses.
Zyte API sends an unsuccessful response, i.e. a response with an HTTP status code of 400 or higher that is not a rate-limiting response, when Zyte API cannot provide the requested data.
Zyte API sends unsuccessful responses in the following scenarios:
There has been a download error, either temporary or permanent, or a service error
Your request is invalid
Your account has been suspended
Temporary download errors#
Tip
By default, scrapy-zyte-api and python-zyte-api automatically retry temporary download errors up to 3 times before giving up.
Zyte API sends an HTTP 520 response when a temporary error, usually a ban that could not be avoided in a timely fashion, prevents downloading the requested URL.
{"status": 520, "type": "/download/temporary-error"}
On certain websites, it is normal to get these responses sometimes. When you do, retry your request until you get a successful response.
We closely monitor the success rate for the most popular websites, but less popular websites may slip under our radar. If you get this response too often, follow Maximizing your success rate to rule out issues in your request parameters. If that does not help, reach out to our expert anti-ban team.
Permanent download errors#
Zyte API sends an HTTP 521 response when a permanent error prevents downloading the requested URL.
{"status": 521, "type": "/download/internal-error"}
You can wait for us to address the issue, or ask to be notified when the issue is resolved.
Tip
For some websites, Zyte API may sometimes accidentally flag some temporary download errors as permanent download errors. If sending the same Zyte API request multiple times returns an HTTP 521 error only sometimes, you might want to treat HTTP 521 errors as HTTP 520 errors for the target website, i.e. retry them automatically, until we resolve your issue report.
Service errors#
If Zyte API sends an HTTP 500 response, it means that the request took too long or that there was an unexpected issue in Zyte API.
{"status": 500, "type": "/server/timed-out"}
{"status": 500, "type": "/server/internal"}
If the issue persists, feel free to ask to be notified when the issue is resolved.
Invalid requests#
Zyte API may send a response with an HTTP status code of 400, 401, 422 or 451 if there is an error in your request, including:
You are using invalid parameters or parameter values.
{"status": 400, "type": "/request/invalid"}
Your request body is invalid JSON.
{"status": 400, "type": "/request/invalid-json"}
Your API key is not properly specified, e.g. missing or malformed.
{"status": 401, "type": "/auth/not-valid"}
Your API key is unknown, e.g. it might be the wrong API key.
{"status": 401, "type": "/auth/key-not-found"}
You are using incompatible parameters, such as mixing browserHtml and httpResponseBody.
{"status": 422, "type": "/request/unprocessable"}
You are targeting a domain that Zyte API does not allow.
{"status": 451, "type": "/download/domain-forbidden"}
Account suspension#
Zyte API sends an HTTP 403 response if your Zyte API account is suspended.
{"status": 403, "type": "/auth/account-suspended"}
Causes of account suspension include:
Reaching the end of your trial.
Setting a spending limit lifts your account suspension immediately.
Reaching your spending limit.
Increasing your spending limit lifts your account suspension immediately.
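To summarize the response types described on this page, the sketch below maps an HTTP status code returned by Zyte API to a category and to a retry decision. The status codes and the retry guidance come from this page; the category labels and the function names `classify` and `should_retry` are hypothetical.

```python
RATE_LIMITING = {429, 503}
INVALID_REQUEST = {400, 401, 422, 451}


def classify(status: int) -> str:
    """Map a Zyte API HTTP status code to a response category."""
    if status == 200:
        return "successful"
    if status in RATE_LIMITING:
        return "rate-limiting"
    if status == 520:
        return "temporary-download-error"
    if status == 521:
        return "permanent-download-error"
    if status == 500:
        return "service-error"
    if status == 403:
        return "account-suspended"
    if status in INVALID_REQUEST:
        return "invalid-request"
    return "unknown"


def should_retry(status: int) -> bool:
    """Only rate-limiting and temporary download errors are retried."""
    return classify(status) in {"rate-limiting", "temporary-download-error"}


for code in (200, 429, 503, 520, 521, 500, 403, 422):
    print(code, classify(code), should_retry(code))
```

This sketch does not retry HTTP 521 responses. If you decide to treat them as temporary for a specific website, you would add that case to `should_retry` yourself.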
Retrying requests#
Tip
scrapy-zyte-api and python-zyte-api handle retries for rate limiting and temporary download errors automatically.
You should automatically retry requests that get a rate-limiting or a temporary download error response.
When retrying requests automatically, please use an exponential backoff algorithm: wait for some random time before every retry, and use an exponentially longer time the more retries you have used for any given request.
For rate-limiting responses, you should retry forever, but use generous retry times. For unsuccessful responses, you can use lower retry times, but you should cap the number of retries per request, to prevent an infinite loop from causing your code to hang.
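The policy above can be sketched as a small retry loop. This is an illustration under stated assumptions, not how scrapy-zyte-api or python-zyte-api implement retries: `send` is a hypothetical callable that performs one request and returns the HTTP status code with the parsed body, the wait formula in `backoff` is a simplified example, and the cap of 3 retries is an arbitrary default. The `sleep` parameter exists so that the loop can be exercised without real waiting.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]  # (HTTP status code, parsed JSON body)


def backoff(base: float, retry: int, cap: float) -> float:
    """Random wait: a fixed base plus up to min(2**retry, cap) seconds."""
    return base + random.uniform(0, min(2 ** retry, cap))


def send_with_retries(
    send: Callable[[], Response],
    max_download_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry rate-limiting responses forever and HTTP 520 a capped number of times."""
    rate_limit_retries = 0
    download_retries = 0
    while True:
        status, body = send()
        if status in (429, 503):
            # Rate limiting: no retry cap, generous wait times.
            rate_limit_retries += 1
            sleep(backoff(30.0, rate_limit_retries, 600.0))
        elif status == 520 and download_retries < max_download_retries:
            # Temporary download error: shorter waits, capped retries.
            download_retries += 1
            sleep(backoff(3.0, download_retries, 60.0))
        else:
            # Success, a non-retryable error, or the retry cap was reached.
            return status, body


# Usage with a fake `send` and a no-op `sleep`, so nothing is sent or awaited.
fake_responses = iter([(429, {}), (520, {}), (200, {"statusCode": 200})])
print(send_with_retries(lambda: next(fake_responses), sleep=lambda seconds: None))
```

Keeping separate counters means that rate-limiting responses never consume the retry budget reserved for download errors.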
These are some example ranges of random wait times for different scenarios:
Retry | Rate-limiting responses | Unsuccessful responses
---|---|---
1st | 20-40 seconds | 3-9 seconds
2nd | 20-40 seconds | 3-11 seconds
3rd | 30-38 seconds | 3-15 seconds
4th | 30-46 seconds | 3-23 seconds
5th | 30-62 seconds | 3-39 seconds
6th | 30-94 seconds | 3-62 seconds
7th | 30-158 seconds | 3-62 seconds
8th | 30-286 seconds | 3-62 seconds
9th | 30-542 seconds | 3-62 seconds
10th+ | 30-630 seconds | 3-62 seconds
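The ranges in the table follow a regular pattern, which the sketch below reproduces. The formulas are inferred from the table values alone and are an assumption, not a documented algorithm; the function names `wait_range` and `random_wait` are hypothetical.

```python
import random
from typing import Tuple


def wait_range(retry: int, rate_limiting: bool) -> Tuple[int, int]:
    """Return the (min, max) wait in seconds for the given retry number.

    `retry` is 1 for the first retry. The formulas are fitted to the
    example table and are not an official specification.
    """
    if retry < 1:
        raise ValueError("retry must be 1 or greater")
    if rate_limiting:
        if retry <= 2:
            return (20, 40)
        # 30 s base plus an exponentially growing span, capped at 600 s.
        return (30, 30 + min(2 ** retry, 600))
    # 3 s base plus a 4 s span plus an exponential span capped at 55 s.
    return (3, 7 + min(2 ** retry, 55))


def random_wait(retry: int, rate_limiting: bool) -> float:
    """Pick a random wait time within the range for this retry."""
    low, high = wait_range(retry, rate_limiting)
    return random.uniform(low, high)


for n in range(1, 11):
    print(n, wait_range(n, rate_limiting=True), wait_range(n, rate_limiting=False))
```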
Ban handling#
A banned response is a response from a website that is different from the response anyone would get in a browser.
Zyte API handles banned responses automatically and transparently where possible, so that you never get a banned response.
For a given request, if Zyte API cannot avoid a banned response in a reasonable time, Zyte API sends you a temporary download error response, for which you are not charged. You can then retry your request as many times as needed until Zyte API succeeds.
We monitor and proactively work on improving the success rate and response times of Zyte API for the most popular websites, but less popular websites may slip under our radar. If you encounter too many bans, please reach out to our expert anti-ban team.
If you ever get a successful Zyte API response that you believe is the result of a ban, please report it to our expert anti-ban team.
Zyte API uses many different techniques to avoid bans. However, Zyte API does not log into websites automatically. Zyte API cannot automatically get you data that is always locked behind a user login.
By default, Zyte API may use residential IP addresses or CAPTCHA-management solutions. You may request Zyte API not to use these features for your requests, e.g. for compliance reasons. You can also disable residential IP address usage per request, see IP type.
Maximizing your success rate#
Some request parameters can lower the success rate of Zyte API on some websites. To maximize your success rate, i.e. minimize the rate of temporary download error responses that you get:
Ensure your URL is valid
Make sure your URL works when you use it in a browser set in incognito mode. Mind that some URLs may stop working after their website changes.
Also ensure that query string parameters, parameter order and values match what you get when you access that webpage manually from a browser.
For complex URLs, you can alternatively use a browser request with actions to get to the target URL from a simpler URL.
Do not set a Referer header
It is often best not to set any value for the Referer request header, unless you are building an API request that expects it. If you set the header in the past because it improved your success rate, but your success rate has since dropped, see if removing the header makes a difference.
Set headers for API requests
When targeting API endpoints, set the right request headers and cookies, i.e. those your browser sets when sending the same request.
Mind that some of those values, such as session cookies or CSRF tokens, might expire with time or need to be read from responses to earlier requests. You may also need sessions to maximize your success rate in request chains.
Alternatively, consider using a browser request instead. You can use actions if needed to trigger specific API requests, and either read the result from the browser HTML, if the API response data is loaded onto the webpage, or read the actual API response with network capture.
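For the URL check recommended above (matching query string parameters, order, and values against what a browser uses), the following sketch compares two URLs using only the Python standard library. The function name `query_differences` and the example.com URLs are hypothetical.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit


def query_differences(candidate_url: str, browser_url: str) -> List[str]:
    """Describe how the query string of `candidate_url` differs from
    the one in `browser_url`, which was copied from a browser."""
    candidate = parse_qsl(urlsplit(candidate_url).query, keep_blank_values=True)
    reference = parse_qsl(urlsplit(browser_url).query, keep_blank_values=True)
    if candidate == reference:
        return []
    if sorted(candidate) == sorted(reference):
        return ["same parameters, different order"]
    problems = []
    missing = [pair for pair in reference if pair not in candidate]
    extra = [pair for pair in candidate if pair not in reference]
    if missing:
        problems.append(f"missing or changed parameters: {missing}")
    if extra:
        problems.append(f"unexpected parameters: {extra}")
    if not problems:
        problems.append("parameter repetition counts differ")
    return problems


# Hypothetical URLs, for illustration only.
print(query_differences(
    "https://example.com/search?q=books&page=2",
    "https://example.com/search?page=2&q=books",
))
```

An empty list means that the query strings match exactly, including parameter order.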