How to deal with bans or 503 responses from Zyte Smart Proxy Manager?
When Zyte Smart Proxy Manager (formerly Crawlera) gets banned by a target website, it automatically retries the request from a different proxy IP. By default, Smart Proxy Manager retries up to 5 times to retrieve the content; if every attempt fails, it returns a 503 status code. Zyte constantly refreshes the proxy pool and configures site-specific settings for websites that are difficult to crawl. If you still receive a significant number of 503 responses despite these measures, consider the approaches below to improve your crawl success rate. Note that a small number of bans is expected in any crawl while Smart Proxy Manager adapts to the best settings for each site. Responses with a 503 status code are not billed to you.
You will see this HTTP response header when Smart Proxy Manager returns a 503 after exhausting its retries.
Your client can retry the request after a wait time that you configure, or reduce its crawl rate to see whether results improve. Smart Proxy Manager can return 503s for busy domains such as Amazon and Google even after trying many outgoing nodes. In those cases, the only remedy is to retry the request.
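The client-side retry described above can be sketched as a small helper that waits and tries again while the response is a 503. This is a minimal sketch under stated assumptions, not Zyte's recommended implementation: the `fetch` callable, the function name `fetch_with_retry`, the attempt count, and the 5-second base delay are all hypothetical choices you would tune yourself. The wait doubles on each attempt (exponential backoff), which also has the side effect of lowering your request rate against a struggling domain.

```python
import time
from typing import Callable, Tuple

# Hypothetical fetch signature: takes a URL, returns (status_code, body).
Fetch = Callable[[str], Tuple[int, bytes]]


def fetch_with_retry(
    fetch: Fetch,
    url: str,
    max_attempts: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Retry while the proxy answers 503, waiting longer before each new attempt.

    max_attempts counts the first request, so at most max_attempts requests are sent.
    """
    status, body = fetch(url)
    for attempt in range(1, max_attempts):
        if status != 503:
            break  # success or a non-ban error: stop retrying
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch(url)
    return status, body
```

In practice, `fetch` could wrap an HTTP client configured to use Smart Proxy Manager, for example a function that calls `requests.get` and returns `(response.status_code, response.content)`. The `sleep` parameter is injectable only so the waiting logic can be tested without actually pausing.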
You can use the following best practices to reduce the occurrences of bans:
1. Try using different headers, which give you more options for circumventing bans and can improve performance and success rates. For more information on profile headers, see Using Desktop and Mobile profiles using Zyte Smart Proxy Manager.
2. If your client handles cookies itself, send the X-Crawlera-Cookies: disable header to turn off cookie handling on the Smart Proxy Manager side.
3. If you are crawling mobile apps, use Smart Proxy Manager mobile profiles (via the X-Crawlera-Profile: mobile header) without sessions. Rotating user agents (cycling through a list of them) is the best practice to follow.
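The header practices in the list above can be combined into one small helper that builds request headers for each outgoing request. This is a hedged sketch, not an official Zyte client: the header names X-Crawlera-Cookies and X-Crawlera-Profile come from the list, while the function name `build_spm_headers` and the user-agent strings are hypothetical placeholders. One further assumption is that when a mobile profile is requested, the client should not send its own User-Agent, since the profile is expected to supply browser-like headers; check Zyte's profile documentation before relying on that. No session header is sent, matching the "without sessions" advice.

```python
import itertools
from typing import Iterator

# Hypothetical user-agent pool; substitute the strings you actually want to rotate.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0",
]


def build_spm_headers(
    ua_cycle: Iterator[str],
    client_side_cookies: bool = True,
    mobile_profile: bool = False,
) -> dict:
    """Return per-request headers following the practices above (a sketch)."""
    headers = {}
    if client_side_cookies:
        # The client manages cookies, so ask Smart Proxy Manager not to.
        headers["X-Crawlera-Cookies"] = "disable"
    if mobile_profile:
        # Request the mobile profile; assumption: the profile supplies the User-Agent.
        headers["X-Crawlera-Profile"] = "mobile"
    else:
        # Rotate user agents: each call takes the next string from the cycle.
        headers["User-Agent"] = next(ua_cycle)
    return headers


# Usage: share one cycle across requests so consecutive requests get different agents.
ua_cycle = itertools.cycle(USER_AGENTS)
headers = build_spm_headers(ua_cycle)
```

Keeping the rotation in a shared `itertools.cycle` rather than picking randomly makes the order predictable, which can help when you want to compare ban rates per user agent.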
If these steps do not help, please reach out to our support team.