Q. Can I try Zyte Smart Proxy Manager (formerly Crawlera) before subscribing?
A. With our Starter, Basic and Advanced plans you can try Smart Proxy Manager for 14 days or
10,000 requests (whichever comes first). After that you will be billed automatically for a full
month, the request quota will reset to 0, and your billing cycle will start at the time your
trial ends.
Q. Can I use the Smart Proxy Manager service with my own crawler without using Scrapy, Scrapy Cloud, or any other Zyte service?
A. Definitely. Smart Proxy Manager is a standalone service that can be used with any crawler or
HTTP client, independently of the rest of the Zyte platform.
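As a minimal sketch of what "any HTTP client" can look like, the Python snippet below builds a standard-library opener that routes requests through Smart Proxy Manager. The endpoint `proxy.zyte.com:8011` and the convention of sending the API key as the proxy username with an empty password are assumptions that do not come from this FAQ, so confirm both on your account's setup page. The names `proxy_auth_header` and `build_opener` are illustrative. The header is built by hand because `urllib.request` does not send proxy credentials when the password part is empty.

```python
import base64
import urllib.request

# Assumed endpoint; confirm it on your account's setup page.
SPM_PROXY = "http://proxy.zyte.com:8011"


def proxy_auth_header(api_key: str) -> str:
    """Build a Basic credential with the API key as username and an empty password."""
    credentials = f"{api_key}:".encode("ascii")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def build_opener(api_key: str, proxy: str = SPM_PROXY) -> urllib.request.OpenerDirector:
    """Return an opener that sends requests through the proxy with the auth header set."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    # addheaders is applied to every request made with this opener.
    opener.addheaders.append(("Proxy-Authorization", proxy_auth_header(api_key)))
    return opener
```

With a real key, `build_opener("<your API key>").open("http://toscrape.com")` would then issue a request through the proxy. The sketch only covers plain HTTP targets.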
Q. Can we have two Zyte Smart Proxy Manager plans under one organization?
A. Having multiple Smart Proxy Manager plans under one organization is not recommended. Instead,
create another Scrapy Cloud organization and subscribe to the second plan there. This
ensures that both plans are active independently to suit your use case (separating
production and development accounts, for instance).
Q. Can I try Zyte Smart Proxy Manager with a trial account?
A. With our Starter, Basic and Advanced plans you can try Zyte Smart Proxy Manager for 14 days
or 10,000 requests (whichever comes first).
After that you will automatically be billed for a full month, the request quota will reset to
0, and your billing cycle will start at the time your trial ends.
Q. Why did my Zyte Smart Proxy Manager account get suspended?
A. Zyte Smart Proxy Manager (formerly Crawlera) accounts are usually suspended when a customer
reaches the monthly request limit of their plan before the end of the billing cycle. Billing
cycle details are available at the bottom of the Billing Page, in the Monthly Billing section,
and in the Requests Breakdown section of Smart Proxy Manager’s Usage Stats page. Customers are
notified when usage reaches 80% of the monthly quota and suspended when they exceed it.
The limits for each plan are available on the Smart Proxy Manager promotional page. If your account has been suspended, you will have to purchase a plan higher than the one you are currently on. Although the billing page will allow you to choose a smaller plan or the same as the one you are currently on, this will not reactivate the Smart Proxy Manager account.
Alternatively, you may choose to not reactivate the account for the current billing period. In that case, the Smart Proxy Manager account with the current plan will be reactivated automatically at the start of your next billing cycle.
Note
After the account suspension, Smart Proxy Manager will start returning 401 errors. On a Scrapy Cloud job’s Stats tab this will be registered as follows:
crawlera/response/error/user_suspended: 1
crawlera/response/status/401: 1
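A suspension can also be detected from job stats in code. The sketch below is an illustration that assumes the stats are available as a plain dictionary (for example, what Scrapy's stats collector returns from `get_stats()`). The helper name `looks_suspended` is hypothetical, and the stat key is the one shown in the note above.

```python
SUSPENDED_STAT = "crawlera/response/error/user_suspended"


def looks_suspended(stats: dict) -> bool:
    """Return True if the job stats recorded at least one user_suspended error."""
    return stats.get(SUSPENDED_STAT, 0) > 0
```

A spider or monitoring script could call this at the end of a job and raise an alert instead of silently collecting 401 responses.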
Q. How do I provide my Credit Card / PayPal information?
Q. Why have I been charged more than the listed price for a plan?
A. For users based in Europe, VAT is applied to invoices if a VAT number is not provided.
This raises the amount charged above the listed price. If a VAT number is provided, no VAT
is applied, unless you are based in Ireland, where VAT always applies.
For any queries related to billing please contact our
support team.
Q. Can I put my account on hold and pause the subscription?
A. If you would like to suspend your services, you may do so using your billing page, inside
your organization’s dashboard. Canceling subscriptions won’t cancel your account. Instead, it
will take you back to the free Scrapy Cloud trial, and suspend your Zyte Smart Proxy Manager
(formerly Crawlera) accounts until you decide to subscribe again.
Keep in mind that any data contained in your Scrapy Cloud projects that are past the expiration
date of a free account will be removed once you downgrade to the free plan.
Q. Will my API Key change when I upgrade or downgrade Zyte Smart Proxy Manager plans?
A. Zyte Smart Proxy Manager (formerly Crawlera) plans can be upgraded or downgraded at any time
without affecting the existing API key. API keys are mapped to the account and not to the
plan. An upgrade or downgrade only changes the concurrency and monthly request limits.
A. Please note that it takes a few minutes for a Zyte Smart Proxy Manager (formerly Crawlera) instance to be provisioned once payment has been made. While the instance is being provisioned, you can see the tag “Provisioning” here: <https://app.zyte.com/o/>. When provisioning is complete, you can find the API key and setup instructions at: https://app.zyte.com/o/smart-proxy-manager/setup. If the instance is not provisioned after 15 minutes, please contact our Support team.
Q. How does Smart Proxy Manager behave in terms of request headers?
A. Smart Proxy Manager makes sure it sends appropriate headers with each request. This means it does not send x-forwarded-for or any other proxy-related headers, so that it does not appear as a proxy to the target.
You can also control its behavior using the X-Crawlera headers, which allow you to appear as certain user agents, use cookies, and more. You can read more about the X-Crawlera headers here.
Q. How to use Zyte Smart Proxy Manager with headless browsers?
Q. Can I Restrict Proxy Regions to States or Cities?
A. Basic and Advanced Zyte Smart Proxy Manager (formerly Crawlera) plans are supplied with
datacenter proxies, and thus only support country-level region restriction. To learn more about this check out Restricting Zyte Smart Proxy Manager IPs to a specific region.
City-level restriction is possible with residential proxies, which are provided under
custom Enterprise plans and as an add-on for any platform plan (Starter subscription and above).
You can learn more about residential proxies from the Residential page.
If you are interested in residential proxies, please contact Sales
through the form on the website.
Q. Can I Use Zyte Smart Proxy Manager in Applications Requiring Username/Password Pair?
A. It depends on the application in question, but in most cases, the following configuration works:
A. In Basic and Advanced plans, Zyte Smart Proxy Manager discards cookies to protect customer
privacy and to improve performance against target sites, because cookies are often
used as a detection mechanism by antibot systems.
In Enterprise and the legacy plans (C10, C50, C100, C200) it works as follows:
Smart Proxy Manager manages cookies for you by default and retains them for up to 15 minutes
after the last request. Smart Proxy Manager keeps separate groups of cookies per outgoing node,
so consecutive requests will almost always carry different cookies. If you need
cookies for things like authentication, you will want to manage them yourself.
To store and manage cookies yourself, disable Smart Proxy Manager cookie
handling with the X-Crawlera-Cookies: disable header. If cookie handling is not disabled,
Smart Proxy Manager will discard your cookies and send its own instead. Check out
X-Crawlera-Cookies for more detail.
Consider also using Smart Proxy Manager sessions with cookies in order to use the same IP.
Check out Sessions for more detail.
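To illustrate managing cookies yourself on Enterprise and legacy plans, the sketch below builds the request headers a client might send: the `X-Crawlera-Cookies: disable` header described above plus a `Cookie` header assembled from cookies the client stores. The function name `build_request_headers` and the cookie names in the usage note are made up for the example.

```python
def build_request_headers(cookies: dict) -> dict:
    """Disable proxy-side cookie handling and attach the client's own cookies."""
    headers = {"X-Crawlera-Cookies": "disable"}
    if cookies:
        # Join name=value pairs in the standard Cookie header format.
        headers["Cookie"] = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return headers
```

For example, `build_request_headers({"sessionid": "abc"})` yields both headers, ready to pass to whichever HTTP client you use.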
A. You are receiving this error because you requested content larger than 500MB through Zyte
Smart Proxy Manager. To improve the stability and performance of Smart Proxy Manager, we have
placed a response size limit of 500MB. If you request content that is larger than 500MB
through Smart Proxy Manager, you will receive an error with status code 541, which indicates
that the response is too big.
Q. 523 Response “Domain Forbidden” - what does it mean?
A. Some websites have strict policies about crawling and don’t allow requests for data
extraction purposes. Therefore, they are not accessible through requests made using Zyte Smart
Proxy Manager (formerly Crawlera).
Q. Many failed requests with 429 code. What does it mean?
A. Each Zyte Smart Proxy Manager (formerly Crawlera) plan has a concurrency limit, that is, a
limit on the number of requests that can be in progress in parallel. Concurrency is counted
cumulatively across all domains being crawled and across all Smart Proxy Manager accounts in
the Organization.
When concurrency exceeds the plan limit, Smart Proxy Manager responds with error 429 (Parallel
Connection limit reached). To resolve this, lower your concurrency or upgrade to a plan with a
higher concurrency limit.
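One way to lower concurrency on the client side is to cap the number of worker threads. The sketch below is a generic illustration, not a Zyte-provided helper: `fetch_all` is a hypothetical name, `fetch` stands for whatever function performs a single request, and `max_concurrency` should be set at or below your plan's limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_all(urls: Iterable[str], fetch: Callable[[str], object], max_concurrency: int) -> List[object]:
    """Run fetch over urls with at most max_concurrency calls in flight at once."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        # pool.map preserves the input order of the results.
        return list(pool.map(fetch, urls))
```

Because the limit is cumulative across all accounts in the Organization, a per-process cap like this only helps if the caps of all running crawlers together stay under the plan limit.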
Q. Why do I get a bad_proxy_auth error while making the request using Zyte Smart Proxy Manager?
A. Every request made through Zyte Smart Proxy Manager (formerly Crawlera) must be authenticated with the Smart Proxy Manager API key, which is different from the Zyte API key. A bad_proxy_auth error indicates that this authentication failed, so check that you are sending the Smart Proxy Manager API key.
Q. Where can I monitor my Smart Proxy Manager usage?
A. Go to the Zyte dashboard and select Smart Proxy Manager Overview or the specific account you
would like to inspect. If you click on a user, you can review the number of requests
per day/month for that user.
Note: The Recent Requests section is not populated in real time. Newly created accounts may
show a “No usage detected so far” message even after the user has started sending
requests. Please check the page later to give the dashboard some time to catch up on the logs.
Q. How do I measure Smart Proxy Manager’s speed for a particular domain?
A. You can use the crawlera-bench tool. Check the GitHub page for more information on how to use it. Note that you would be consuming Smart Proxy
Manager traffic when using the crawlera-bench tool.
Q. Why are requests slower through Smart Proxy Manager than other proxies?
A. If you’re using your own proxies, you may notice a discrepancy in speed between using your own
proxies and using them with Smart Proxy Manager. This is because Smart Proxy Manager throttles
requests by introducing delays to avoid being banned on the target website.
These delays can differ depending on the target domain, as some popular sites have more rigorous
anti-scraping measures than others. Throttling also helps prevent inadvertently bringing down the
target website should it lack the resources to handle a large volume of requests.
Q. How to increase the speed of Job running on Scrapy Cloud with Zyte Smart Proxy
Manager?
A. Zyte Smart Proxy Manager is a proxy rotator that helps avoid bans from the target website by
adjusting delays, handling cookies and managing IPs. When Smart Proxy Manager is integrated with
a spider running on Scrapy Cloud, jobs can slow down because it adjusts the delays so that
crawling is not too fast and does not overburden the target server, which reduces the risk of
detection and bans.
However, Smart Proxy Manager lets you control concurrency, the number of requests that can be
made in parallel. The maximum concurrency depends on the subscribed Smart Proxy Manager plan.
To increase crawl speed, increase the concurrency. Concurrency is calculated from the
cumulative requests made to all domains from all Smart Proxy Manager accounts in the
Organization.
For Scrapy Cloud, AutoThrottle is enabled by default, which limits
the maximum number of concurrent requests sent to the same domain. To increase the
concurrency, disable AutoThrottle and set the concurrency settings, either in
“settings.py” or through the UI, as described in the article on Customizing Scrapy settings in
Scrapy Cloud.
Note
Each Smart Proxy Manager plan has limits on Concurrency and exceeding it would lead to errors with code 429.
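As a rough illustration of the settings involved, a Scrapy project's settings might contain the entries below. The setting names are standard Scrapy settings. The value 32 is only a placeholder and should be replaced with a number at or below your plan's concurrency limit.

```python
# settings.py (the same keys can be entered in the Scrapy Cloud settings UI)

# Turn off AutoThrottle so it no longer limits per-domain concurrency.
AUTOTHROTTLE_ENABLED = False

# Placeholder values: keep them at or below your plan's concurrency limit
# to avoid 429 responses.
CONCURRENT_REQUESTS = 32
CONCURRENT_REQUESTS_PER_DOMAIN = 32
```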
Q. Why does Zyte Smart Proxy Manager slow down my Headless Browser?
A. When a page is opened in the browser with Zyte Smart Proxy Manager enabled,
all the page’s resources are routed through Smart Proxy Manager, i.e. images, stylesheets,
scripts, fonts, requests to CDNs, traffic analyzers, advertisements, etc. And since Smart Proxy
Manager adds throttling to each request (by design, to perform its main purpose of “crawling
politely” and avoiding bans), loading a page in the browser can indeed take a significant amount
of time.
To speed up loading (while also saving plan quota), it’s recommended to exclude the
aforementioned static assets from being processed by Smart Proxy Manager. For instance,
the crawlera-headless-proxy tool has AdBlock and Direct Access features that
can be enabled to filter out nonessential resources.
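If you filter requests yourself rather than relying on crawlera-headless-proxy, the decision can be as simple as checking the file extension of each URL. The sketch below is a generic illustration of that idea and not how crawlera-headless-proxy works internally. The extension list and the name `is_static_asset` are assumptions for the example.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative list of extensions treated as static assets.
STATIC_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico",
    ".css", ".js", ".woff", ".woff2", ".ttf",
}


def is_static_asset(url: str) -> bool:
    """Return True if the URL path ends with a known static-asset extension."""
    path = urlparse(url).path.lower()  # query strings are ignored
    _, extension = posixpath.splitext(path)
    return extension in STATIC_EXTENSIONS
```

A browser request interceptor could use this check to fetch matching URLs directly, or block them, and send only the remaining requests through Smart Proxy Manager.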
Q. How many IPs does Zyte Smart Proxy Manager have?
A. The number of IPs used by Zyte Smart Proxy Manager varies constantly from thousands to tens
of thousands. This number, however, is not very relevant as long as it stays above a certain
threshold, which allows us to crawl websites.
This question often arises with proxy providers because the number of IPs determines how fast
you can crawl a website. With Smart Proxy Manager, the number of IPs doesn’t matter because
Smart Proxy Manager (not the user) throttles requests, so there is a global limit
beyond which adding more IPs won’t increase crawl speed.
These limits are carefully tuned and continually revised by Smart Proxy Manager engineers (and
automated algorithms) to make sure you can crawl as fast as possible without causing any
disruption to the websites.
Q. What’s the difference between Zyte Smart Proxy Manager and regular proxy
providers?
A. Zyte Smart Proxy Manager is a service to download web pages that supports an HTTP Proxy
API.
Standard proxy providers typically provide a pool of IPs running simple HTTP proxies (using
Squid or similar software), whereas Smart Proxy Manager downloads web pages, distributing
requests among many nodes, keeping track of which nodes are blacklisted (per domain), and
throttling them to make sure domains are crawled politely, which minimizes the risk of getting
your crawler banned.
With proxy providers, you have to implement the throttling and blacklisting logic yourself.
With Smart Proxy Manager you only configure your crawler to download pages through Smart
Proxy Manager proxy and forget about throttling or implementing anti-ban policies. Smart
Proxy Manager enables you to crawl as fast as possible without causing any disruption to the
sites.
Q. Can I use Zyte Smart Proxy Manager from a web browser?
A. Zyte Smart Proxy Manager is slow on standard web browsers.
Unlike a standard proxy, Smart Proxy Manager is designed for crawling and throttles request
speed to avoid users getting banned or imposing too much load on websites. This throttling
makes Smart Proxy Manager feel slow when tried in a web browser.
When you access a web page in a browser, you typically have to download many resources to
render it (images, CSS styles, JavaScript code, etc.) and each resource is a different request
that needs to be performed against the site. Compare this to crawling, where you typically
only download the page HTML source. Not only do you need to perform many requests to render a
single page, but web browsers also limit the number of concurrent requests performed to any
single site. All this translates to Smart Proxy Manager looking slow when tried from a web
browser. But this “slowness” is actually a feature for Smart Proxy Manager’s intended
purpose.
Q. Can I use Zyte Smart Proxy Manager for sites that require login?
A. It is important to be aware that scraping past the login of a site could raise legal
concerns. Our Terms of Service state
that you may not use our services to access, connect to, or retrieve data from sites that are
subject to terms of service that prohibit scraping. When logging in to a site, you may be
required to accept terms that prohibit scraping or put other limitations on your use. It is your
responsibility to review those terms and make sure you are abiding by any applicable laws and
Zyte’s Terms of Service. If you have
any questions about this, our legal and compliance team is always available to help and provide
guidance.
Q. Does Zyte Smart Proxy Manager count redirects as successful requests?
A. Yes, Zyte Smart Proxy Manager (formerly Crawlera) counts redirects as successful requests.
For example: if a user makes a request through Smart Proxy Manager to http://toscrape.com,
the response is 302 Found with a Location header pointing to https://toscrape.com, and
the crawler follows that link through the Smart Proxy Manager account, this counts as
two requests.
Smart Proxy Manager does not follow redirects by itself. It executes a single HTTP
request and returns its HTTP response, so if a 3XX response is received, the crawler must
decide whether to follow it or not.
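The counting described above can be sketched as a small client-side loop. This is an illustration only: `follow_redirects` is a hypothetical helper, and `fetch` stands for any function that performs one request and returns the status code together with the `Location` header (or `None`).

```python
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# (status code, Location header value or None)
FetchResult = Tuple[int, Optional[str]]


def follow_redirects(url: str, fetch: Callable[[str], FetchResult], max_hops: int = 5) -> Tuple[str, int, int]:
    """Follow up to max_hops redirects; return (last URL fetched, its status, requests made)."""
    requests_made = 0
    while True:
        status, location = fetch(url)
        requests_made += 1  # every hop is a separate request
        is_redirect = 300 <= status < 400 and location is not None
        if not is_redirect or requests_made > max_hops:
            return url, status, requests_made
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
```

Running it against a stub that answers 302 for http://toscrape.com and 200 for https://toscrape.com reports two requests, matching the example above.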
Note that if you’re getting many 3XX redirects, you may be able to optimize your spider code to
save Smart Proxy Manager requests and make the crawl more efficient.
Q. Does Zyte Smart Proxy Manager solve captchas?
A. Zyte Smart Proxy Manager (formerly Crawlera) does not solve captchas.
However, Smart Proxy Manager will detect redirects to captcha pages (for most sites) and will
retry until it hits a clean page. Unsuccessful attempts won’t count towards the monthly quota.
This is based on a combination of basic checks (like the response code), some manual rules, and
automatic learning.
Q. Why do I see duplicate requests?
A. If you see duplicate requests where the first response code is 301 over HTTP and
the next one is 200 over HTTPS, the requested website does not allow plain HTTP. This
is how websites with strict SSL work. To fix this, send requests to HTTPS instead of HTTP.
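A simple way to apply this fix in a crawler is to rewrite start URLs before requesting them. The helper below is a minimal sketch with a hypothetical name (`force_https`). It assumes the site serves the same paths over HTTPS and that URLs do not carry an explicit port such as `:80`.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url: str) -> str:
    """Return the URL with an http scheme upgraded to https; other URLs are unchanged."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

Applying this to each URL before it is requested avoids the extra 301 round trip, which would otherwise count as an additional request.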