Warning
Zyte API is replacing Smart Proxy Manager. See Migrating from Smart Proxy Manager to Zyte API.
Sessions
Warning
Region selection is not possible when using the sessions API, so the X-Crawlera-Region and X-Crawlera-Session headers are not compatible.
Sessions allow reusing the same outgoing IP across multiple requests.
Session Limits
By default, requests that use the same IP are spaced 12 seconds apart. This delay can differ for more popular domains. If the requests-per-second limit is exceeded, further requests are delayed for up to 15 minutes, and each request made after exceeding the limit increases the request delay. Once the request delay reaches the soft limit (120 seconds), the response to each subsequent request contains the X-Crawlera-Next-Request-In header with the calculated delay as its value.
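A client can honor this header by waiting before it sends the next request on the same session. The following is a minimal sketch, not an official client: the helper name `suggested_delay` is hypothetical, and because this page does not state the unit of the header value, the `unit_seconds` parameter (defaulting to seconds) is an assumption you should verify against real responses.

```python
from typing import Mapping, Optional

NEXT_REQUEST_HEADER = "X-Crawlera-Next-Request-In"


def suggested_delay(
    response_headers: Mapping[str, str],
    unit_seconds: float = 1.0,  # assumption: header value is expressed in seconds
) -> Optional[float]:
    """Return how long to wait before the next request, or None if no hint was sent."""
    for name, value in response_headers.items():
        # HTTP header names are case-insensitive, so compare in lowercase.
        if name.lower() == NEXT_REQUEST_HEADER.lower():
            try:
                return float(value) * unit_seconds
            except ValueError:
                # Unparseable value: treat it as if the header were absent.
                return None
    return None
```

A caller could pass the result to `time.sleep()` when it is not `None` before reusing the session.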
Sessions expire 30 minutes after the last request sent through that session.
The maximum number of sessions that can be created at any point in time is 5000.
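The expiry and session-count limits above can be mirrored on the client so that it avoids reusing a session that has probably expired. This is a sketch under stated assumptions: `SessionTracker` is a hypothetical helper, it only estimates state from local timestamps, and Smart Proxy Manager remains the source of truth.

```python
import time
from typing import Callable, Dict, List

SESSION_TTL_SECONDS = 30 * 60  # sessions expire 30 minutes after the last request
MAX_SESSIONS = 5000            # maximum number of sessions at any point in time


class SessionTracker:
    """Local bookkeeping of when each session ID was last used."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_used: Dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        """Record that a request was just sent through this session."""
        self._last_used[session_id] = self._clock()

    def is_probably_alive(self, session_id: str) -> bool:
        """True if the session was used less than 30 minutes ago."""
        last = self._last_used.get(session_id)
        return last is not None and self._clock() - last < SESSION_TTL_SECONDS

    def prune(self) -> List[str]:
        """Forget sessions that should have expired and return their IDs."""
        now = self._clock()
        expired = [s for s, t in self._last_used.items() if now - t >= SESSION_TTL_SECONDS]
        for session_id in expired:
            del self._last_used[session_id]
        return expired

    def can_create(self) -> bool:
        """True if creating one more session stays within the 5000-session limit."""
        self.prune()
        return len(self._last_used) < MAX_SESSIONS
```

The clock is injectable so the expiry logic can be exercised without waiting 30 minutes.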
Sessions and retries
When using sessions, retries are automatically disabled, on the assumption that it is not helpful to retry a request through the same outgoing IP. However, this behaviour can be overridden with the X-Crawlera-Max-Retries header. If this header is passed, Smart Proxy Manager will automatically retry, even inside sessions.
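As an illustration, the override amounts to sending both headers on the same request. The helper below is a hypothetical sketch; it assumes the header takes an integer retry count, which this page does not spell out.

```python
from typing import Dict, Optional


def session_request_headers(session_id: str, max_retries: Optional[int] = None) -> Dict[str, str]:
    """Build headers for a request inside a session, optionally re-enabling retries."""
    headers = {"X-Crawlera-Session": session_id}
    if max_retries is not None:
        # Assumption: the value is an integer number of retries.
        headers["X-Crawlera-Max-Retries"] = str(max_retries)
    return headers
```

Leaving `max_retries` unset keeps the default behaviour, where retries stay disabled inside the session.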
Using Sessions with the proxy API
Sessions are managed using the X-Crawlera-Session header. To create a new session send:
X-Crawlera-Session: create
Smart Proxy Manager will respond with the session ID in the same header:
X-Crawlera-Session: <session ID>
From then onward, subsequent requests can be made through the same outgoing IP by sending the session ID in the request header:
X-Crawlera-Session: <session ID>
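The create-then-reuse cycle can be captured in a small state holder that is independent of the HTTP client. This is a minimal sketch; the class name `SessionHeader` and its methods are hypothetical and not part of any Smart Proxy Manager library.

```python
from typing import Dict, Mapping, Optional


class SessionHeader:
    """Tracks the X-Crawlera-Session value across requests."""

    NAME = "X-Crawlera-Session"

    def __init__(self) -> None:
        self.session_id: Optional[str] = None

    def outgoing(self) -> Dict[str, str]:
        """Header for the next request: 'create' until a session ID is known."""
        return {self.NAME: self.session_id or "create"}

    def update(self, response_headers: Mapping[str, str]) -> None:
        """Remember the session ID returned in the response, if any."""
        for name, value in response_headers.items():
            # Header names are case-insensitive.
            if name.lower() == self.NAME.lower():
                self.session_id = value
                return
```

With any HTTP client, merge `outgoing()` into the request headers and call `update()` with the response headers after each request.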
Another way to create sessions is using the /sessions endpoint:
curl -u <API key>: proxy.zyte.com:8011/sessions -X POST
This will also return a session ID, which you can pass to future requests with the X-Crawlera-Session header as before. This is helpful when you cannot obtain the session ID from the X-Crawlera-Session response header.
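Here is a code example that illustrates how to create sessions with the Python Requests library:
import requests

MYAPIKEY = "<API key>"  # your Smart Proxy Manager API key

def getSPMSession():
    # POST to /sessions creates a session; the response body is the session ID
    r = requests.post(url="http://proxy.zyte.com:8011/sessions", auth=(MYAPIKEY, ""))
    return r.text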
If an incorrect session ID is sent, Smart Proxy Manager responds with a bad_session_id error.
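A client may want to detect this error and fall back to creating a fresh session. The sketch below assumes the error code is exposed in an `X-Crawlera-Error` response header; this page does not say where the code appears, so treat the header name as an assumption and verify it against an actual response. The function name is hypothetical.

```python
from typing import Mapping

ERROR_HEADER = "X-Crawlera-Error"  # assumption: response header carrying the error code


def is_bad_session(response_headers: Mapping[str, str]) -> bool:
    """True if the response reports that the session ID was not accepted."""
    for name, value in response_headers.items():
        # Header names are case-insensitive.
        if name.lower() == ERROR_HEADER.lower():
            return value.strip() == "bad_session_id"
    return False
```

When this returns `True`, one option is to discard the stored ID and send `X-Crawlera-Session: create` on the next request.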
Sessions API
List sessions
Call the /sessions endpoint with the GET method to list your sessions.
The endpoint returns a JSON document in which each key is a session ID and the associated value is an outgoing IP.
Example:
curl -u <API key>: proxy.zyte.com:8011/sessions
{"1836172": "<OUTGOING_IP1>", "1691272": "<OUTGOING_IP2>"}
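Because the response is a flat JSON object, it can be parsed with the standard library. As a small sketch, the hypothetical helper below inverts the mapping to show which session IDs share each outgoing IP; the IP addresses in the usage example are placeholders, not real output.

```python
import json
from collections import defaultdict
from typing import Dict, List


def sessions_by_ip(body: str) -> Dict[str, List[str]]:
    """Group session IDs from a /sessions response body by outgoing IP."""
    sessions = json.loads(body)  # {"<session ID>": "<outgoing IP>", ...}
    grouped: Dict[str, List[str]] = defaultdict(list)
    for session_id, ip in sessions.items():
        grouped[ip].append(session_id)
    return dict(grouped)
```

For example, `sessions_by_ip('{"1836172": "203.0.113.5", "1691272": "203.0.113.9"}')` yields one single-element list per IP.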
Delete a session
Call the endpoint /sessions/SESSION_ID with the DELETE method to delete a session.
Example:
curl -u <API key>: proxy.zyte.com:8011/sessions/1836172 -X DELETE
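The same call can be prepared from Python with only the standard library. This sketch builds the request without sending it; `build_delete_request` is a hypothetical helper, and it assumes that curl's `-u <API key>:` corresponds to HTTP Basic authentication with the API key as the username and an empty password.

```python
import base64
import urllib.request

SESSIONS_URL = "http://proxy.zyte.com:8011/sessions"


def build_delete_request(api_key: str, session_id: str) -> urllib.request.Request:
    """Prepare a DELETE request for /sessions/<session ID> with Basic auth."""
    # Basic auth credentials are "username:password"; the password is empty here.
    token = base64.b64encode(f"{api_key}:".encode()).decode("ascii")
    return urllib.request.Request(
        f"{SESSIONS_URL}/{session_id}",
        method="DELETE",
        headers={"Authorization": f"Basic {token}"},
    )
```

To actually delete the session, pass the returned object to `urllib.request.urlopen()`.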
Example using Python and Scrapy
import scrapy

class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"

    def start_requests(self):
        yield scrapy.Request(
            'http://quotes.toscrape.com/',
            headers={'X-Crawlera-Session': 'create'},  # requesting new session
            callback=self.parse,
        )

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract()
            }
        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            session_id = response.headers.get('X-Crawlera-Session', '')
            yield scrapy.Request(
                response.urljoin(next_page_url),
                headers={'X-Crawlera-Session': session_id}  # using the same session via id
            )