Warning

Zyte API is replacing Smart Proxy Manager. See Migrating from Smart Proxy Manager to Zyte API.

Sessions

Warning

Region selection is not possible when using the sessions API, so the X-Crawlera-Region and X-Crawlera-Session headers cannot be used together.

Sessions allow reusing the same outgoing IP across multiple requests.

Session Limits

There is a default delay of 12 seconds between requests that use the same IP. This delay can differ for more popular domains. If the requests-per-second limit is exceeded, further requests will be delayed for up to 15 minutes, and each request made after exceeding the limit increases the request delay. Once the request delay reaches the soft limit (120 seconds), the response to each subsequent request will contain the X-Crawlera-Next-Request-In header with the calculated delay as its value.
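A client can pace itself against these limits by waiting the default delay and honoring X-Crawlera-Next-Request-In when it appears. The following is a minimal sketch, not an official client: the helper name is hypothetical, and it assumes the header value is expressed in milliseconds, which the text above does not state. Verify the unit against a real response before relying on it.

```python
DEFAULT_DELAY_SECONDS = 12.0  # default per-IP delay described above


def seconds_until_next_request(response_headers, units_per_second=1000.0):
    """Return how long to wait before sending the next request on this IP.

    units_per_second is an ASSUMPTION: the header value is treated as
    milliseconds. Set it to 1.0 if your responses report seconds.
    """
    raw = response_headers.get("X-Crawlera-Next-Request-In")
    if raw is None:
        return DEFAULT_DELAY_SECONDS
    try:
        advertised = float(raw) / units_per_second
    except ValueError:
        # Unparseable value: fall back to the documented default.
        return DEFAULT_DELAY_SECONDS
    # Never go below the default delay, even if the header is smaller.
    return max(DEFAULT_DELAY_SECONDS, advertised)
```

Calling time.sleep() with the returned value between requests in the same session keeps the client from adding to the delay that the proxy has already calculated.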

Sessions expire 30 minutes after the last request sent through that session.

The maximum number of sessions that can be created at any point in time is 5000.
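Because expiry is measured from the last request, a client can keep its own record of when each session was last used and create a new one instead of sending a stale ID. The class below is a hypothetical local helper, sketched under the assumption that the 30-minute figure above applies unchanged to your account. It only tracks time on the client side and does not query Smart Proxy Manager.

```python
import time

SESSION_TTL_SECONDS = 30 * 60  # sessions expire 30 minutes after the last request


class SessionClock:
    """Client-side record of when each session ID was last used."""

    def __init__(self, now=time.monotonic):
        self._now = now          # injectable clock, handy for testing
        self._last_used = {}     # session ID -> timestamp of last request

    def touch(self, session_id):
        """Record that a request was just sent through session_id."""
        self._last_used[session_id] = self._now()

    def is_probably_expired(self, session_id):
        """Return True if the session is unknown or idle for 30 minutes or more."""
        last = self._last_used.get(session_id)
        if last is None:
            return True
        return self._now() - last >= SESSION_TTL_SECONDS
```

Call touch() after every request sent through a session, and check is_probably_expired() before reusing an ID that has been idle for a while.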

Session and retries

When using sessions, retries are automatically disabled, on the assumption that it is not helpful to retry a request through the same outgoing IP. However, this behaviour can be overridden with the X-Crawlera-Max-Retries header. If this header is passed, Smart Proxy Manager will automatically retry, even inside sessions.
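As an illustration of opting back in to retries, the sketch below builds the headers for a request inside an existing session. The function name is hypothetical, and it assumes X-Crawlera-Max-Retries takes a non-negative integer retry count; the text above only says that passing the header re-enables retries.

```python
def session_request_headers(session_id, max_retries=None):
    """Build the headers for a request sent through an existing session.

    max_retries=None leaves retries disabled, which is the default inside
    sessions. An integer adds X-Crawlera-Max-Retries (ASSUMED to be a count).
    """
    headers = {"X-Crawlera-Session": str(session_id)}
    if max_retries is not None:
        if max_retries < 0:
            raise ValueError("max_retries must be non-negative")
        headers["X-Crawlera-Max-Retries"] = str(max_retries)
    return headers
```

The returned dictionary can be passed as the headers argument of an HTTP client that is already configured to send traffic through Smart Proxy Manager.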

Using Sessions with the proxy API

Sessions are managed using the X-Crawlera-Session header. To create a new session send:

X-Crawlera-Session: create

Smart Proxy Manager will respond with the session ID in the same header:

X-Crawlera-Session: <session ID>

From then onward, subsequent requests can be made through the same outgoing IP by sending the session ID in the request header:

X-Crawlera-Session: <session ID>
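The create-then-reuse exchange above can be sketched in Python as follows. This is an illustrative helper, not part of any Zyte library: the function name is hypothetical, and http stands for any object with a Requests-style get(url, headers=...) method that is already configured to route traffic through Smart Proxy Manager.

```python
SESSION_HEADER = "X-Crawlera-Session"


def fetch_in_one_session(http, urls):
    """Fetch urls in order through a single Smart Proxy Manager session.

    http is any object with a requests-style get(url, headers=...) method,
    for example a requests.Session whose proxies point at the proxy.
    """
    session_id = "create"  # the first request asks for a new session
    responses = []
    for url in urls:
        response = http.get(url, headers={SESSION_HEADER: session_id})
        # Reuse the ID returned by the proxy; keep the current one otherwise.
        session_id = response.headers.get(SESSION_HEADER, session_id)
        if session_id == "create":
            # Without an ID, every request would open a new session.
            raise RuntimeError("no session ID returned by the proxy")
        responses.append(response)
    return responses
```

With the Requests library, http could be a requests.Session() whose proxies attribute maps "http" to a proxy URL built from your API key and proxy.zyte.com:8011. That proxy URL layout is an assumption based on the curl examples in this page.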

Another way to create sessions is using the /sessions endpoint:

curl -u <API key>: proxy.zyte.com:8011/sessions -X POST

This also returns a session ID, which you can pass to future requests in the X-Crawlera-Session header as before. This is helpful when you cannot read the session ID from the X-Crawlera-Session header of a previous response.

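Here is a code example that illustrates how to create sessions with the Python Requests library:

import requests

MYAPIKEY = "<API key>"  # replace with your Smart Proxy Manager API key

def getSPMSession():
    # POST to /sessions creates a session; the response body holds the session ID
    r = requests.post(url="http://proxy.zyte.com:8011/sessions", auth=(MYAPIKEY, ""))
    r.raise_for_status()  # fail loudly on authentication or server errors
    return r.text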

If an incorrect session ID is sent, Smart Proxy Manager responds with a bad_session_id error.
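A client can recover from this error by asking for a fresh session on its next request. The sketch below assumes the error code is reported in an X-Crawlera-Error response header; that header name is an assumption not stated on this page, and both function names are hypothetical.

```python
ERROR_HEADER = "X-Crawlera-Error"  # ASSUMPTION: where the error code is reported


def is_bad_session(response_headers):
    """Return True if the response reports a bad_session_id error."""
    return response_headers.get(ERROR_HEADER) == "bad_session_id"


def next_session_value(response_headers, current_session_id):
    """Return the X-Crawlera-Session value to send on the next request.

    Falls back to "create" when the current ID was rejected, so the next
    request opens a new session instead of repeating the error.
    """
    if is_bad_session(response_headers):
        return "create"
    return current_session_id
```

The comparison expects string header values, as returned by Requests. Scrapy returns header values as bytes, so decode them first if you adapt this to a spider.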

Sessions API

List sessions

Call the /sessions endpoint with the GET method to list your sessions. The endpoint returns a JSON document in which each key is a session ID and the associated value is an outgoing IP.

Example:

curl -u <API key>: proxy.zyte.com:8011/sessions
{"1836172": "<OUTGOING_IP1>", "1691272": "<OUTGOING_IP2>"}
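Since the listing maps session IDs to IPs, a common follow-up is to invert it and see which sessions share an outgoing IP. Here is a minimal sketch using only the standard library; the function name is hypothetical, and the IP addresses in the usage note are placeholder values from a documentation range.

```python
import json


def sessions_by_ip(listing_json):
    """Invert the {session ID: outgoing IP} document into {IP: [session IDs]}."""
    listing = json.loads(listing_json)
    by_ip = {}
    for session_id, ip in listing.items():
        by_ip.setdefault(ip, []).append(session_id)
    return by_ip
```

For a response body such as {"1836172": "203.0.113.5", "1691272": "203.0.113.9"}, the function returns one entry per IP, each holding a list with the matching session ID.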

Delete a session

Call the /sessions/SESSION_ID endpoint with the DELETE method to delete a session.

Example:

curl -u <API key>: proxy.zyte.com:8011/sessions/1836172 -X DELETE
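The list and delete endpoints can be combined to clean up every open session, for example at the end of a crawl. The following sketch is illustrative only: the function name is hypothetical, and http stands for any object with Requests-style get() and delete() methods, such as the requests module itself.

```python
SESSIONS_URL = "http://proxy.zyte.com:8011/sessions"


def delete_all_sessions(http, api_key):
    """Delete every session listed for api_key and return the deleted IDs.

    http is any object with requests-style get() and delete() methods,
    for example the requests module or a requests.Session.
    """
    auth = (api_key, "")  # API key as username, empty password, as in curl -u
    listing = http.get(SESSIONS_URL, auth=auth)
    listing.raise_for_status()
    deleted = []
    for session_id in listing.json():  # keys of the JSON document are session IDs
        response = http.delete(f"{SESSIONS_URL}/{session_id}", auth=auth)
        response.raise_for_status()
        deleted.append(session_id)
    return deleted
```

Passing the HTTP client as an argument keeps the function easy to test with a stub and lets you reuse one connection pool across calls.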

Example using Python and Scrapy

import scrapy

class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"

    def start_requests(self):
        yield scrapy.Request(
            'http://quotes.toscrape.com/',
            headers={'X-Crawlera-Session': 'create'}, # requesting new session
            callback=self.parse,
        )

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                'text': quote.css("span.text::text").extract_first(),
                'author': quote.css("small.author::text").extract_first(),
                'tags': quote.css("div.tags > a.tag::text").extract()
            }

        next_page_url = response.css("li.next > a::attr(href)").extract_first()
        if next_page_url is not None:
            # the proxy returns the session ID in the response header
            session_id = response.headers.get('X-Crawlera-Session', '')
            yield scrapy.Request(
                response.urljoin(next_page_url),
                headers={'X-Crawlera-Session': session_id},  # using the same session via id
                callback=self.parse,
            )