Zyte API reference documentation

This is the complete reference documentation of the HTTP API of Zyte API.

For topic-based usage documentation, see Zyte API usage documentation.

All requests require basic authentication.
Use your Zyte API key as username, and no password.
For example, if your Zyte API key is foo, base64-encode foo: as Zm9vOg==
and send the Authorization header with value Basic Zm9vOg==:

Authorization: Basic Zm9vOg==
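
As a minimal sketch of the encoding step described above, the header value can be computed with the Python standard library. The function name basic_auth_header is hypothetical and used only for illustration.

```python
import base64


def basic_auth_header(api_key: str) -> str:
    """Return the Authorization header value for a Zyte API key.

    The key is the username and the password is empty, so the
    credentials string is "<key>:".
    """
    credentials = f"{api_key}:".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("foo"))  # prints "Basic Zm9vOg=="
```

Most HTTP clients can also build this header for you when given the key as username and an empty password.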

Web Data Extraction API (1.0.0)

URL: https://www.zyte.com

A single API for web scraping

Process a single URL, return the result

This endpoint blocks until the result is ready. It is intended for short-running operations.

At least one of the following request fields must be set to true:

All automatic extraction data types support performing extraction using either a browser request or an HTTP request. Choose which using the corresponding extractFrom option, e.g. productOptions.extractFrom when extracting a product.

When no option is specified, automatic extraction currently defaults to a browser request, except for serp, which defaults to an HTTP request. In the future, however, the default may depend on the target website.

When automatic extraction uses a browser request, it can be combined with any fields compatible with browserHtml, e.g. screenshot. When automatic extraction uses an HTTP request, it can be combined with any fields compatible with httpResponseBody. serp cannot be combined with any other fields besides serpOptions and url.

You cannot combine multiple automatic extraction request fields (e.g. product and productList) on the same request.

You cannot combine httpResponseBody with a request field that is exclusive to browser requests (e.g. httpResponseBody and browserHtml).

httpResponseHeaders can be requested alone or with any other valid combination of request fields except for serp.

The request body size limit is 5MiB.
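
The combination rules above could be checked on the client side before sending a request. The following is a rough sketch under stated assumptions, not the validation that Zyte API itself performs: check_request is a hypothetical helper, the set of browser-only fields is limited to the two fields this reference marks as incompatible with HTTP requests, and the size check assumes the body is serialized with json.dumps.

```python
import json

# Automatic extraction request fields named in this reference.
EXTRACTION_FIELDS = {
    "article", "articleList", "articleNavigation", "forumThread",
    "jobPosting", "jobPostingNavigation", "product", "productList",
    "productNavigation", "serp",
}
# Fields that this reference describes as not compatible with HTTP requests.
BROWSER_ONLY_FIELDS = {"browserHtml", "screenshot"}
MAX_BODY_BYTES = 5 * 1024 * 1024  # 5 MiB


def check_request(body: dict) -> list[str]:
    """Return the rule violations found in a request body (hypothetical helper)."""
    problems = []
    # Only fields explicitly set to True count as requested outputs.
    enabled = {name for name, value in body.items() if value is True}
    extraction = sorted(enabled & EXTRACTION_FIELDS)
    if len(extraction) > 1:
        problems.append(f"multiple extraction fields: {', '.join(extraction)}")
    if "httpResponseBody" in enabled and enabled & BROWSER_ONLY_FIELDS:
        problems.append("httpResponseBody combined with a browser-only field")
    if "serp" in enabled:
        extra = sorted(set(body) - {"serp", "serpOptions", "url"})
        if extra:
            problems.append(f"serp combined with: {', '.join(extra)}")
    if len(json.dumps(body).encode("utf-8")) > MAX_BODY_BYTES:
        problems.append("request body exceeds 5 MiB")
    return problems
```

An empty list means that none of these particular rules is violated; the server may still reject the request for other reasons.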

Authorizations:

BasicAuth

Request Body schema: application/json

An extraction request body

url

required

string <= 8192 characters

An absolute URL to extract data from.

The host name must be a domain name; it cannot be an IP address.

object (RequestHeaders)

HTTP request headers.

Can only be used in a browser request. For HTTP requests, see customHttpRequestHeaders.

At the moment it only supports the Referer header.

See an example.

referer

string

Referer header.

object or null

Assign arbitrary key-value pairs to the request that you can use for filtering in the Stats API.

Keys must be strings. Values must be strings or null.

For example: {"tags": {"foo": "bar", "baz": null}}.

property name*

additional property

string

ipType

string

Enum: "datacenter" "residential"

Type of IP address from which the request should be sent.

If not specified, Zyte API will use an IP type that, for the target website, does not cause bans or unexpected response data.

If you believe Zyte API is using the wrong default IP type for a website, please reach out to our expert anti-ban team.

See an example.

httpRequestMethod

string

Enum: "GET" "POST" "PUT" "DELETE" "OPTIONS" "TRACE" "PATCH" "HEAD"

Request HTTP method.

Can only be used in combination with httpResponseBody.

See an example. See also: httpRequestText, httpRequestBody, customHttpRequestHeaders, httpResponseHeaders.

httpRequestBody

string <byte> <= 400000 characters

Base64-encoded data to send as request body.

Can only be used in combination with httpResponseBody.

It usually needs to be used in combination with httpRequestMethod.

If you only need to send UTF-8-encoded text, use httpRequestText instead to skip Base64-encoding. Note that you cannot combine both fields on the same request.

See an example. See also: customHttpRequestHeaders.

httpRequestText

string [ 1 .. 400000 ] characters

UTF-8 text to send as request body.

Can only be used in combination with httpResponseBody.

It usually needs to be used in combination with httpRequestMethod.

If you need to send a binary or non-UTF-8 request body, use httpRequestBody instead. Note that you cannot combine both fields on the same request.

See an example. See also: customHttpRequestHeaders.
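
The choice between httpRequestText and httpRequestBody can be sketched as follows. The helper name with_request_body is hypothetical, and the sketch assumes a POST request; it only builds the request body and does not send anything.

```python
import base64


def with_request_body(request: dict, body) -> dict:
    """Return a copy of `request` that sends `body` in a POST request.

    `str` bodies go in httpRequestText; `bytes` bodies are Base64-encoded
    into httpRequestBody. The two fields are never set together.
    """
    result = dict(request, httpResponseBody=True, httpRequestMethod="POST")
    if isinstance(body, str):
        result["httpRequestText"] = body
    else:
        result["httpRequestBody"] = base64.b64encode(body).decode("ascii")
    return result


text_request = with_request_body({"url": "https://example.com"}, '{"a": 1}')
binary_request = with_request_body({"url": "https://example.com"}, b"\xff\x00")
```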

Array of objects (CustomHttpRequestHeader) <= 200 items [ items ]

HTTP request headers.

Can only be used in combination with httpResponseBody. To set headers with other outputs, see requestHeaders.

Setting HTTP request headers has some caveats:

Zyte API sends some headers automatically for ban avoidance, and may silently override or drop some of your custom headers for that purpose.

However, your custom headers may override those automatic headers, and in doing so they can break the ban avoidance capabilities of Zyte API, as some websites may ban based on the presence, values, or order of certain headers.
You cannot set the Cookie header. Use requestCookies instead.
If you set multiple headers with the same name, only the last header value will be sent. To overcome this limitation, join the header values with a comma into a single header value. For example, replace "customHttpRequestHeaders": [{"name": "foo", "value": "bar"}, {"name": "foo", "value": "baz"}] with "customHttpRequestHeaders": [{"name": "foo", "value": "bar,baz"}].

See an example. See also: httpRequestMethod, httpRequestText, httpRequestBody, httpResponseHeaders.

Array (<= 200 items)

name	string <= 200 characters
value	string <= 2000 characters
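
The workaround for repeated header names described above can be automated. This is a minimal sketch with a hypothetical helper name; it treats header names as matching only when they are spelled identically, which is an assumption, since this reference does not say whether names are compared case-insensitively.

```python
def merge_duplicate_headers(headers: list[dict]) -> list[dict]:
    """Join values of same-named headers with a comma, keeping first-seen order."""
    merged: dict[str, str] = {}
    for header in headers:
        name, value = header["name"], header["value"]
        merged[name] = f"{merged[name]},{value}" if name in merged else value
    return [{"name": name, "value": value} for name, value in merged.items()]


headers = merge_duplicate_headers(
    [{"name": "foo", "value": "bar"}, {"name": "foo", "value": "baz"}]
)
print(headers)  # prints [{'name': 'foo', 'value': 'bar,baz'}]
```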

httpResponseBody

boolean

Default: false

Set to true to get the HTTP response body in the httpResponseBody response field.

This field is not compatible with browser automation.

See an example. See also: httpRequestMethod, httpRequestText, httpRequestBody, customHttpRequestHeaders.

httpResponseHeaders

boolean

Default: false

Set to true to get the HTTP response headers in the httpResponseHeaders response field.

See an example. See also: customHttpRequestHeaders, requestHeaders.

browserHtml

boolean

Default: false

Set to true to get the browser HTML in the browserHtml response field.

This field is not compatible with HTTP requests.

If you use actions, the browser HTML is generated after action execution has finished or timed out.

By default, iframes are empty. See includeIframes.

To access content from the shadow DOM, check out the corresponding example in the actions documentation.

See an example. See also: screenshot, requestHeaders.

screenshot

boolean

Default: false

Set to true to get a page screenshot in the screenshot response field.

This field is not compatible with HTTP requests.

To adjust the screenshot contents you can use screenshotOptions and viewport.

If you use actions, the screenshot is generated after action execution has finished or timed out.

See an example. See also: browserHtml, requestHeaders.

object (ScreenshotOptions)

Options for the screenshot taken when the screenshot request field is true.

format

string

Default: "jpeg"

Enum: "png" "jpeg"

File format.

JPEG screenshots are taken with a quality of 75%.

fullPage

boolean

Default: false

When true, the screenshot features the full page. When false, it features only what is visible on the browser window (viewport).

Full page screenshots:

Are only available in JPEG format.
Have a minimum resolution of 1920x1080, i.e. for pages smaller than 1920x1080, the screenshot looks the same regardless of the value of fullPage.
Any image exceeding 5000 (width) x 10000 (height) pixels will be clipped to those dimensions.

article

boolean

Default: false

Set to true to get article data in the article response field.

The target page should only contain a single article, such as a blog post or a news article. For pages with multiple articles consider using articleList instead.

To combine this field with HTTP requests, set articleOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

If not specified, browserHtml is currently used by default for AI extraction, while httpResponseBody is used by default for serp. In the future, the default value may depend on the target website.
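
As an illustration of how an extraction field pairs with its options field, the sketch below builds a request body for any of the automatic extraction data types. The helper name extraction_request is hypothetical; the naming pattern (data type plus "Options") follows the field names used throughout this reference.

```python
from typing import Optional


def extraction_request(
    url: str, data_type: str, extract_from: Optional[str] = None
) -> dict:
    """Build a request body for one automatic extraction data type."""
    request = {"url": url, data_type: True}
    if extract_from is not None:
        if extract_from not in ("httpResponseBody", "browserHtml"):
            raise ValueError(f"unsupported extractFrom value: {extract_from!r}")
        # E.g. "article" -> "articleOptions", "product" -> "productOptions".
        request[f"{data_type}Options"] = {"extractFrom": extract_from}
    return request


article_request = extraction_request(
    "https://example.com/post", "article", "httpResponseBody"
)
```

Omitting extract_from leaves the choice to the current default.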

articleList

boolean

Default: false

Set to true to get article list data in the articleList response field.

The target page should contain multiple articles, usually as links or short snippets. Examples of such pages are main or category pages of news sites, main pages of blogs showing multiple posts, and other pages with multiple articles.

Article list data is especially useful to get basic information about articles on a website, like a headline and a link to the article details, using a smaller number of requests, when article attributes are extracted directly from an article list page, without making individual article requests.

To implement article crawling from article list pages, use articleNavigation, which also enables navigation through pagination links.

To combine this field with HTTP requests, set articleListOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

articleNavigation

boolean

Default: false

Set to true to get article navigation data in the articleNavigation response field.

The target page should contain multiple articles and/or subcategories that can be followed.

Article navigation data is especially useful for implementing article crawling, i.e. following links to article pages, as well as to subcategories and pagination that can in turn link to more article pages.

Article navigation data can also be used to get basic information of articles and subcategories on a website, obtaining the URLs and link names of the articles and subcategories, without making individual requests for those articles.

To combine this field with HTTP requests, set articleNavigationOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

forumThread

boolean

Default: false

Set to true to get forum thread data in the forumThread response field.

The target page should be an individual forum thread page on a forum website.

To combine this field with HTTP requests, set forumThreadOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

jobPosting

boolean

Default: false

Set to true to get job posting data in the jobPosting response field.

The target page should be an individual job posting page on a company website or on a job website.

To combine this field with HTTP requests, set jobPostingOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

jobPostingNavigation

boolean

Default: false

Set to true to get job posting navigation data in the jobPostingNavigation response field.

The target page should contain multiple job postings and/or subcategories that can be followed.

Job posting navigation data is especially useful for implementing job posting crawling, i.e. following links to job posting pages, as well as pagination that can in turn link to more job posting pages.

Job posting navigation data can also be used to get basic information of job postings on a website, obtaining the URLs and link names of the job postings, without making individual requests for them.

To combine this field with HTTP requests, set jobPostingNavigationOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

product

boolean

Default: false

Set to true to get product data in the product response field.

The target page should only contain a single product. For pages with multiple products consider using productList instead.

To combine this field with HTTP requests, set productOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

See an example. See also: List of all automatic extraction request fields, productNavigation, browserHtml, screenshot, requestHeaders.

object

Additional options for product extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

model

string

Enum: "2024-02-01" "2024-09-16"

Model version to use for product extraction. If not specified, the "2024-09-16" version is used.

Available product models:

"2024-02-01"
"2024-09-16"

See Model pinning.

productList

boolean

Default: false

Set to true to get product list data in the productList response field.

The target page should contain a list or a grid of products.

Product list data is especially useful to get basic information about products on a website using a smaller number of requests, when product attributes are extracted directly from a product list page, without making individual product requests.

To implement product crawling from product list pages, use productNavigation, which also enables navigation through pagination links.

To combine this field with HTTP requests, set productListOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

productNavigation

boolean

Default: false

Set to true to get product navigation data in the productNavigation response field.

The target page should contain multiple products and/or subcategories that can be followed.

Product navigation data is especially useful for implementing product crawling, i.e. following links to product pages, as well as to subcategories and pagination that can in turn link to more product pages.

Product navigation data can also be used to get basic information of products and subcategories on a website, obtaining the URLs and link names of the products and subcategories, without making individual requests for those products.

To combine this field with HTTP requests, set productNavigationOptions.extractFrom to "httpResponseBody".

If you use actions, data extraction happens after action execution has finished or timed out.

object (ExtractionOptions)

Options for automatic extraction.

extractFrom

string

Enum: "httpResponseBody" "browserHtml"

Extraction source.

httpResponseBody extracts from httpResponseBody. It is usually faster and cheaper.

browserHtml extracts from both browserHtml and visual features of the rendered web page. It typically improves quality over httpResponseBody on JavaScript-heavy webpages.

object or null

Schema of the custom attributes to extract. This is a subset of the OpenAPI specification, using JSON syntax.

Zyte custom attributes extraction uses a Large Language Model (LLM) operated by Zyte to obtain any structured data specified by this schema from any unstructured web page. This lets you perform extraction similar to standard schemas, such as article or product, but much more flexibly.

When this field is specified, the customAttributes.values field in the response contains the extracted data.

When custom attributes extraction is requested, a standard extraction field must also be specified (e.g. product). This determines which part of the web page is passed to the LLM for custom attributes extraction. For example, when the web page is a product page, only the product information is passed, ignoring other parts of the page such as the menu or footer, which makes extraction cheaper and more accurate.

See detailed documentation. Additionally, to see a request example, scroll up to the right-hand sidebar Request samples, and select “Extract Custom Attributes along with Article information” under Example.

additional property

object (CustomAttribute)

description	string <= 300 characters
type required	string boolean

object

Additional options for custom attributes extraction.

method	string Default: "generate" Enum: "generate" "extract" Method to use for custom attributes extraction:
"generate" (default) - Generates extracted data with the help of a generative Large Language Model (LLM). It is the most powerful and versatile extraction method, but also the most expensive one, with variable per-request cost.
"extract" - Locates extracted data in the requested web page with the help of a non-generative LLM. It only supports a subset of the schema (only string, integer and number types), and can't perform generative tasks such as summarization or data transformation. It is however much cheaper compared to the generative method and has a fixed per-request cost.
maxInputTokens	integer >= 1 Limit on the number of input tokens for custom attribute extraction with the "generate" method. This includes the schema as well, but not our internal fixed prompt with the LLM instruction. When the number of tokens for schema and page text is above the specified maxInputTokens, we truncate the page text to fit in maxInputTokens. This may result in quality degradation or data not extracted from the page because it was truncated. Tokens are words or word pieces, for example `{"price": "2.00 $"}` is 9 tokens: `{"`, `price`, `":`, `"`, `2`, `.`, `00`, `$`, `"}`.
maxOutputTokens	integer >= 1 Limit on the number of output tokens for extracted custom attributes with the "generate" method. This field can be set to limit the extraction cost, but may result in quality degradation. See an example of token counting in the maxInputTokens field above.
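
The sketch below shows one way a custom attributes request body might be assembled and checked against the limits listed above. It rests on assumptions: the request field names customAttributes and customAttributesOptions are not spelled out in this section and are used here as assumed names, and custom_attributes_request is a hypothetical helper.

```python
from typing import Optional

# Types the "extract" method supports, per the method description above.
EXTRACT_METHOD_TYPES = {"string", "integer", "number"}


def custom_attributes_request(
    url: str,
    data_type: str,
    schema: dict,
    method: str = "generate",
    max_input_tokens: Optional[int] = None,
) -> dict:
    """Build a request body for custom attributes extraction (hypothetical helper)."""
    if method not in ("generate", "extract"):
        raise ValueError(f"unknown method: {method!r}")
    for name, attribute in schema.items():
        if "type" not in attribute:
            raise ValueError(f"{name}: 'type' is required")
        if len(attribute.get("description", "")) > 300:
            raise ValueError(f"{name}: description is longer than 300 characters")
        if method == "extract" and attribute["type"] not in EXTRACT_METHOD_TYPES:
            raise ValueError(f"{name}: type {attribute['type']!r} needs the generate method")
    options = {"method": method}
    if max_input_tokens is not None:
        options["maxInputTokens"] = max_input_tokens
    return {
        "url": url,
        data_type: True,  # a standard extraction field is required alongside
        "customAttributes": schema,  # assumed request field name
        "customAttributesOptions": options,  # assumed request field name
    }


request = custom_attributes_request(
    "https://example.com/p",
    "product",
    {"material": {"type": "string", "description": "Main material"}},
    method="extract",
)
```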

geolocation

string (CountryCode)

Enum: "AW" "AF" "AO" "AI" "AX" "AL" "AD" "AE" "AR" "AM" "AS" "AQ" "TF" "AG" "AU" "AT" "AZ" "BI" "BE" "BJ" "BQ" "BF" "BD" "BG" "BH" "BS" "BA" "BL" "BY" "BZ" "BM" "BO" "BR" "BB" "BN" "BT" "BV" "BW" "CF" "CA" "CC" "CH" "CL" "CN" "CI" "CM" "CD" "CG" "CK" "CO" "KM" "CV" "CR" "CU" "CW" "CX" "KY" "CY" "CZ" "DE" "DJ" "DM" "DK" "DO" "DZ" "EC" "EG" "ER" "EH" "ES" "EE" "ET" "FI" "FJ" "FK" "FR" "FO" "FM" "GA" "GB" "GE" "GG" "GH" "GI" "GN" "GP" "GM" "GW" "GQ" "GR" "GD" "GL" "GT" "GF" "GU" "GY" "HK" "HM" "HN" "HR" "HT" "HU" "ID" "IM" "IN" "IO" "IE" "IR" "IQ" "IS" "IL" "IT" "JM" "JE" "JO" "JP" "KZ" "KE" "KG" "KH" "KI" "KN" "KR" "KW" "LA" "LB" "LR" "LY" "LC" "LI" "LK" "LS" "LT" "LU" "LV" "MO" "MF" "MA" "MC" "MD" "MG" "MV" "MX" "MH" "MK" "ML" "MT" "MM" "ME" "MN" "MP" "MZ" "MR" "MS" "MQ" "MU" "MW" "MY" "YT" "NA" "NC" "NE" "NF" "NG" "NI" "NU" "NL" "NO" "NP" "NR" "NZ" "OM" "PK" "PA" "PN" "PE" "PH" "PW" "PG" "PL" "PR" "KP" "PT" "PY" "PS" "PF" "QA" "RE" "RO" "RU" "RW" "SA" "SD" "SN" "SG" "GS" "SH" "SJ" "SB" "SL" "SV" "SM" "SO" "PM" "RS" "SS" "ST" "SR" "SK" "SI" "SE" "SZ" "SX" "SC" "SY" "TC" "TD" "TG" "TH" "TJ" "TK" "TM" "TL" "TO" "TT" "TN" "TR" "TV" "TW" "TZ" "UG" "UA" "UM" "UY" "US" "UZ" "VA" "VC" "VE" "VG" "VI" "VN" "VU" "WF" "WS" "YE" "ZA" "ZM" "ZW"

ISO 3166-1 alpha-2 code of a country from which the request should be sent, i.e. the request geolocation.

If not specified, Zyte API will use a geolocation that, for the target website, does not cause bans or unexpected locale changes in the response data, such as the wrong language, currency, date format, time zone, etc.

If you believe Zyte API is using the wrong default geolocation for a website, please reach out to our expert anti-ban team.

For some websites, however, you might want to set a custom geolocation. For example, you may be interested in visiting the same URL from different locations.

Zyte API provides 2 sets of geolocations. Standard geolocations are AU, BE, BR, CA, CN, DE, ES, FR, GB, IN, IT, JP, KR, MX, NL, PL, RU, TR, US, and ZA. All other geolocations are extended geolocations.

See an example.
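
To tell which set a country code falls into, the list of standard geolocations above can be kept in a constant. This is a small sketch with a hypothetical helper name; it does not check that the code is one of the supported enum values.

```python
# Standard geolocations as listed in this reference.
STANDARD_GEOLOCATIONS = frozenset({
    "AU", "BE", "BR", "CA", "CN", "DE", "ES", "FR", "GB", "IN",
    "IT", "JP", "KR", "MX", "NL", "PL", "RU", "TR", "US", "ZA",
})


def geolocation_set(country_code: str) -> str:
    """Return "standard" or "extended" for an ISO 3166-1 alpha-2 code."""
    return "standard" if country_code in STANDARD_GEOLOCATIONS else "extended"


request = {"url": "https://example.com", "httpResponseBody": True, "geolocation": "DE"}
print(geolocation_set(request["geolocation"]))  # prints "standard"
```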

javascript

boolean

Forces JavaScript execution on a browser request to be enabled (true) or disabled (false).

By default Zyte API enables or disables JavaScript execution for a request depending on which option makes it easier to avoid bans. Use this request field to override that choice.

Passing this request field when requesting automatic extraction (product, article, etc.) may impact the quality of the returned data, as it might override the optimal value for automatic extraction.

This field is not compatible with HTTP requests.

See an example.

Array of click (object) or doubleClick (object) or evaluate (object) or goto (object) or hide (object) or hover (object) or interaction (object) or keyPress (object) or reload (object) or scrollBottom (object) or scrollTo (object) or searchKeyword (object) or select (object) or setLocation (object) or type (object) or waitForNavigation (object) or waitForRequest (object) or waitForResponse (object) or waitForSelector (object) or waitForTimeout (object) (ActionSequence) [ items ]

Sequence of browser actions to execute.

Select an action below to see its API reference.

When using actions, you get the actions response field with debug information about action execution.

See an example.

Array

One of

action

required

any

Value: "click"

Click on an element.

required

object (ActionSelector)

A CSS or XPath selector to search for an element.

type required	string Enum: "css" "xpath" The type of selector - CSS or XPath
value required	string [ 1 .. 500 ] characters
state	string Default: "visible" Enum: "attached" "visible" "hidden" State can be any of the following values and defaults to 'visible':
'visible' - The element has a non-empty bounding box and no visibility:hidden. Note that an element without content or with display:none has an empty bounding box, and is not considered visible.
'hidden' - The element is either detached from the DOM, or has an empty bounding box or visibility:hidden. This is the opposite of the 'visible' option.
'attached' - The element is present in the DOM; it can be visible or hidden.

button

string

Default: "left"

Enum: "left" "right" "middle"

Mouse button to click

delay

number [ 0 .. 3 ]

Default: 0

Time to wait between mousedown and mouseup, in seconds.

waitForNavigationTimeout

number [ 0 .. 20 ]

Default: 0

Maximum waiting time in seconds for the navigation event during the click action.

If navigation happens within the defined duration, then waiting is halted and the next action is executed after the new page is loaded. If the page loading does not finish, then the next action ends with an error, and following actions may not be executed, depending on the onError property. If no navigation happens within the defined duration, then the next action is executed.

onError

string (onError)

Default: "return"

Enum: "continue" "return"

Handle errors encountered while executing a particular action.

continue - When a particular action fails, the action sequence continues, executing the next actions
return - When a particular action fails, the action sequence stops, not executing any more actions

When an action sequence finishes prematurely the service will return the entire response body up until the point of execution.
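
A click action built from the properties above might look like the following sketch. Two names are assumptions, because this section lists them without a key: the selector object is placed under "selector", and the action sequence under the "actions" request field. click_action is a hypothetical helper.

```python
def click_action(
    css: str, *, wait_for_navigation: float = 0, on_error: str = "return"
) -> dict:
    """Build a click action that targets an element by CSS selector."""
    if not 1 <= len(css) <= 500:
        raise ValueError("selector value must be 1 to 500 characters long")
    if not 0 <= wait_for_navigation <= 20:
        raise ValueError("waitForNavigationTimeout must be between 0 and 20 seconds")
    if on_error not in ("continue", "return"):
        raise ValueError(f"unknown onError value: {on_error!r}")
    return {
        "action": "click",
        "selector": {"type": "css", "value": css},  # key name is an assumption
        "waitForNavigationTimeout": wait_for_navigation,
        "onError": on_error,
    }


request = {
    "url": "https://example.com",
    "browserHtml": True,
    # "actions" as the request field name is an assumption.
    "actions": [
        click_action("button#load-more", wait_for_navigation=5, on_error="continue")
    ],
}
```

With onError set to "continue", a failed click does not stop any later actions in the sequence.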

jobId

string <= 100 characters

ID of the Scrapy Cloud job from which this request has been sent, to be returned in the jobId response field.

This field is meant to help with request tracking.

scrapy-zyte-api fills this request field automatically.

See an example. See also: echoData.

echoData

any

This field is returned in the echoData response field, verbatim.

This field can be useful, for example, to keep track of the original request order when sending multiple requests in parallel.

The request can be rejected if the data is too big.

See an example. See also: jobId.
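
The request-ordering use case mentioned above can be sketched without any network code. Both helper names are hypothetical, and the responses in the usage example are stand-ins for what parallel requests might return in arbitrary order.

```python
def tag_with_order(urls: list[str]) -> list[dict]:
    """Build one request per URL, storing its position in echoData."""
    return [
        {"url": url, "httpResponseBody": True, "echoData": index}
        for index, url in enumerate(urls)
    ]


def restore_order(responses: list[dict]) -> list[dict]:
    """Sort responses back into the order of the original requests."""
    return sorted(responses, key=lambda response: response["echoData"])


built = tag_with_order(["https://a.example", "https://b.example"])
# Stand-in responses, arriving in reverse order.
responses = [
    {"url": "https://b.example", "echoData": 1},
    {"url": "https://a.example", "echoData": 0},
]
ordered = restore_order(responses)
```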

object (Viewport)

Browser viewport.

width	integer [ 320 .. 5120 ] Default: 1920 Viewport width, in pixels.
height	integer [ 360 .. 4096 ] Default: 1080 Viewport height, in pixels.
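
The viewport limits above, together with the screenshot options described earlier, can be checked before building a request. This is a sketch with a hypothetical helper name; the full-page check reflects the statement in this reference that full page screenshots are only available in JPEG format.

```python
def screenshot_request(
    url: str,
    width: int = 1920,
    height: int = 1080,
    full_page: bool = False,
    image_format: str = "jpeg",
) -> dict:
    """Build a screenshot request body with a custom viewport."""
    if not 320 <= width <= 5120:
        raise ValueError("viewport width must be between 320 and 5120 pixels")
    if not 360 <= height <= 4096:
        raise ValueError("viewport height must be between 360 and 4096 pixels")
    if image_format not in ("jpeg", "png"):
        raise ValueError(f"unknown format: {image_format!r}")
    if full_page and image_format != "jpeg":
        raise ValueError("full page screenshots are only available in JPEG format")
    return {
        "url": url,
        "screenshot": True,
        "screenshotOptions": {"format": image_format, "fullPage": full_page},
        "viewport": {"width": width, "height": height},
    }


request = screenshot_request("https://example.com", width=1280, height=720)
```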

followRedirect

boolean

Whether to follow HTTP redirection or not.

Only supported in HTTP requests; browser requests always follow redirection.

Array of objects (SessionContext) [ items <= 10 items ]

User-defined name-value pairs to request a server-managed session initialized with sessionContextParameters.

For every subsequent request with the same session context, Zyte API will either reuse an available session created for the same session context or create a new session using sessionContextParameters.

Server-managed sessions expire after 4 hours or 3 ban responses. If you are targeting websites that silently expire their sessions before the 4-hour mark, i.e. they revert the effects of your sessionContextParameters but requests continue working as expected otherwise, consider using client-managed sessions for higher session control.

See an example. See also: requestCookies, responseCookies.

Array

name required	string [ 1 .. 30 ] characters Name of the context identifier.
value required	string [ 1 .. 100 ] characters Value of the context identifier.

object (SessionContextParameters)

Parameters to create a server-managed session for a given sessionContext.

See an example. See also: actions.

Actions to run to initialize a server-managed session for a given sessionContext.

Array

One of

action

required

any

Value: "click"

Click on an element.

required

object (ActionSelector)

A CSS or XPath selector to search for an element.

type required	string Enum: "css" "xpath" The type of selector - CSS or XPath
value required	string [ 1 .. 500 ] characters
state	string Default: "visible" Enum: "attached" "visible" "hidden" State can be any of the following values and defaults to 'visible':
'visible' - The element has a non-empty bounding box and no visibility:hidden. Note that an element without content or with display:none has an empty bounding box, and is not considered visible.
'hidden' - The element is either detached from the DOM, or has an empty bounding box or visibility:hidden. This is the opposite of the 'visible' option.
'attached' - The element is present in the DOM; it can be visible or hidden.

button

string

Default: "left"

Enum: "left" "right" "middle"

Mouse button to click

delay

number [ 0 .. 3 ]

Default: 0

Time to wait between mousedown and mouseup, in seconds.

waitForNavigationTimeout

number [ 0 .. 20 ]

Default: 0

Maximum waiting time in seconds for the navigation event during the click action.

onError

string (onError)

Default: "return"

Enum: "continue" "return"

Handle errors encountered while executing a particular action.

continue - When a particular action fails, the action sequence continues, executing the next actions
return - When a particular action fails, the action sequence stops, not executing any more actions

When an action sequence finishes prematurely the service will return the entire response body up until the point of execution.

object (Session)

Parameters to create or reuse a client-managed session.

If id does not match one of your running sessions, a new session is created with that session ID. Otherwise, the matching running session is reused.

Client-managed sessions may expire due to any of the following:

15 minutes (900 seconds) have passed since the session was created.
2 minutes (120 seconds) have passed since the session was last used.
Requests using this session got banned 3 times in a row.

For 5-10 minutes after a session expires, Zyte API keeps track of the expired session and does not allow re-using it. After that time, attempts to reuse the session will instead create a new session.

See an example.

id	string User-defined session ID. It must be a version 4 UUID, i.e. a randomly-generated UUID.
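
Since the session ID must be a version 4 UUID, the standard library can generate it. In this sketch, "session" as the request field name is an assumption (this section lists the object without its key), and new_session_request is a hypothetical helper.

```python
import uuid


def new_session_request(url: str) -> dict:
    """Build a browser request that starts a client-managed session."""
    return {
        "url": url,
        "browserHtml": True,
        # uuid.uuid4() returns a randomly generated (version 4) UUID.
        "session": {"id": str(uuid.uuid4())},  # field name is an assumption
    }


request = new_session_request("https://example.com")
```

Sending the same id in later requests would reuse the session for as long as it has not expired.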

Array of objects (NetworkCaptureFilterSequence) <= 10 items [ items ]

Filters to capture browser network responses.

HTTP responses received during browser rendering (including action execution) will be returned in the networkCapture response field if they match any of the filters defined here.

You can capture up to 10 responses, provided the sum of their bodies does not exceed 5 MiB. If they do exceed that limit, only the first captured responses within the limit are returned.

See an example.

Array (<= 10 items)

filterType required	string url
httpResponseBody	boolean Default: false Set to `true` to get the body of the captured response in the networkCapture[].httpResponseBody response field.
value required	string [ 3 .. 8192 ] characters A string to compare with the URL of network responses according to `matchType`.
matchType required	string (PatternMatchingOptions) Default: "contains" Enum: "startsWith" "endsWith" "contains" "exact" How to compare a user-defined string with a target string:
`contains` matches if the user-defined string is a substring of the target string.
`exact` matches if the user-defined string is an exact match of the target string.
`startsWith` matches if the target string starts with the user-defined string.
`endsWith` matches if the target string ends with the user-defined string.
Comparisons are case-sensitive. Regular expressions or wildcard characters are not supported.
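
The four matching modes can be mirrored locally to predict which response URLs a filter would capture. This is an illustrative sketch of the semantics described above, not the server-side implementation; url_matches is a hypothetical helper.

```python
def url_matches(pattern: str, url: str, match_type: str = "contains") -> bool:
    """Compare a user-defined string with a response URL, case-sensitively."""
    if match_type == "contains":
        return pattern in url
    if match_type == "exact":
        return pattern == url
    if match_type == "startsWith":
        return url.startswith(pattern)
    if match_type == "endsWith":
        return url.endswith(pattern)
    raise ValueError(f"unknown matchType: {match_type!r}")


print(url_matches("/api/", "https://example.com/api/items"))  # prints True
print(url_matches("/API/", "https://example.com/api/items"))  # prints False
```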

device

string

Enum: "desktop" "mobile"

Type of device to emulate during your request.

A desktop device is emulated by default.

Can only be used in combination with httpResponseBody.

cookieManagement

any

Default: "auto"

Enum: "auto" "discard"

Cookie management method

It determines how to handle user cookies, defined through requestCookies, and automatic cookies, i.e. cookies automatically generated by Zyte API.

auto (default) uses user cookies if defined, or automatic cookies otherwise.

discard uses user cookies if defined, or no cookies otherwise.

Array of objects (Cookie) <= 100 items [ items ]

A list of cookies to be sent with a request.

You can use the contents of the responseCookies response field as a value for this request field.

See an example.

Array (<= 100 items)

name required	string <= 4085 characters Cookie name
value required	string <= 4085 characters Cookie value
domain required	string <= 253 characters Domain the cookie belongs to
path	string Path the cookie belongs to
expires	integer <int64> Unix time in seconds.
httpOnly	boolean
secure	boolean
sameSite	string Enum: "Strict" "Lax" "Extended" "None"
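Because responseCookies uses the same Cookie schema, cookies from one response can be passed back on a follow-up request. The sketch below assumes a previously parsed JSON response; the helper name `follow_up_request` is hypothetical.

```python
MAX_REQUEST_COOKIES = 100  # documented limit for the requestCookies array


def follow_up_request(url: str, previous_response: dict) -> dict:
    """Build a request body that reuses the cookies of an earlier response."""
    cookies = previous_response.get("responseCookies", [])
    if len(cookies) > MAX_REQUEST_COOKIES:
        raise ValueError("requestCookies accepts at most 100 cookies")
    return {
        "url": url,
        "httpResponseBody": True,
        "requestCookies": cookies,
        "responseCookies": True,  # ask for the updated cookies again
    }


previous = {
    "responseCookies": [
        {"name": "sid", "value": "abc123", "domain": "example.com"},
    ]
}
request_body = follow_up_request("https://example.com/account", previous)
```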

responseCookies

boolean

Default: false

Set to true to get the list of cookies set during a request in the responseCookies response field.

See an example. See also: requestCookies.

serp

boolean

Set to true to get the data of a search engine results page (SERP) in the serp response field.

The target URL should be a search URL that belongs to a Google domain.

Currently, you cannot combine this field with any other request fields besides serpOptions and url.

object (SerpOptions)

Options for SERP extraction.

extractFrom

string

Enum: "browserHtml" "httpResponseBody"

Input to use for extraction, either httpResponseBody or browserHtml.

If not specified, httpResponseBody is currently used by default. In the future, the default value may depend on the target website.

includeIframes

boolean

Default: false

Whether to add the content of iframes into browserHtml.

Note that iframes are visible in screenshots even if this is set to false.

Responses

Response Schema: application/json

url

required

string

URL the data was extracted from.

Could be different from the input URL in case of redirection.

See also: url.

httpResponseBody

string <byte>

Base64-encoded HTTP response body.

To get this response field, set the httpResponseBody request field to true.

Unlike browserHtml, this field supports binary response bodies, such as image files or PDF files. This is why the field is Base64-encoded: JSON does not support binary data.

See an example.
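Decoding the field back into bytes takes a single standard-library call. The sketch below works on a hand-built sample dictionary rather than a real API response.

```python
import base64


def decode_body(response: dict) -> bytes:
    """Return the raw bytes of the httpResponseBody response field."""
    return base64.b64decode(response["httpResponseBody"])


# Hand-built sample standing in for a parsed JSON response.
sample = {"httpResponseBody": base64.b64encode(b"%PDF-1.7").decode("ascii")}
raw = decode_body(sample)
```

The result is bytes, so text content still needs to be decoded with the appropriate character encoding.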

Array of objects (HTTPHeader) [ items ]

HTTP response headers.

To get this response field, set the httpResponseHeaders request field to true.

The Content-Encoding header value (e.g. gzip, br, etc.) should not be used to decompress httpResponseBody, because Zyte API already decompresses the body of compressed responses.

The Set-Cookie header value, when present, contains the header value received from the main HTTP response. These cookies could have changed later on, e.g. during browser rendering. Usually you will want to ignore this header in favor of responseCookies, which provides the final cookies.

See an example.

Array

name required	string non-empty The name of the header
value required	string The value of the header
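Since headers arrive as a list of name/value objects, a lookup table is often more convenient. The sketch below groups values by lowercased name, on the assumption that header names should be compared case-insensitively and may repeat (e.g. Set-Cookie); the helper name is hypothetical.

```python
from collections import defaultdict


def headers_by_name(http_response_headers: list) -> dict:
    """Group header values by lowercased header name, preserving order."""
    grouped = defaultdict(list)
    for header in http_response_headers:
        grouped[header["name"].lower()].append(header["value"])
    return dict(grouped)


sample_headers = [
    {"name": "Content-Type", "value": "text/html"},
    {"name": "Set-Cookie", "value": "a=1"},
    {"name": "set-cookie", "value": "b=2"},
]
lookup = headers_by_name(sample_headers)
```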

browserHtml

string

Browser HTML.

To get this response field, set the browserHtml request field to true.

Browser HTML does not include the contents of iframes or the shadow DOM.

See an example.

object (Session)

Parameters to create or reuse a client-managed session.

If id does not match one of your running sessions, a new session is created with that session ID. Otherwise, the matching running session is reused.

Client-managed sessions may expire due to any of the following:

15 minutes (900 seconds) have passed since the session was created.
2 minutes (120 seconds) have passed since the session was last used.
Requests using this session got banned 3 times in a row.

For 5-10 minutes after a session expires, Zyte API keeps track of the expired session and does not allow re-using it. After that time, attempts to reuse the session will instead create a new session.

See an example.

id	string User-defined session ID. It must be a version 4 UUID, i.e. a randomly-generated UUID.

screenshot

string <byte>

Base64-encoded page screenshot file data.

To get this response field, set the screenshot request field to true.

screenshotOptions.format determines the file format of the screenshot data.

See an example.

object

Article data.

To get this response field, set the article request field to true.

headline

string

Article headline or title.

articleBody

string

Clean text of the article, including sub-headings, with newline separators.

articleBodyHtml

string

Simplified and standardized HTML of the article body, including sub-headings, image captions and embedded content (videos, tweets, etc.).

description

string

A short summary of the article. It can be either human-provided (if available), or auto-generated.

datePublished

string

Publication date. ISO-formatted with 'T' separator, may contain a timezone. If the actual publication date is not found, "dateModified" value is taken.

datePublishedRaw

string

Same date as "datePublished", but before parsing/normalization, i.e. as it appears on the website.

dateModified

string

The date when the article was most recently modified. ISO-formatted with 'T' separator, may contain a timezone.

dateModifiedRaw

string

Same date as "dateModified", but before parsing/normalization, i.e. as it appears on the website.

Array of objects (Author) [ items ]

Authors of the article.

Array

name required	string Full name of the author, e.g. "Alice".
nameRaw	string Text from which this author name was extracted, e.g. "Alice and Bob".

inLanguage

string

Language of the article, as an ISO 639-1 language code. Example: "en". Sometimes article language is not the same as the web page overall language; to get the detected web page languages, see "webPageInfo".

Array of objects (Breadcrumb) [ items ]

A list of breadcrumbs (a specific navigation element) with optional name and url.

Array

name	string Text of the breadcrumb, as it appears on the website.
url	string Absolute URL of the breadcrumb.

object (Image)

Image.

url

required

string

URL of an image.

Array of objects (Image) [ items ]

All images of the item (may include the main image).

Array

url

required

string

URL of an image.

Array of objects[ items ]

A list of all videos inside the article body.

Array

url

required

string

Absolute URL of the video.

Array of objects[ items ]

A list of all audios inside the article body.

Array

url

required

string

Absolute URL of the audio.

url

required

string

URL of a page where this article was extracted.

canonicalUrl

string

Canonical URL of the article, if available.

required

object (schemas-Metadata)

Extracted item metadata for single-item data types.

probability

required

number [ 0 .. 1 ]

Probability that the extracted item is of the requested data type. It is closer to 0 when the page does not contain the requested data type. For example, when single product extraction is requested with "product: true" but the page does not contain a product, the probability is close to 0. If an item of the requested type can be extracted from the page, the probability is closer to 1. The recommended probability threshold is 0.5, but extracted data is returned even if the probability is very low.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"
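The two metadata fields can be checked on the client side as sketched below. The `metadata` key name is an assumption (the schema above does not show the key under which this object is returned), and the helper names are hypothetical.

```python
from datetime import datetime, timezone


def is_confident(item: dict, threshold: float = 0.5) -> bool:
    """Apply the recommended probability threshold to a single extracted item."""
    return item["metadata"]["probability"] >= threshold  # "metadata" key is assumed


def parse_date_downloaded(value: str) -> datetime:
    """Parse a "YYYY-MM-DDThh:mm:ssZ" timestamp into an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


article = {
    "headline": "Example",
    "metadata": {"probability": 0.93, "dateDownloaded": "2024-01-02T03:04:05Z"},
}
downloaded_at = parse_date_downloaded(article["metadata"]["dateDownloaded"])
```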

object

Article list data.

To get this response field, set the articleList request field to true.

Array of objects[ items ]

List of articles available on this page.

Array

url

string

URL of a detailed article page. Pass this URL with "article: true" in the request to extract detailed information about the article.

headline

string

Article headline or title.

articleBody

string

Text of the article as it appears on the list page, including sub-headings, with newline separators.

datePublished

string

Publication date. ISO-formatted with 'T' separator, may contain a timezone.

datePublishedRaw

string

Same date as "datePublished", but before parsing/normalization, i.e. as it appears on the website.

Array of objects (Author) [ items ]

Authors of the article.

Array

name required	string Full name of the author, e.g. "Alice".
nameRaw	string Text from which this author name was extracted, e.g. "Alice and Bob".

inLanguage

string

object (Image)

Image.

url

required

string

URL of an image.

Array of objects (Image) [ items ]

All images of the item (may include the main image).

Array

url

required

string

URL of an image.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

Probability that extracted item in a list is a valid item. Items which are unlikely to be valid are not returned, so normally no extra thresholding is needed for list items. This probability is not calibrated.

url

required

string

URL of a page where this article list was extracted.

required

object (MetadataList)

Top-level metadata for list data types.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

object

Article navigation data.

To get this response field, set the articleNavigation request field to true.

object (PaginationNext)

A link to the next page in the list.

url required	string URL of the next page in the list.
name	string Text of the link to the next page, if available.

pageNumber

integer (PageNumber)

Integer describing the current page number. Starts at 1.

Array of objects[ items ]

List of articles available on this page.

Array

url

required

string

URL of a detailed article page. Pass this URL with "article: true" in the request to extract detailed information about the article.

name

string

The name of the article or article link text.

datePublished

string

Publication date. ISO-formatted with 'T' separator, may contain a timezone.

datePublishedRaw

string

Same date as "datePublished", but before parsing/normalization, i.e. as it appears on the website.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

url

required

string

URL of a page containing the list of articles.

required

object (MetadataList)

Top-level metadata for list data types.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

object

Forum thread data.

To get this response field, set the forumThread request field to true.

object

Topic that is discussed on the page.

name

required

string

Name of the topic.

Array of objects[ items ]

List of posts available on this page, including the first or top post.

Array

text

string

Text of the post.

datePublished

string

Publication date. ISO-formatted with 'T' separator, may contain a timezone.

datePublishedRaw

string

Same date as "datePublished", but before parsing/normalization, i.e. as it appears on the website.

object

Details of reactions to this post.

likes	integer >= 0 Number of up-votes or likes/stars received by the post.
replies	integer >= 0 Number of replies received by the post.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

url

required

string

URL of a page where this forum post list was extracted.

required

object (MetadataList)

Top-level metadata for list data types.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

object

Job posting data.

To get this response field, set the jobPosting request field to true.

jobTitle

string

The title of the job.

datePublished

string

Publication date of the job posting. ISO-formatted with 'T' separator, may contain a timezone.

datePublishedRaw

string

Same date as 'datePublished', but before parsing/normalization, i.e. as it appears on the website.

validThrough

string

The date after which the job posting is not valid, e.g. the end of an offer. ISO-formatted with 'T' separator, may contain a timezone.

description

string

A description of the job posting including sub-headings, with newline separators.

descriptionHtml

string

Simplified HTML of the description, including sub-headings, image captions and embedded content.

employmentType

string

Type of employment (e.g. full-time, part-time, contract, temporary, seasonal, internship).

object

Information about the organization offering the job position.

name

required

string

Name of the organization.

object

The base salary of the job or of an employee in the proposed role.

raw	string Salary amount as it appears on the website.
valueMax	string The maximum value of the base salary as a number string. In case of only one value given for the salary instead of a range, valueMax is used to represent it.
currency	string Currency associated with the salary amount. ISO 4217 standard.
currencyRaw	string Currency associated with the salary amount, without normalization.

object

A (typically single) geographic location associated with the job position.

raw

required

string

Job location as it appears on the website.

url

required

string

URL of a page where this job posting was extracted.

required

object (schemas-Metadata)

Extracted item metadata for single-item data types.

probability

required

number [ 0 .. 1 ]

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

object

Job posting navigation data.

To get this response field, set the jobPostingNavigation request field to true.

object (PaginationNext)

A link to the next page in the list.

url required	string URL of the next page in the list.
name	string Text of the link to the next page, if available.

pageNumber

integer (PageNumber)

Integer describing the current page number. Starts at 1.

Array of objects[ items ]

List of job postings available on this page.

Array

url

required

string

URL of a detailed job posting page. Pass this URL with "jobPosting: true" in the request to extract detailed information about the job posting.

name

string

The name of the job posting or job posting link text.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

url

required

string

URL of a page.

required

object (MetadataList)

Top-level metadata for list data types.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

object

Product data.

To get this response field, set the product request field to true.

name

string (Name)

The name of the product.

price

string (Price) ^[0-9]+(\.[0-9]+)?$

The price at which the product is being offered. If there is only one price associated with the offer, it is returned in this field.

currency

string (Currency) ^[A-Z]{3}$

The ISO 4217 code of the currency in which the price is expressed.

currencyRaw

string (CurrencyRaw)

The currency as given on the website, without extra normalization (for example, both "$" and "USD" are possible currencies).

regularPrice

string (RegularPrice) ^[0-9]+(\.[0-9]+)?$

The price before any discount or special offer.
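Because price and regularPrice are number strings, decimal arithmetic avoids the rounding surprises of floats. The sketch below validates both strings against the pattern shown above and computes a discount percentage; the helper name `discount_percent` is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# Same pattern as the one documented for price and regularPrice.
PRICE_RE = re.compile(r"^[0-9]+(\.[0-9]+)?$")


def discount_percent(product: dict) -> Optional[Decimal]:
    """Return the discount as a percentage, or None if it cannot be computed."""
    price = product.get("price")
    regular = product.get("regularPrice")
    if not price or not regular:
        return None
    if not PRICE_RE.match(price) or not PRICE_RE.match(regular):
        return None
    regular_value = Decimal(regular)
    if regular_value == 0:
        return None
    return (regular_value - Decimal(price)) / regular_value * 100


discount = discount_percent({"price": "75.00", "regularPrice": "100.00"})
```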

availability

string (Availability)

Enum: "InStock" "OutOfStock"

Availability, as a string. Allowed values:

"InStock" - includes limited availability, presale, preorder, and in-store only.
"OutOfStock" - includes discontinued and sold out.

sku

string (Sku)

The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for the product, assigned by the seller.

mpn

string (Mpn)

The Manufacturer Part Number (MPN) of the product. It is issued by the manufacturer, and is the same across different e-commerce websites.

Array of objects (Gtin) [ items ]

Standardized GTIN product identifier which is unique for a product across different sellers.

Array

type required	string Enum: "gtin8" "gtin13" "gtin14" "isbn10" "isbn13" "ismn" "issn" "upc" `gtin14` corresponds to former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, UPC Shipping Container Code. `gtin13` also includes the JAN (Japanese Article Number).
value required	string The GTIN value as a string.

object

Brand or manufacturer of the product.

name

required

string

Name of the brand.

Array of objects (Breadcrumb) [ items ]

A list of breadcrumbs (a specific navigation element) with optional name and url.

Array

name	string Text of the breadcrumb, as it appears on the website.
url	string Absolute URL of the breadcrumb.

object (Image)

Image.

url

required

string

URL of an image.

Array of objects (Image) [ items ]

All images of the item (may include the main image).

Array

url

required

string

URL of an image.

description

string

Description of the product.

descriptionHtml

string

Simplified HTML of the description, including sub-headings, image captions and embedded content.

object

The overall rating, based on a collection of reviews or ratings.

ratingValue	number The average rating value.
bestRating	number The highest value allowed in this rating system.
reviewCount	integer >= 0 The total number of reviews or ratings for the product.

color

string (Color)

Color of the product.

size

string (Size)

A standardized size of a product, specified through a simple textual string (for example "XL", "32Wx34L"). A single product dimension (height, width) is not considered as the size.

object (Weight)

value	number A weight value expressed as a floating point number.
unit	string A normalized unit of weight, like kilogram / ounce / pound and others.
rawUnit	string A unit of weight without normalization - how it was extracted from the page. Normalized version of the rawUnit is in 'unit' attribute.

material

string

The materials from which the product is made. Contains all product materials on the page.

style

string (Style)

Style of the product. It can also be referred to as pattern/finish on the product page. Example values: "Polka dots", "Striped", "Nickel finish with Translucent glass", etc.

Array of objects (AdditionalProperty) [ items ]

A list of properties or characteristics.

name field contains the property name,
value field contains the property value.

Array

name required	string Property name.
value	string Property value.

features

Array of strings

A list of features of the Product.

The features of a Product are generally found on the product page arranged in a list, which is usually bulleted.

url

required

string (Url)

URL of a page where this product was extracted.

canonicalUrl

string (CanonicalUrl)

Canonical URL of the product, if available.

required

object (schemas-Metadata)

Extracted item metadata for single-item data types.

probability

required

number [ 0 .. 1 ]

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

Array of objects[ items ]

Array of product variants, using the same Product schema. Represents extra information available about the variants of a product. All variants are included in this array, including the variant shown on the page. If some field in this array is empty, it means that either the value is the same as in the top-level product, or that the extraction API did not manage to extract it.

Array

name

string (Name)

The name of the product.

price

string (Price) ^[0-9]+(\.[0-9]+)?$

The price at which the product is being offered. If there is only one price associated with the offer, it is returned in this field.

currency

string (Currency) ^[A-Z]{3}$

The ISO 4217 code of the currency in which the price is expressed.

currencyRaw

string (CurrencyRaw)

The currency as given on the website, without extra normalization (for example, both "$" and "USD" are possible currencies).

regularPrice

string (RegularPrice) ^[0-9]+(\.[0-9]+)?$

The price before any discount or special offer.

availability

string (Availability)

Enum: "InStock" "OutOfStock"

Availability, as a string. Allowed values:

"InStock" - includes limited availability, presale, preorder, and in-store only.
"OutOfStock" - includes discontinued and sold out.

sku

string (Sku)

The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for the product, assigned by the seller.

mpn

string (Mpn)

The Manufacturer Part Number (MPN) of the product. It is issued by the manufacturer, and is the same across different e-commerce websites.

Array of objects (Gtin) [ items ]

Standardized GTIN product identifier which is unique for a product across different sellers.

Array

type required	string Enum: "gtin8" "gtin13" "gtin14" "isbn10" "isbn13" "ismn" "issn" "upc" `gtin14` corresponds to former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, UPC Shipping Container Code. `gtin13` also includes the JAN (Japanese Article Number).
value required	string The GTIN value as a string.

object (Image)

Image.

url

required

string

URL of an image.

Array of objects (Image) [ items ]

All images of the item (may include the main image).

Array

url

required

string

URL of an image.

color

string (Color)

Color of the product.

size

string (Size)

A standardized size of a product, specified through a simple textual string (for example "XL", "32Wx34L"). A single product dimension (height, width) is not considered as the size.

style

string (Style)

Style of the product. It can also be referred to as pattern/finish on the product page. Example values: "Polka dots", "Striped", "Nickel finish with Translucent glass", etc.

Array of objects (AdditionalProperty) [ items ]

A list of properties or characteristics.

name field contains the property name,
value field contains the property value.

Array

name required	string Property name.
value	string Property value.

url

string (Url)

URL of a page where this product was extracted.

canonicalUrl

string (CanonicalUrl)

Canonical URL of the product, if available.
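Since an empty variant field may mean that the value is the same as in the top-level product, one way to read variants is to fall back to the top-level value. The sketch below is only one possible reading: the fallback can also hide a value that failed to extract, the `variants` key name is an assumption, and the helper name is hypothetical.

```python
def resolve_variant(product: dict, variant: dict,
                    fields=("price", "currency", "color", "size")) -> dict:
    """Fill empty variant fields with the value from the top-level product."""
    # An empty variant field is ambiguous: it may equal the top-level value
    # or may simply not have been extracted. This helper assumes the former.
    return {name: variant.get(name) or product.get(name) for name in fields}


product = {
    "price": "19.99",
    "currency": "USD",
    "color": "Red",
    "variants": [  # assumed key name for the variants array
        {"color": "Blue", "size": "M"},
        {"color": "Red", "size": "L", "price": "21.99"},
    ],
}
resolved = [resolve_variant(product, variant) for variant in product["variants"]]
```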

object

Product list data.

To get this response field, set the productList request field to true.

Array of objects (Breadcrumb) [ items ]

A list of breadcrumbs (a specific navigation element) with optional name and url.

Array

name	string Text of the breadcrumb, as it appears on the website.
url	string Absolute URL of the breadcrumb.

Array of objects[ items ]

List of products available on this page.

Array

url

string

URL of a detailed product page. Pass this URL with "product: true" in the request to extract detailed information about the product.

name

string

The name of the product.

price

string (Price) ^[0-9]+(\.[0-9]+)?$

The price at which the product is being offered. If there is only one price associated with the offer, it is returned in this field.

currencyRaw

string (CurrencyRaw)

The currency as given on the website, without extra normalization (for example, both "$" and "USD" are possible currencies).

currency

string (Currency) ^[A-Z]{3}$

The ISO 4217 code of the currency in which the price is expressed.

regularPrice

string (RegularPrice) ^[0-9]+(\.[0-9]+)?$

The price before any discount or special offer.

object (Image)

Image.

url

required

string

URL of an image.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

url

required

string

URL of a page where this product list was extracted.

required

object (MetadataList)

Top-level metadata for list data types.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"

categoryName

string

Name of the category in which the listed products are.

object

Product navigation data.

To get this response field, set the productNavigation request field to true.

categoryName

string

Name of the category in which the listed products are found.

object (PaginationNext)

A link to the next page in the list.

url required	string URL of the next page in the list.
name	string Text of the link to the next page, if available.

pageNumber

integer (PageNumber)

Integer describing the current page number. Starts at 1.

Array of objects[ items ]

List of products available on this page.

Array

url

required

string

URL of a detailed product page. Pass this URL with "product: true" in the request to extract detailed information about the product.

name

string

The name of the product or product link text.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

Array of objects[ items ]

List of subcategory links found on this page.

Array

url

required

string

URL of the subcategory.

name

string

The name of the subcategory or subcategory link text.

required

object (MetadataListItem)

Item-level metadata for list data types.

probability

required

number [ 0 .. 1 ]

url

required

string

URL of a page.

required

object (MetadataList)

Top-level metadata for list data types.

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601 format: "YYYY-MM-DDThh:mm:ssZ"
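Navigation data is typically used to plan follow-up requests: one per product link, plus one for the next page. The sketch below assumes the item array and the next-page object are returned under the keys `items` and `nextPage`, which the schema above does not show; the helper name is hypothetical.

```python
def follow_up_requests(navigation: dict) -> list:
    """Turn product navigation data into request bodies for the next crawl step."""
    # One detail request per listed product ("items" key is assumed).
    requests = [
        {"url": item["url"], "product": True}
        for item in navigation.get("items", [])
    ]
    # One navigation request for the next page, if any ("nextPage" key is assumed).
    next_page = navigation.get("nextPage")
    if next_page:
        requests.append({"url": next_page["url"], "productNavigation": True})
    return requests


navigation = {
    "pageNumber": 1,
    "items": [{"url": "https://example.com/p/1", "name": "Widget"}],
    "nextPage": {"url": "https://example.com/list?page=2"},
}
planned = follow_up_requests(navigation)
```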

object

Values of extracted custom attributes, extracted according to the requested customAttributes schema.

property name*

additional property

any

object

inputTokens	integer Total number of used input tokens, excluding our internal fixed prompt with the LLM instruction, when using the "generate" method.
outputTokens	integer Total number of used output tokens, when using the "generate" method.
textInputTokens	integer Total number of input tokens used for the text of the web page, excluding the schema and our internal fixed prompt with the LLM instruction, when using the "generate" method. Already included in the customAttributes.metadata.inputTokens field.
textInputTokensBeforeTruncation	integer textInputTokens before the text was truncated to fit into the input limits, either set via customAttributesOptions.maxInputTokens or due to the model limitation returned in customAttributes.metadata.maxInputTokens, when using the "generate" method.
maxInputTokens	integer Maximum number of allowed input tokens for the model, when using the "generate" method.
excludedPIIAttributes	Array of strings A list of all attributes dropped from the output due to a risk of PII (Personally Identifiable Information) extraction.
error	string The `extraction/unparsable-response` error is given when the LLM response could not be parsed or recovered. If this error happens, we suggest simplifying the task or reducing the number of attributes. The `extraction/schema-size-exceeded` error is given when the schema did not fit into the input limits, leaving no space for the input text, and therefore the LLM could not be used. If this error happens, we suggest either making the schema smaller (fewer attributes and/or shorter descriptions), or increasing customAttributesOptions.maxInputTokens.
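The token counters above make it possible to detect when the page text was truncated before reaching the model. The sketch below operates on a hand-built metadata dictionary; the helper name `was_truncated` is hypothetical.

```python
def was_truncated(metadata: dict) -> bool:
    """Report whether the page text was cut to fit the input token limit."""
    before = metadata.get("textInputTokensBeforeTruncation")
    used = metadata.get("textInputTokens")
    if before is None or used is None:
        return False  # counters are only reported for the "generate" method
    return before > used


sample_metadata = {
    "inputTokens": 4096,
    "textInputTokens": 3800,
    "textInputTokensBeforeTruncation": 9500,
    "maxInputTokens": 4096,
}
truncated = was_truncated(sample_metadata)
```

When truncation is detected, raising customAttributesOptions.maxInputTokens (up to the model limit) is one option to consider.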

echoData

object

Arbitrary data set on the echoData request field.

See an example.

jobId

string <= 100 characters

Scrapy Cloud job ID set on the jobId request field.

See an example.

Array of objects (ActionResult) [ items ]

Debug information about the execution of the action sequence set in the actions request field.

Action order in the response always matches that of the request.

Array

action

required

string

The type of action submitted

elapsedTime

required

number

Elapsed time in seconds

status

required

string

Enum: "success" "continued" "returned" "notExecuted"

Status of execution of a particular action

success - When the action finishes execution successfully without any errors
continued - When the action fails, but the execution of the action sequence is continued
returned - When the action fails and stops execution
notExecuted - When a prior action has failed, thereby not executing the current action

error

string

Detailed information about the underlying error.
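Since action order in the response matches the request, the first failing action can be located by index. The sketch below treats `continued` and `returned` as failures, per the status descriptions above; the helper name `first_failure` is hypothetical.

```python
from typing import Optional, Tuple

FAILED_STATUSES = ("continued", "returned")


def first_failure(action_results: list) -> Optional[Tuple[int, Optional[str]]]:
    """Return (index, error) of the first failed action, or None if all passed."""
    for index, result in enumerate(action_results):
        if result["status"] in FAILED_STATUSES:
            return index, result.get("error")
    return None


results = [
    {"action": "click", "elapsedTime": 0.4, "status": "success"},
    {"action": "waitForSelector", "elapsedTime": 15.0, "status": "returned",
     "error": "timeout"},
    {"action": "scrollBottom", "elapsedTime": 0.0, "status": "notExecuted"},
]
failure = first_failure(results)
```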

Array of objects (InteractionLogEntry) [ items ]

Messages logged with console.log() from browser scripts.

Array

time	string The ISO 8601 format of the time
level	string Enum: "debug" "info" "warning" "error" "warn" The log level
message	string The log message

Array of objects (Cookie) [ items ]

List of cookies set during the request.

To get this response field, set the responseCookies request field to true.

See an example. See also: requestCookies.

Array

name required	string <= 4085 characters Cookie name
value required	string <= 4085 characters Cookie value
domain required	string <= 253 characters Domain the cookie belongs to
path	string Path the cookie belongs to
expires	integer <int64> Unix time in seconds.
httpOnly	boolean
secure	boolean
sameSite	string Enum: "Strict" "Lax" "Extended" "None"

Array of objects (CapturedResponse) [ items ]

Responses captured by filters specified in the networkCapture request parameter.

Array

object

Exit status of the network capture.

If interceptionStatus.status is error, httpResponseBody is not delivered.

Possible causes of error include all matching responses exceeding the maximum total body size of 5 MiB.

status	string Enum: "success" "error"
error	string Error message. This field is only present if `interceptionStatus.status` is `error`.

statusCode

integer

HTTP status code of the captured response.

httpResponseBody

string <byte>

Base64-encoded body of the captured response.

To get this response field, set the networkCapture[].httpResponseBody request field to true.

url

string <uri>

Captured response URL.

headers

object

Captured response headers.

object (NetworkCaptureFilter)

Filter defined in the networkCapture request field that matched the captured response.

filterType required	string url
httpResponseBody	boolean Default: false Set to `true` to get the body of the captured response in the networkCapture[].httpResponseBody response field.
value required	string [ 3 .. 8192 ] characters A string to compare with the URL of network responses according to `matchType`.
matchType required	string (PatternMatchingOptions) Default: "contains" Enum: "startsWith" "endsWith" "contains" "exact" How to compare a user-defined string with a target string: `contains` matches if the user-defined string is a substring of the target string. `exact` matches if the user-defined string is an exact match of the target string. `startsWith` matches if the target string starts with the user-defined string. `endsWith` matches if the target string ends with the user-defined string. Comparisons are case-sensitive. Regular expressions or wildcard characters are not supported.

object

Captured request that got the captured response.

url	string URL of the captured request.
headers	object Headers of the captured request.
method	string HTTP method of the captured request.
body	string Body of the captured request, if any.
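Captured bodies are Base64-encoded and are absent when interceptionStatus.status is error, so both cases need handling when reading the networkCapture response field. The sketch below uses hand-built sample data; the helper name `captured_bodies` is hypothetical.

```python
import base64


def captured_bodies(network_capture: list) -> dict:
    """Map captured response URLs to decoded bodies, skipping failed captures."""
    bodies = {}
    for captured in network_capture:
        status = captured.get("interceptionStatus", {}).get("status")
        body = captured.get("httpResponseBody")
        if status == "error" or body is None:
            continue  # no body is delivered for failed captures
        bodies[captured["url"]] = base64.b64decode(body)
    return bodies


sample_capture = [
    {
        "url": "https://example.com/api/items",
        "statusCode": 200,
        "interceptionStatus": {"status": "success"},
        "httpResponseBody": base64.b64encode(b'{"items": []}').decode("ascii"),
    },
    {
        "url": "https://example.com/api/huge",
        "interceptionStatus": {"status": "error", "error": "body too large"},
    },
]
decoded = captured_bodies(sample_capture)
```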

object (SearchResultsPage)

Search engine results page data.

To get this response field, set the serp request field to true.

Array of objects (OrganicResult) [ items ]

List of search results excluding paid results.

Array

description	string Result excerpt.
name	string Result title.
url	string (OrganicResultURL) ^https?://[\S]+$ Result URL.
rank	integer Result position among organic results in the search page. The first result of a search page is always 1, regardless of the value of serp.pageNumber.

url

string (SearchURL) ^https?://[\S]+$

Search URL.

Should match url.

pageNumber

integer >= 1

Page number.

metadata

object (Metadata)

Metadata.

displayedQuery	string Search query as seen in the webpage.
searchedQuery	string Search query as specified in the input URL.
totalOrganicResults	integer <int64> >= 0 Total number of organic results reported by the search engine.
dateDownloaded	string The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601, "YYYY-MM-DDThh:mm:ssZ".
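
A short Python sketch of parsing this timestamp into a timezone-aware value follows. The helper name parse_date_downloaded is hypothetical; the format string encodes the "YYYY-MM-DDThh:mm:ssZ" layout given above.

```python
from datetime import datetime, timezone


def parse_date_downloaded(value: str) -> datetime:
    """Parse a "YYYY-MM-DDThh:mm:ssZ" timestamp as an aware UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing "Z" is matched literally, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)


downloaded = parse_date_downloaded("2024-02-29T13:01:54Z")
print(downloaded.isoformat())  # 2024-02-29T13:01:54+00:00
```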

product

object (Product)

name

string (Name)

The name of the product.

price

string (Price) ^[0-9]+(\.[0-9]+)?$

The price at which the product is being offered. If there is only one price associated with the offer, it is returned in this field.

currency

string (Currency) ^[A-Z]{3}$

The ISO 4217 code of the currency in which the price is expressed.

currencyRaw

string (CurrencyRaw)

The currency as given on the website, without extra normalization (for example, both "$" and "USD" are possible currencies).

regularPrice

string (RegularPrice) ^[0-9]+(\.[0-9]+)?$

The price before any discount or special offer.
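
Because price and regularPrice are strings that match ^[0-9]+(\.[0-9]+)?$, they can be converted to exact decimals without floating-point rounding. The sketch below shows one way to do that and to derive a discount percentage; the helper names parse_price and discount_percent are hypothetical.

```python
import re
from decimal import Decimal

# Same pattern as the price and regularPrice fields.
PRICE_PATTERN = re.compile(r"[0-9]+(\.[0-9]+)?")


def parse_price(value: str) -> Decimal:
    """Validate a price string against the field pattern and convert it."""
    if not PRICE_PATTERN.fullmatch(value):
        raise ValueError(f"not a valid price string: {value!r}")
    return Decimal(value)


def discount_percent(price: str, regular_price: str) -> Decimal:
    """Return the discount relative to the regular price, in percent."""
    current = parse_price(price)
    regular = parse_price(regular_price)
    if regular == 0:
        return Decimal("0.00")
    return ((regular - current) / regular * 100).quantize(Decimal("0.01"))


print(discount_percent("149", "199.00"))  # 25.13
```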

availability

string (Availability)

Enum: "InStock" "OutOfStock"

Availability, as a string. Allowed values:

"InStock" - includes limited availability, presale, preorder, and in-store only.
"OutOfStock" - includes discontinued and sold out.

sku

string (Sku)

The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for the product, assigned by the seller.

mpn

string (Mpn)

The Manufacturer Part Number (MPN) of the product. It is issued by the manufacturer, and is the same across different e-commerce websites.

gtin

Array of objects (Gtin) [ items ]

Standardized GTIN product identifier which is unique for a product across different sellers.

Array

type required	string Enum: "gtin8" "gtin13" "gtin14" "isbn10" "isbn13" "ismn" "issn" "upc" `gtin14` corresponds to former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, UPC Shipping Container Code. `gtin13` also includes the JAN (Japanese Article Number).
value required	string The GTIN value as a string.

brand

object

Brand or manufacturer of the product.

name

required

string

Name of the brand.

breadcrumbs

Array of objects (Breadcrumb) [ items ]

A list of breadcrumbs (a specific navigation element) with optional name and url.

Array

name	string Text of the breadcrumb, as it appears on the website.
url	string Absolute URL of the breadcrumb.

mainImage

object (Image)

Image.

url

required

string

URL of an image.

images

Array of objects (Image) [ items ]

All images of the item (may include the main image).

Array

url

required

string

URL of an image.

description

string

Description of the product.

descriptionHtml

string

Simplified HTML of the description, including sub-headings, image captions and embedded content.

aggregateRating

object

The overall rating, based on a collection of reviews or ratings.

ratingValue	number The average rating value.
bestRating	number The highest value allowed in this rating system.
reviewCount	integer >= 0 The total number of reviews or ratings for the product.
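
Since rating systems differ between websites, ratingValue is easier to compare when divided by bestRating. The sketch below shows one possible normalization to a 0 to 1 scale; the helper name normalized_rating is hypothetical, and it returns None when either value is missing.

```python
from typing import Optional


def normalized_rating(aggregate_rating: dict) -> Optional[float]:
    """Scale ratingValue to the 0..1 range using bestRating."""
    value = aggregate_rating.get("ratingValue")
    best = aggregate_rating.get("bestRating")
    # Both fields are optional; skip normalization if either is absent or zero.
    if value is None or not best:
        return None
    return value / best


print(normalized_rating({"ratingValue": 4, "bestRating": 5, "reviewCount": 24}))  # 0.8
```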

color

string (Color)

Color of the product.

size

string (Size)

A standardized size of a product, specified through a simple textual string (for example "XL", "32Wx34L"). A single product dimension (height, width) is not considered as the size.

weight

object (Weight)

value	number A weight value expressed as a floating point number.
unit	string A normalized unit of weight, like kilogram / ounce / pound and others.
rawUnit	string The unit of weight as extracted from the page, without normalization. The normalized version of rawUnit is in the unit field.

material

string

The materials from which the product is made. Contains all product materials on the page.

style

string (Style)

Style of the product. It can also be referred to as pattern or finish on the product page. Example values: "Polka dots", "Striped", "Nickel finish with Translucent glass", etc.

additionalProperties

Array of objects (AdditionalProperty) [ items ]

A list of properties or characteristics.

name field contains the property name,
value field contains the property value.

Array

name required	string Property name.
value	string Property value.
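
For lookups by property name, the list of name and value pairs can be turned into a mapping. This is a minimal sketch with a hypothetical helper name, properties_to_dict; if a name appears more than once, the last value wins, which may not suit every use case.

```python
def properties_to_dict(additional_properties: list) -> dict:
    """Map each property name to its value (None when value is absent)."""
    return {prop["name"]: prop.get("value") for prop in additional_properties}


properties = [
    {"name": "batteries", "value": "1 Lithium ion batteries required. (included)"},
    {"name": "warranty"},  # value is optional
]
print(properties_to_dict(properties)["batteries"])
```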

features

Array of strings

A list of features of the product.

Product features are generally found on the product page arranged in a list, which is usually bulleted.

url

required

string (Url)

URL of the page from which this product was extracted.

canonicalUrl

string (CanonicalUrl)

Canonical URL of the product, if available.

metadata

required

object (schemas-Metadata)

Extracted item metadata for single-item data types.

probability

required

number [ 0 .. 1 ]

dateDownloaded

required

string (DateDownloaded)

The timestamp at which the data was downloaded. Timezone: UTC. Format: ISO 8601, "YYYY-MM-DDThh:mm:ssZ".
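
One common use of metadata.probability is to discard low-confidence items. The sketch below assumes that a higher probability means greater confidence in the extracted item; the helper name is_confident and the 0.5 threshold are hypothetical choices, not recommendations from this reference.

```python
def is_confident(item: dict, threshold: float = 0.5) -> bool:
    """Return True if the item's metadata.probability meets the threshold."""
    probability = item.get("metadata", {}).get("probability")
    return probability is not None and probability >= threshold


product = {"name": "Product name", "metadata": {"probability": 0.87}}
print(is_confident(product))  # True
```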

variants

Array of objects [ items ]

Array

name

string (Name)

The name of the product.

price

string (Price) ^[0-9]+(\.[0-9]+)?$

The price at which the product is being offered. If there is only one price associated with the offer, it is returned in this field.

currency

string (Currency) ^[A-Z]{3}$

The ISO 4217 code of the currency in which the price is expressed.

currencyRaw

string (CurrencyRaw)

The currency as given on the website, without extra normalization (for example, both "$" and "USD" are possible currencies).

regularPrice

string (RegularPrice) ^[0-9]+(\.[0-9]+)?$

The price before any discount or special offer.

availability

string (Availability)

Enum: "InStock" "OutOfStock"

Availability, as a string. Allowed values:

"InStock" - includes limited availability, presale, preorder, and in-store only.
"OutOfStock" - includes discontinued and sold out.

sku

string (Sku)

The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for the product, assigned by the seller.

mpn

string (Mpn)

The Manufacturer Part Number (MPN) of the product. It is issued by the manufacturer, and is the same across different e-commerce websites.

gtin

Array of objects (Gtin) [ items ]

Standardized GTIN product identifier which is unique for a product across different sellers.

Array

type required	string Enum: "gtin8" "gtin13" "gtin14" "isbn10" "isbn13" "ismn" "issn" "upc" `gtin14` corresponds to former names EAN/UCC-14, SCC-14, DUN-14, UPC Case Code, UPC Shipping Container Code. `gtin13` also includes the JAN (Japanese Article Number).
value required	string The GTIN value as a string.

mainImage

object (Image)

Image.

url

required

string

URL of an image.

images

Array of objects (Image) [ items ]

All images of the item (may include the main image).

Array

url

required

string

URL of an image.

color

string (Color)

Color of the product.

size

string (Size)

A standardized size of a product, specified through a simple textual string (for example "XL", "32Wx34L"). A single product dimension (height, width) is not considered as the size.

style

string (Style)

Style of the product. It can also be referred to as pattern or finish on the product page. Example values: "Polka dots", "Striped", "Nickel finish with Translucent glass", etc.

additionalProperties

Array of objects (AdditionalProperty) [ items ]

A list of properties or characteristics.

name field contains the property name,
value field contains the property value.

Array

name required	string Property name.
value	string Property value.

url

string (Url)

URL of the page from which this product was extracted.

canonicalUrl

string (CanonicalUrl)

Canonical URL of the product, if available.

Request samples

Content type

application/json

Example

Retrieve raw HTTP content from a page

{
  "url": "https://example.com",
  "httpResponseBody": true
}
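
The request sample above could be sent from Python roughly as follows. This is a sketch under stated assumptions: the endpoint URL https://api.zyte.com/v1/extract and the POST method are not given in this section and should be checked against your account documentation, and build_request is a hypothetical helper. The Authorization header follows the basic authentication rule from the top of this reference (API key as username, empty password).

```python
import base64
import json
import urllib.request

# Assumed endpoint; not stated in this section of the reference.
API_URL = "https://api.zyte.com/v1/extract"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build an authenticated JSON request without sending it."""
    # Basic auth: Base64-encode "<api key>:" (key as username, no password).
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_request("foo", {"url": "https://example.com", "httpResponseBody": True})
print(request.get_header("Authorization"))  # Basic Zm9vOg==
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the header logic easy to check without network access.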

Response samples

Content type

application/json

{"url": "https://example.com/item-page/",
"statusCode": 200,
"httpResponseBody": "string",
"httpResponseHeaders": [{"name": "Content-Type",
"value": "text/html; charset=utf-8"
}
],
"browserHtml": "<html>Downloaded data.</html>",
"session": {"id": "ab837d21-f848-42b2-8e88-47ea9d84bad0"
},
"screenshot": "string",
"article": {"headline": "Article headline",
"articleBody": "Article body ...",
"articleBodyHtml": "<article><p>Article body ... </p> ... </article>",
"description": "Article summary",
"datePublished": "2019-06-19T00:00:00",
"datePublishedRaw": "June 19, 2019",
"dateModified": "2019-06-21T00:00:00",
"dateModifiedRaw": "June 21, 2019",
"authors": [{"name": "Alice",
"nameRaw": "Alice and Bob"
},
{"name": "Bob",
"nameRaw": "Alice and Bob"
}
],
"inLanguage": "en",
"breadcrumbs": [{"name": "Home",
"url": "https://example.com/"
},
{"name": "Cell Phones",
"url": "https://example.com/cell-phones"
},
{"name": "Cell Phones & Accessories"
}
],
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"images": [{"url": "http://example.com/item-1/image1.jpeg"
}
],
"videos": [{"url": "https://example.com/video.mp4"
}
],
"audios": [{"url": "https://example.com/audio.mp3"
}
],
"url": "https://example.com/article/",
"canonicalUrl": "https://example.com/article",
"metadata": {"probability": 0.87,
"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"articleList": {"articles": [{"url": "https://example.com/articles/1/",
"headline": "Article headline",
"articleBody": "Article body ...",
"datePublished": "2019-06-19T00:00:00",
"datePublishedRaw": "June 19, 2019",
"authors": [{"name": "Alice",
"nameRaw": "Alice and Bob"
},
{"name": "Bob",
"nameRaw": "Alice and Bob"
}
],
"inLanguage": "en",
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"images": [{"url": "http://example.com/item-1/image1.jpeg"
}
],
"metadata": {"probability": 0.34
}
}
],
"url": "https://example.com/articles/",
"metadata": {"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"articleNavigation": {"nextPage": {"url": "http://example.com/foo?p=3",
"name": "3"
},
"pageNumber": 2,
"items": [{"url": "https://example.com/articles/1/",
"name": "Article name",
"datePublished": "2019-06-19T00:00:00",
"datePublishedRaw": "June 19, 2019",
"metadata": {"probability": 0.34
}
}
],
"url": "https://example.com/articles/",
"metadata": {"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"forumThread": {"topic": {"name": "How do you cook rice?"
},
"posts": [{"text": "Cooking rice is a hobby of mine. Here is how I cook it.",
"datePublished": "2019-06-19T00:00:00",
"datePublishedRaw": "June 19, 2019",
"reactions": {"likes": 3,
"replies": 2
},
"metadata": {"probability": 0.34
}
}
],
"url": "https://example.com/forum/thread/1/",
"metadata": {"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"jobPosting": {"jobTitle": "Regional Manager",
"datePublished": "2019-06-19T00:00:00",
"datePublishedRaw": "19 June 2019",
"validThrough": "2019-08-20T00:00:00",
"description": "Job Description ...",
"descriptionHtml": "<article>HTML for Job Description ...",
"employmentType": "Full-time",
"hiringOrganization": {"name": "ACME Corp."
},
"baseSalary": {"raw": "$53,251 a year",
"valueMax": "53251.0",
"currency": "USD",
"currencyRaw": "$"
},
"jobLocation": {"raw": "West New York, NJ 07093"
},
"url": "https://example.com/job",
"metadata": {"probability": 0.87,
"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"jobPostingNavigation": {"nextPage": {"url": "http://example.com/foo?p=3",
"name": "3"
},
"pageNumber": 2,
"items": [{"url": "https://example.com/jobs/1/",
"name": "Job posting name",
"metadata": {"probability": 0.34
}
}
],
"url": "https://example.com/jobs/",
"metadata": {"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"product": {"name": "Product name",
"price": "149",
"currency": "USD",
"currencyRaw": "$",
"regularPrice": "199.00",
"availability": "InStock",
"sku": "A123DK9823",
"mpn": "code-123",
"gtin": [{"type": "isbn13",
"value": "9781933624341"
}
],
"brand": {"name": "Product brand"
},
"breadcrumbs": [{"name": "Home",
"url": "https://example.com/"
},
{"name": "Cell Phones",
"url": "https://example.com/cell-phones"
},
{"name": "Cell Phones & Accessories"
}
],
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"images": [{"url": "http://example.com/item-1/image1.jpeg"
}
],
"description": "product description",
"descriptionHtml": "<article>HTML description for Product ...",
"aggregateRating": {"ratingValue": 4,
"bestRating": 5,
"reviewCount": 24
},
"color": "Red",
"size": "XL",
"weight": {"value": 120,
"unit": "kilogram",
"rawUnit": "kg"
},
"material": "Metal, Plastic",
"style": "Striped",
"additionalProperties": [{"name": "batteries",
"value": "1 Lithium ion batteries required. (included)"
}
],
"features": ["Multi-System Compatible",
"HD Ready 1366 x 768 LED Panel",
"REFRESH RATE 100Hz PQI"
],
"url": "https://example.com/product/",
"canonicalUrl": "https://example.com/product/",
"metadata": {"probability": 0.87,
"dateDownloaded": "2019-06-19T08:27:43Z"
},
"variants": [{"name": "Product name",
"price": "149",
"currency": "USD",
"currencyRaw": "$",
"regularPrice": "199.00",
"availability": "InStock",
"sku": "A123DK9823",
"mpn": "code-123",
"gtin": [{"type": "isbn13",
"value": "9781933624341"
}
],
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"images": [{"url": "http://example.com/item-1/image1.jpeg"
}
],
"color": "Red",
"size": "XL",
"style": "Striped",
"additionalProperties": [{"name": "batteries",
"value": "1 Lithium ion batteries required. (included)"
}
],
"url": "https://example.com/product/",
"canonicalUrl": "https://example.com/product/"
}
]
},
"productList": {"breadcrumbs": [{"name": "Home",
"url": "https://example.com/"
},
{"name": "Cell Phones",
"url": "https://example.com/cell-phones"
},
{"name": "Cell Phones & Accessories"
}
],
"products": [{"url": "https://example.com/products/1/",
"name": "Product name",
"price": "149",
"currencyRaw": "$",
"currency": "USD",
"regularPrice": "199.00",
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"metadata": {"probability": 0.34
}
}
],
"url": "https://example.com/products/",
"metadata": {"dateDownloaded": "2019-06-19T08:27:43Z"
},
"categoryName": "Sports & Outdoors"
},
"productNavigation": {"categoryName": "Sports & Outdoors",
"nextPage": {"url": "http://example.com/foo?p=3",
"name": "3"
},
"pageNumber": 2,
"items": [{"url": "https://example.com/products/1/",
"name": "Product name",
"metadata": {"probability": 0.34
}
}
],
"subCategories": [{"url": "https://example.com/category/1/",
"name": "Category name",
"metadata": {"probability": 0.34
}
}
],
"url": "https://example.com/products/",
"metadata": {"dateDownloaded": "2019-06-19T08:27:43Z"
}
},
"customAttributes": {"values": { },
"metadata": {"inputTokens": 0,
"outputTokens": 0,
"textInputTokens": 0,
"textInputTokensBeforeTruncation": 0,
"maxInputTokens": 0,
"excludedPIIAttributes": ["string"
],
"error": "string"
}
},
"echoData": { },
"jobId": "example-job-1",
"actions": [{"action": "waitForSelector",
"elapsedTime": 0,
"status": "success",
"error": "Request timeout while waiting for selector '#form-input'",
"interactionLogs": [{"time": "string",
"level": "debug",
"message": "string"
}
]
}
],
"responseCookies": [{"name": "string",
"value": "string",
"domain": "string",
"path": "string",
"expires": 0,
"httpOnly": true,
"secure": true,
"sameSite": "Strict"
}
],
"networkCapture": [{"interceptionStatus": {"status": "success",
"error": "string"
},
"statusCode": 0,
"httpResponseBody": "string",
"url": "http://example.com",
"headers": { },
"filter": {"filterType": "url",
"httpResponseBody": false
},
"request": {"url": "string",
"headers": { },
"method": "string",
"body": "string"
}
}
],
"serp": {"organicResults": [{"description": "Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently- ...\n",
"name": "squid-cache.org",
"url": "https://www.squid-cache.org/",
"rank": 1
}
],
"url": "https://www.google.pl/search?q=squid+proxy",
"pageNumber": 1,
"metadata": {"displayedQuery": "squid proxy",
"searchedQuery": "squid proxy",
"totalOrganicResults": 10000,
"dateDownloaded": "2024-02-29T13:01:54Z"
},
"product": {"name": "Product name",
"price": "149",
"currency": "USD",
"currencyRaw": "$",
"regularPrice": "199.00",
"availability": "InStock",
"sku": "A123DK9823",
"mpn": "code-123",
"gtin": [{"type": "isbn13",
"value": "9781933624341"
}
],
"brand": {"name": "Product brand"
},
"breadcrumbs": [{"name": "Home",
"url": "https://example.com/"
},
{"name": "Cell Phones",
"url": "https://example.com/cell-phones"
},
{"name": "Cell Phones & Accessories"
}
],
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"images": [{"url": "http://example.com/item-1/image1.jpeg"
}
],
"description": "product description",
"descriptionHtml": "<article>HTML description for Product ...",
"aggregateRating": {"ratingValue": 4,
"bestRating": 5,
"reviewCount": 24
},
"color": "Red",
"size": "XL",
"weight": {"value": 120,
"unit": "kilogram",
"rawUnit": "kg"
},
"material": "Metal, Plastic",
"style": "Striped",
"additionalProperties": [{"name": "batteries",
"value": "1 Lithium ion batteries required. (included)"
}
],
"features": ["Multi-System Compatible",
"HD Ready 1366 x 768 LED Panel",
"REFRESH RATE 100Hz PQI"
],
"url": "https://example.com/product/",
"canonicalUrl": "https://example.com/product/",
"metadata": {"probability": 0.87,
"dateDownloaded": "2019-06-19T08:27:43Z"
},
"variants": [{"name": "Product name",
"price": "149",
"currency": "USD",
"currencyRaw": "$",
"regularPrice": "199.00",
"availability": "InStock",
"sku": "A123DK9823",
"mpn": "code-123",
"gtin": [{"type": "isbn13",
"value": "9781933624341"
}
],
"mainImage": {"url": "http://example.com/item-1/image1.jpeg"
},
"images": [{"url": "http://example.com/item-1/image1.jpeg"
}
],
"color": "Red",
"size": "XL",
"style": "Striped",
"additionalProperties": [{"name": "batteries",
"value": "1 Lithium ion batteries required. (included)"
}
],
"url": "https://example.com/product/",
"canonicalUrl": "https://example.com/product/"
}
]
}
}
}
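
In the product sample above, each entry of variants carries only a subset of the product fields (for example, no brand or description). One possible way to obtain a complete record per variant is to overlay each variant on the parent product, as in this sketch; the helper name expand_variants is hypothetical, and whether parent fields should be inherited this way depends on your use case.

```python
def expand_variants(product: dict) -> list:
    """Return one flat record per variant, inheriting parent product fields."""
    base = {key: value for key, value in product.items() if key != "variants"}
    # A product without variants yields a single record with the base fields.
    variants = product.get("variants") or [{}]
    return [{**base, **variant} for variant in variants]


product = {
    "name": "Product name",
    "brand": {"name": "Product brand"},
    "color": "Red",
    "variants": [
        {"color": "Red", "size": "XL"},
        {"color": "Blue", "size": "M"},
    ],
}
for record in expand_variants(product):
    print(record["brand"]["name"], record["color"], record["size"])
```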