Zyte API reference documentation
For example, if your API key is foo, you base64-encode foo: as Zm9vOg== and send an Authorization header with value Basic Zm9vOg==:
Authorization: Basic Zm9vOg==
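The header value above can be reproduced with the Python standard library. This is a minimal sketch: the function name basic_auth_header is hypothetical, and it assumes the key is used as the user name with an empty password, as the foo: example suggests.

```python
import base64


def basic_auth_header(api_key: str) -> str:
    """Build an HTTP Basic Authorization header value.

    Assumption: the API key is the user name and the password is empty,
    so the string that gets Base64-encoded is "<key>:".
    """
    credentials = f"{api_key}:".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


print(basic_auth_header("foo"))  # Basic Zm9vOg==
```

Most HTTP clients can build this header themselves when given a user name and an empty password, so computing it by hand is mainly useful for debugging.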
Tip
Examples for different programming languages in the usage documentation feature authentication.
Web Data Extraction API (1.0.0)
A single API for web scraping
Process a single URL, return the result
Process a single URL, return the result.
This endpoint blocks until the result is ready. It is intended for short-running operations.
At least one of the following request fields must be set to true:
- browserHtml
- httpResponseBody
- httpResponseHeaders
- screenshot
- An automatic extraction request field, such as article or product
All automatic extraction data types except for serp support performing extraction using either a browser request or an HTTP request. Choose which using the corresponding extractFrom option, e.g. productOptions.extractFrom when extracting a product.
When no option is specified, currently automatic extraction defaults to using a browser request. In the future, however, the default value may depend on the target website.
When automatic extraction uses a browser request, it can be combined with any fields compatible with browserHtml, e.g. screenshot. When automatic extraction uses an HTTP request, it can be combined with any fields compatible with httpResponseBody. serp does not support any extra fields.
You cannot combine multiple automatic extraction request fields (e.g. product and productList) on the same request.
You cannot combine httpResponseBody with a request field that is exclusive of browser requests (e.g. httpResponseBody and browserHtml).
httpResponseHeaders can be requested alone or with any other valid combination of request fields except for serp.
The request body size limit is 5MiB.
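The combination rules above can be checked on the client before a request is sent. The following is a rough sketch, not an official validator: the function name check_request and the message strings are hypothetical, the set of automatic extraction fields is limited to those named on this page, and the server remains the authority on what is accepted.

```python
import json

# Automatic extraction request fields named on this page (may be incomplete).
AUTO_EXTRACTION_FIELDS = {
    "article", "articleList", "articleNavigation", "forumThread",
    "jobPosting", "jobPostingNavigation", "product", "productList",
    "productNavigation", "serp",
}
# Output fields that require a browser request.
BROWSER_ONLY_FIELDS = {"browserHtml", "screenshot"}
OUTPUT_FIELDS = (
    {"httpResponseBody", "httpResponseHeaders"}
    | BROWSER_ONLY_FIELDS
    | AUTO_EXTRACTION_FIELDS
)
MAX_BODY_BYTES = 5 * 1024 * 1024  # 5 MiB request body limit


def check_request(payload: dict) -> list[str]:
    """Return a list of rule violations; an empty list means none were found."""
    enabled = {name for name in OUTPUT_FIELDS if payload.get(name) is True}
    problems = []
    if not enabled:
        problems.append("no output field is set to true")
    if len(enabled & AUTO_EXTRACTION_FIELDS) > 1:
        problems.append("multiple automatic extraction fields")
    if "httpResponseBody" in enabled and enabled & BROWSER_ONLY_FIELDS:
        problems.append("httpResponseBody combined with a browser-only field")
    if "serp" in enabled and len(enabled) > 1:
        problems.append("serp combined with another output field")
    if len(json.dumps(payload).encode("utf-8")) > MAX_BODY_BYTES:
        problems.append("request body exceeds 5 MiB")
    return problems
```

For example, check_request({"url": "https://example.com", "httpResponseBody": True, "browserHtml": True}) reports the httpResponseBody and browserHtml conflict, while the same payload without browserHtml returns an empty list.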
Authorizations:
Request Body schema: application/json
url (required) | string, <= 8192 characters. An absolute URL to extract data from. The host name must be a domain name; it cannot be an IP address.
requestHeaders | object (RequestHeaders). HTTP request headers. Can only be used in a browser request. For HTTP requests, see customHttpRequestHeaders. At the moment it only supports the Referer header.
object or null. Request-specific attribute used for tagging, so that every request can be attributed to specific organization parameters.
ipType | string. Enum: "datacenter" "residential". Type of IP address from which the request should be sent. If not specified, Zyte API will use an IP type that, for the target website, does not cause bans or unexpected response data. If you believe Zyte API is using the wrong default IP type for a website, please open a support ticket.
httpRequestMethod | string. Enum: "GET" "POST" "PUT" "DELETE" "OPTIONS" "TRACE" "PATCH" "HEAD". Request HTTP method. Can only be used in combination with httpResponseBody. See an example. See also: httpRequestText, httpRequestBody, customHttpRequestHeaders, httpResponseHeaders.
httpRequestBody | string <byte>, <= 400000 characters. Base64-encoded data to send as request body. Can only be used in combination with httpResponseBody. It usually needs to be used in combination with httpRequestMethod. If you only need to send UTF-8-encoded text, use httpRequestText instead to skip Base64-encoding. Note that you cannot combine both fields on the same request. See an example. See also: customHttpRequestHeaders.
httpRequestText | string, 1 to 400000 characters. UTF-8 text to send as request body. Can only be used in combination with httpResponseBody. It usually needs to be used in combination with httpRequestMethod. If you need to send a binary or non-UTF-8 request body, use httpRequestBody instead. Note that you cannot combine both fields on the same request. See an example. See also: customHttpRequestHeaders.
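The choice between httpRequestText and httpRequestBody can be made from the type of the body you hold. The sketch below is illustrative only: the helper name post_payload is hypothetical, it always uses POST, and it does not enforce the 400000-character limits.

```python
import base64
from typing import Union


def post_payload(url: str, body: Union[str, bytes]) -> dict:
    """Build a request payload that sends `body` with the POST method.

    Text goes into httpRequestText as-is; bytes are Base64-encoded into
    httpRequestBody. Only one of the two fields is ever set.
    """
    payload = {"url": url, "httpResponseBody": True, "httpRequestMethod": "POST"}
    if isinstance(body, str):
        payload["httpRequestText"] = body
    else:
        payload["httpRequestBody"] = base64.b64encode(body).decode("ascii")
    return payload
```

Keeping the two fields mutually exclusive in one helper avoids sending both on the same request, which the descriptions above say is not allowed.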
customHttpRequestHeaders | Array of objects (CustomHttpRequestHeader), <= 200 items. HTTP request headers. Can only be used in combination with httpResponseBody. To set headers with other outputs, see requestHeaders. Setting HTTP request headers has some caveats. See an example. See also: httpRequestMethod, httpRequestText, httpRequestBody, httpResponseHeaders.
httpResponseBody | boolean. Default: false. Set to true to get the HTTP response body in the httpResponseBody response field. This field is not compatible with browser automation. See an example. See also: httpRequestMethod, httpRequestText, httpRequestBody, customHttpRequestHeaders.
httpResponseHeaders | boolean. Default: false. Set to true to get the HTTP response headers in the httpResponseHeaders response field. See an example. See also: customHttpRequestHeaders, requestHeaders.
browserHtml | boolean. Default: false. Set to true to get the browser HTML in the browserHtml response field. This field is not compatible with HTTP requests. If you use actions, the browser HTML is generated after action execution has finished or timed out. See an example. See also: screenshot, requestHeaders.
screenshot | boolean. Default: false. Set to true to get a page screenshot in the screenshot response field. This field is not compatible with HTTP requests. To adjust the screenshot contents you can use screenshotOptions and viewport. If you use actions, the screenshot is generated after action execution has finished or timed out. See an example. See also: browserHtml, requestHeaders.
screenshotOptions | object (ScreenshotOptions). Options for the screenshot taken when the screenshot request field is true.
article | boolean. Default: false. Set to true to get article data in the article response field. The target page should only contain a single article, such as a blog post or a news article. For pages with multiple articles, consider using articleList instead. To combine this field with HTTP requests, set articleOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, articleNavigation, browserHtml, screenshot, requestHeaders.
articleOptions | object (ExtractionOptions). Options for datatype extraction.
articleList | boolean. Default: false. Set to true to get article list data in the articleList response field. The target page should contain multiple articles, usually as links or short snippets. Examples of such pages are main or category pages of news sites, main pages of blogs showing multiple posts, and other pages with multiple articles. Article list data is especially useful for getting basic information about articles on a website, like a headline and a link to the article details, with fewer requests: article attributes are extracted directly from an article list page, without making individual article requests. To implement article crawling from article list pages, use articleNavigation, which also enables navigation through pagination links. To combine this field with HTTP requests, set articleListOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, browserHtml, screenshot, requestHeaders.
articleListOptions | object (ExtractionOptions). Options for datatype extraction.
articleNavigation | boolean. Default: false. Set to true to get article navigation data in the articleNavigation response field. The target page should contain multiple articles and/or subcategories that can be followed. Article navigation data is especially useful for implementing article crawling, i.e. following links to article pages, as well as to subcategories and pagination that can in turn link to more article pages. Article navigation data can also be used to get basic information about articles and subcategories on a website, obtaining the URLs and link names of the articles and subcategories without making individual requests for those articles. To combine this field with HTTP requests, set articleNavigationOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, article, articleList, browserHtml, screenshot, requestHeaders.
articleNavigationOptions | object (ExtractionOptions). Options for datatype extraction.
forumThread | boolean. Default: false. Set to true to get forum thread data in the forumThread response field. The target page should be an individual forum thread page on a forum website. To combine this field with HTTP requests, set forumThreadOptions.extractFrom to httpResponseBody. See also: List of all automatic extraction request fields, article, browserHtml, screenshot, requestHeaders.
forumThreadOptions | object (ExtractionOptions). Options for datatype extraction.
jobPosting | boolean. Default: false. Set to true to get job posting data in the jobPosting response field. The target page should be an individual job posting page on a company website or on a job website. To combine this field with HTTP requests, set jobPostingOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, browserHtml, screenshot, requestHeaders.
jobPostingOptions | object (ExtractionOptions). Options for datatype extraction.
jobPostingNavigation | boolean. Default: false. Set to true to get job posting navigation data in the jobPostingNavigation response field. The target page should contain multiple job postings and/or subcategories that can be followed. Job posting navigation data is especially useful for implementing job posting crawling, i.e. following links to job posting pages, as well as pagination that can in turn link to more job posting pages. Job posting navigation data can also be used to get basic information about job postings on a website, obtaining the URLs and link names of the job postings without making individual requests for them. To combine this field with HTTP requests, set jobPostingNavigationOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, jobPosting, browserHtml, screenshot, requestHeaders.
jobPostingNavigationOptions | object (ExtractionOptions). Options for datatype extraction.
product | boolean. Default: false. Set to true to get product data in the product response field. The target page should only contain a single product. For pages with multiple products, consider using productList instead. To combine this field with HTTP requests, set productOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See an example. See also: List of all automatic extraction request fields, productNavigation, browserHtml, screenshot, requestHeaders.
productOptions | object. Additional options for product extraction.
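As an illustration of the product and productOptions fields, the sketch below builds a product extraction payload and optionally asks for extraction from an HTTP request. The helper name product_request is hypothetical, and the extractFrom value "httpResponseBody" should be confirmed against the productOptions schema before relying on it.

```python
def product_request(url: str, use_http: bool = False) -> dict:
    """Build a payload for product extraction.

    With use_http=False no extractFrom option is sent, so the service
    default applies (currently a browser request, per this page).
    """
    payload = {"url": url, "product": True}
    if use_http:
        # Assumed value for an HTTP-based extraction; verify in the schema.
        payload["productOptions"] = {"extractFrom": "httpResponseBody"}
    return payload
```

The same pattern would apply to the other data types through their own options field, e.g. articleOptions for article.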
productList | boolean. Default: false. Set to true to get product list data in the productList response field. The target page should contain a list or a grid of products. Product list data is especially useful for getting basic information about products on a website with fewer requests: product attributes are extracted directly from a product list page, without making individual product requests. To implement product crawling from product list pages, use productNavigation, which also enables navigation through pagination links. To combine this field with HTTP requests, set productListOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, browserHtml, screenshot, requestHeaders.
productListOptions | object (ExtractionOptions). Options for datatype extraction.
productNavigation | boolean. Default: false. Set to true to get product navigation data in the productNavigation response field. The target page should contain multiple products and/or subcategories that can be followed. Product navigation data is especially useful for implementing product crawling, i.e. following links to product pages, as well as to subcategories and pagination that can in turn link to more product pages. Product navigation data can also be used to get basic information about products and subcategories on a website, obtaining the URLs and link names of the products and subcategories without making individual requests for those products. To combine this field with HTTP requests, set productNavigationOptions.extractFrom to httpResponseBody. If you use actions, data extraction happens after action execution has finished or timed out. See also: List of all automatic extraction request fields, product, productList, browserHtml, screenshot, requestHeaders.
productNavigationOptions | object (ExtractionOptions). Options for datatype extraction.
customAttributes | object or null. Schema of the custom attributes to extract. This is a subset of the OpenAPI specification, using JSON syntax. Zyte custom attributes extraction uses a Large Language Model (LLM) operated by Zyte to obtain any structured data specified by this schema from any unstructured web page. This allows extraction similar to that of standard schemas, such as article or product, but with much more flexibility. When this field is specified, the customAttributes.values field in the response contains the extracted data. When custom attributes extraction is requested, a standard extraction field must also be specified (e.g. product). This determines which part of the web page is passed to the LLM for custom attributes extraction. For example, when a web page is a product page, only the product information is passed, ignoring other parts of the page such as the menu or footer, which makes extraction cheaper and more accurate. See detailed documentation. For a request example, see “Extract Custom Attributes along with Article information” under Request samples.
object. Additional options for custom attributes extraction.
geolocation | string (CountryCode) Enum: "AW" "AF" "AO" "AI" "AX" "AL" "AD" "AE" "AR" "AM" "AS" "AQ" "TF" "AG" "AU" "AT" "AZ" "BI" "BE" "BJ" "BQ" "BF" "BD" "BG" "BH" "BS" "BA" "BL" "BY" "BZ" "BM" "BO" "BR" "BB" "BN" "BT" "BV" "BW" "CF" "CA" "CC" "CH" "CL" "CN" "CI" "CM" "CD" "CG" "CK" "CO" "KM" "CV" "CR" "CU" "CW" "CX" "KY" "CY" "CZ" "DE" "DJ" "DM" "DK" "DO" "DZ" "EC" "EG" "ER" "EH" "ES" "EE" "ET" "FI" "FJ" "FK" "FR" "FO" "FM" "GA" "GB" "GE" "GG" "GH" "GI" "GN" "GP" "GM" "GW" "GQ" "GR" "GD" "GL" "GT" "GF" "GU" "GY" "HK" "HM" "HN" "HR" "HT" "HU" "ID" "IM" "IN" "IO" "IE" "IR" "IQ" "IS" "IL" "IT" "JM" "JE" "JO" "JP" "KZ" "KE" "KG" "KH" "KI" "KN" "KR" "KW" "LA" "LB" "LR" "LY" "LC" "LI" "LK" "LS" "LT" "LU" "LV" "MO" "MF" "MA" "MC" "MD" "MG" "MV" "MX" "MH" "MK" "ML" "MT" "MM" "ME" "MN" "MP" "MZ" "MR" "MS" "MQ" "MU" "MW" "MY" "YT" "NA" "NC" "NE" "NF" "NG" "NI" "NU" "NL" "NO" "NP" "NR" "NZ" "OM" "PK" "PA" "PN" "PE" "PH" "PW" "PG" "PL" "PR" "KP" "PT" "PY" "PS" "PF" "QA" "RE" "RO" "RU" "RW" "SA" "SD" "SN" "SG" "GS" "SH" "SJ" "SB" "SL" "SV" "SM" "SO" "PM" "RS" "SS" "ST" "SR" "SK" "SI" "SE" "SZ" "SX" "SC" "SY" "TC" "TD" "TG" "TH" "TJ" "TK" "TM" "TL" "TO" "TT" "TN" "TR" "TV" "TW" "TZ" "UG" "UA" "UM" "UY" "US" "UZ" "VA" "VC" "VE" "VG" "VI" "VN" "VU" "WF" "WS" "YE" "ZA" "ZM" "ZW" ISO 3166-1 alpha-2 code of a country from which the request should be sent, i.e. the request geolocation. If not specified, Zyte API will use a geolocation that, for the target website, does not cause bans or unexpected locale changes in the response data, such as the wrong language, currency, date format, time zone, etc. If you believe Zyte API is using the wrong default geolocation for a website, please open a support ticket. For some websites, however, you might want to set a custom geolocation. For example, you may be interested in visiting the same URL from different locations. Zyte API provides 2 sets of geolocations. Standard geolocations are
javascript | boolean. Forces JavaScript execution on a browser request to be enabled (true) or disabled (false). By default, Zyte API enables or disables JavaScript execution for a request depending on which option makes it easier to avoid bans. Use this request field to override that choice. Passing this request field when requesting automatic extraction (product, article, etc.) may impact the quality of the returned data, as it might override the optimal value for automatic extraction. This field is not compatible with HTTP requests.
actions | Array (ActionSequence) of click, doubleClick, evaluate, goto, hide, hover, interaction, keyPress, reload, scrollBottom, scrollTo, searchKeyword, select, setLocation, type, waitForNavigation, waitForRequest, waitForResponse, waitForSelector or waitForTimeout objects. Sequence of browser actions to execute. When using actions, you get the actions response field with debug information about action execution.
jobId | string, <= 100 characters. ID of the Scrapy Cloud job from which this request has been sent, to be returned in the jobId response field. This field is meant to help with request tracking. scrapy-zyte-api fills this request field automatically. See an example. See also: echoData.
echoData | any. This field is returned in the echoData response field, verbatim. This field can be useful, for example, to keep track of the original request order when sending multiple requests in parallel. The request can be rejected if the data is too big. See an example. See also: jobId.
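The request-order use case for echoData can be sketched as follows. Both helper names are hypothetical, and the responses are assumed to be already parsed into dictionaries by whatever HTTP client sends the requests.

```python
def tag_requests(urls: list) -> list:
    """Attach each URL's position as echoData so responses can be matched back."""
    return [
        {"url": url, "httpResponseBody": True, "echoData": index}
        for index, url in enumerate(urls)
    ]


def restore_order(responses: list) -> list:
    """Sort parsed responses by the echoData value sent with each request."""
    return sorted(responses, key=lambda response: response["echoData"])
```

Because echoData is returned verbatim, an integer index is enough here; a small dictionary with more context would work the same way, as long as it stays small enough not to be rejected.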
viewport | object (Viewport).
followRedirect | boolean. Whether to follow HTTP redirection or not. Only supported in HTTP requests; browser requests always follow redirection.
sessionContext | Array of objects (SessionContext), <= 10 items. User-defined name-value pairs to request a server-managed session initialized with sessionContextParameters. For every subsequent request with the same session context, Zyte API will either reuse an available session created for the same session context or create a new session using sessionContextParameters. Server-managed sessions expire after 4 hours or 3 ban responses. If you are targeting websites that silently expire their sessions before the 4-hour mark, i.e. they revert the effects of your sessionContextParameters but requests continue working as expected otherwise, consider using client-managed sessions for higher session control. See an example. See also: requestCookies, responseCookies.
sessionContextParameters | object (SessionContextParameters). Parameters to create a server-managed session for a given sessionContext. See an example. See also: actions.
session | object (Session). Parameters to create or reuse a client-managed session. Client-managed sessions can expire. For 5-10 minutes after a session expires, Zyte API keeps track of the expired session and does not allow reusing it. After that time, attempts to reuse the session will instead create a new session.
networkCapture | Array of objects (NetworkCaptureFilterSequence), <= 10 items. Filters to capture browser network responses. HTTP responses received during browser rendering (including action execution) will be returned in the networkCapture response field if they match any of the filters defined here. You can capture up to 10 responses, provided the sum of their bodies does not exceed 5 MiB. If they do exceed that limit, only the first captured responses within the limit are returned.
device | string. Enum: "desktop" "mobile". Type of device to emulate during your request. A desktop device is emulated by default. Can only be used in combination with httpResponseBody.
cookieManagement | any. Default: "auto". Enum: "auto" "discard". Cookie management method. It determines how to handle user cookies, defined through requestCookies, and automatic cookies, i.e. cookies automatically generated by Zyte API.
requestCookies | Array of objects (Cookie), <= 100 items. A list of cookies to be sent with a request. You can use the contents of the responseCookies response field as a value for this request field. The size of each cookie object cannot be greater than 2048, as measured by the sum of the length of the cookie name, value, and attributes.
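Reusing responseCookies as requestCookies while respecting the per-cookie and list limits might look like the sketch below. The helper names are hypothetical, and cookie_size is only one reading of the size rule (it sums the string length of every value in the cookie object); the exact measurement used by the service is not spelled out here.

```python
MAX_COOKIE_SIZE = 2048  # per-cookie limit stated for requestCookies
MAX_COOKIES = 100       # maximum number of items in requestCookies


def cookie_size(cookie: dict) -> int:
    """Approximate a cookie's size as the total length of its values.

    Assumption: name, value and attributes are all stored as values of
    the cookie object, so their string lengths are summed.
    """
    return sum(len(str(value)) for value in cookie.values())


def reuse_cookies(response: dict) -> list:
    """Pick cookies from a parsed response that fit the request limits."""
    cookies = response.get("responseCookies", [])
    fitting = [c for c in cookies if cookie_size(c) <= MAX_COOKIE_SIZE]
    return fitting[:MAX_COOKIES]
```

The returned list can be placed under requestCookies in a follow-up request payload.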
responseCookies | boolean. Default: false. Set to true to get the cookies set during the request in the responseCookies response field. See an example. See also: requestCookies.
serp | boolean. Set to true to get search engine results page data in the serp response field. The target URL should be a search URL that belongs to a Google domain. Currently, you cannot combine this field with any other request fields besides url.
object (SerpOptions). Options for SERP extraction.
Responses
Response Schema: application/json
url (required) | string. URL the data was extracted from. Could be different from the input URL in case of redirection. See also: statusCode.
statusCode | integer. The HTTP status code retrieved from the target page. If redirection is followed, this is the status code of the response after redirection. See also: url.
httpResponseBody | string <byte>. Base64-encoded HTTP response body. To get this response field, set the httpResponseBody request field to true. Unlike browserHtml, this field supports binary response bodies, such as image files or PDF files. This is why this field is Base64-encoded: JSON does not support binary data.
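Since httpResponseBody is Base64-encoded, it has to be decoded before use. A minimal sketch, assuming the JSON response has already been parsed into a dictionary (the helper name decode_body is hypothetical):

```python
import base64


def decode_body(response: dict) -> bytes:
    """Return the raw bytes of the HTTP response body."""
    return base64.b64decode(response["httpResponseBody"])


print(decode_body({"httpResponseBody": "PGh0bWw+"}))  # b'<html>'
```

The result is bytes rather than text because the body may be binary, such as an image or a PDF file; decode it to a string only when you know the content is text and which encoding it uses.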
httpResponseHeaders | Array of objects (HTTPHeader). HTTP response headers. To get this response field, set the httpResponseHeaders request field to true.
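The response sample on this page shows httpResponseHeaders as a list of objects with name and value keys. For lookups it can be convenient to turn that list into a dictionary; the sketch below does so, with the helper name headers_to_dict being hypothetical.

```python
def headers_to_dict(response: dict) -> dict:
    """Map lower-cased header names to values.

    If a header name appears more than once, the last value wins.
    """
    return {
        header["name"].lower(): header["value"]
        for header in response.get("httpResponseHeaders", [])
    }
```

Lower-casing the names reflects that HTTP header names are case-insensitive. Keep the original list instead when repeated headers matter.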
browserHtml | string. To get this response field, set the browserHtml request field to true. Browser HTML does not include the contents of iframes or the shadow DOM.
session | object (Session). Parameters to create or reuse a client-managed session. Client-managed sessions can expire. For 5-10 minutes after a session expires, Zyte API keeps track of the expired session and does not allow reusing it. After that time, attempts to reuse the session will instead create a new session.
screenshot | string <byte>. Base64-encoded page screenshot file data. To get this response field, set the screenshot request field to true. screenshotOptions.format determines the file format of the screenshot data.
article | object. Article data. To get this response field, set the article request field to true.
articleList | object. Article list data. To get this response field, set the articleList request field to true.
articleNavigation | object. Article navigation data. To get this response field, set the articleNavigation request field to true.
forumThread | object. Forum thread data. To get this response field, set the forumThread request field to true.
jobPosting | object. Job posting data. To get this response field, set the jobPosting request field to true.
jobPostingNavigation | object. Job posting navigation data. To get this response field, set the jobPostingNavigation request field to true.
product | object. Product data. To get this response field, set the product request field to true.
productList | object. Product list data. To get this response field, set the productList request field to true.
productNavigation | object. Product navigation data. To get this response field, set the productNavigation request field to true.
object
echoData | object. Arbitrary data set on the echoData request field.
jobId | string, <= 100 characters. Scrapy Cloud job ID set on the jobId request field.
actions | Array of objects (ActionResult). Debug information about the execution of the action sequence set in the actions request field. Action order in the response always matches that of the request.
responseCookies | Array of objects (Cookie). List of cookies set during the request. To get this response field, set the responseCookies request field to true. See an example. See also: requestCookies.
networkCapture | Array of objects (CapturedResponse). Responses captured by filters specified in the networkCapture request parameter.
serp | object (SearchResultsPage). Search engine results page data. To get this response field, set the serp request field to true.
Request samples
- Payload
{
  "httpResponseBody": true
}
Response samples
- 200
- 400
- 401
- 403
- 422
- 429
- 451
- 500
- 503
- 520
- 521
- default
{- "statusCode": 200,
- "httpResponseBody": "string",
- "httpResponseHeaders": [
- {
- "name": "Content-Type",
- "value": "text/html; charset=utf-8"
}
], - "browserHtml": "<html>Downloaded data.</html>",
- "session": {
- "id": "ab837d21-f848-42b2-8e88-47ea9d84bad0"
}, - "screenshot": "string",
- "article": {
- "headline": "Article headline",
- "articleBody": "Article body ...",
- "articleBodyHtml": "<article><p>Article body ... </p> ... </article>",
- "description": "Article summary",
- "datePublished": "2019-06-19T00:00:00",
- "datePublishedRaw": "June 19, 2019",
- "dateModified": "2019-06-21T00:00:00",
- "dateModifiedRaw": "June 21, 2019",
- "authors": [
- {
- "name": "Alice",
- "nameRaw": "Alice and Bob"
}, - {
- "name": "Bob",
- "nameRaw": "Alice and Bob"
}
], - "inLanguage": "en",
- "breadcrumbs": [
- {
- "name": "Cell Phones & Accessories"
}
], - "metadata": {
- "probability": 0.87,
- "dateDownloaded": "2019-06-19T08:27:43Z"
}
  },
  "articleList": {
    "articles": [
      {
        "headline": "Article headline",
        "articleBody": "Article body ...",
        "datePublished": "2019-06-19T00:00:00",
        "datePublishedRaw": "June 19, 2019",
        "authors": [
          {
            "name": "Alice",
            "nameRaw": "Alice and Bob"
          },
          {
            "name": "Bob",
            "nameRaw": "Alice and Bob"
          }
        ],
        "inLanguage": "en",
        "metadata": {
          "probability": 0.34
        }
      }
    ],
    "metadata": {
      "dateDownloaded": "2019-06-19T08:27:43Z"
    }
  },
  "articleNavigation": {
    "pageNumber": 2,
    "items": [
      {
        "name": "Article name",
        "datePublished": "2019-06-19T00:00:00",
        "datePublishedRaw": "June 19, 2019",
        "metadata": {
          "probability": 0.34
        }
      }
    ],
    "metadata": {
      "dateDownloaded": "2019-06-19T08:27:43Z"
    }
  },
  "forumThread": {
    "topic": {
      "name": "How do you cook rice?"
    },
    "posts": [
      {
        "text": "Cooking rice is a hobby of mine. Here is how I cook it.",
        "datePublished": "2019-06-19T00:00:00",
        "datePublishedRaw": "June 19, 2019",
        "reactions": {
          "likes": 3,
          "replies": 2
        },
        "metadata": {
          "probability": 0.34
        }
      }
    ],
    "metadata": {
      "dateDownloaded": "2019-06-19T08:27:43Z"
    }
  },
  "jobPosting": {
    "jobTitle": "Regional Manager",
    "datePublished": "2019-06-19T00:00:00",
    "datePublishedRaw": "19 June 2019",
    "validThrough": "2019-08-20T00:00:00",
    "description": "Job Description ...",
    "descriptionHtml": "<article>HTML for Job Description ...",
    "employmentType": "Full-time",
    "hiringOrganization": {
      "name": "ACME Corp."
    },
    "baseSalary": {
      "raw": "$53,251 a year",
      "valueMax": "53251.0",
      "currency": "USD",
      "currencyRaw": "$"
    },
    "jobLocation": {
      "raw": "West New York, NJ 07093"
    },
    "metadata": {
      "probability": 0.87,
      "dateDownloaded": "2019-06-19T08:27:43Z"
    }
  },
  "jobPostingNavigation": {
    "pageNumber": 2,
    "items": [],
    "metadata": {
      "dateDownloaded": "2019-06-19T08:27:43Z"
    }
  },
  "product": {
    "name": "Product name",
    "price": "149",
    "currency": "USD",
    "currencyRaw": "$",
    "regularPrice": "199.00",
    "availability": "InStock",
    "sku": "A123DK9823",
    "mpn": "code-123",
    "gtin": [
      {
        "type": "isbn13",
        "value": 9781933624341
      }
    ],
    "brand": {
      "name": "Product brand"
    },
    "breadcrumbs": [
      {
        "name": "Cell Phones & Accessories"
      }
    ],
    "description": "product description",
    "descriptionHtml": "<article>HTML description for Product ...",
    "aggregateRating": {
      "ratingValue": 4,
      "bestRating": 5,
      "reviewCount": 24
    },
    "color": "Red",
    "size": "XL",
    "style": "Striped",
    "additionalProperties": [
      {
        "name": "batteries",
        "value": "1 Lithium ion batteries required. (included)"
      }
    ],
    "features": [
      "Multi-System Compatible",
      "HD Ready 1366 x 768 LED Panel",
      "REFRESH RATE 100Hz PQI"
    ],
    "metadata": {
      "probability": 0.87,
      "dateDownloaded": "2019-06-19T08:27:43Z"
    },
    "variants": [
      {
        "name": "Product name",
        "price": "149",
        "currency": "USD",
        "currencyRaw": "$",
        "regularPrice": "199.00",
        "availability": "InStock",
        "sku": "A123DK9823",
        "mpn": "code-123",
        "gtin": [
          {
            "type": "isbn13",
            "value": 9781933624341
          }
        ],
        "color": "Red",
        "size": "XL",
        "style": "Striped",
        "additionalProperties": [
          {
            "name": "batteries",
            "value": "1 Lithium ion batteries required. (included)"
          }
        ]
      }
    ]
  },
  "productList": {
    "breadcrumbs": [
      {
        "name": "Cell Phones & Accessories"
      }
    ],
    "products": [
      {
        "name": "Product name",
        "price": "149",
        "currencyRaw": "$",
        "currency": "USD",
        "regularPrice": "199.00",
        "metadata": {
          "probability": 0.34
        }
      }
    ],
    "metadata": {
      "dateDownloaded": "2019-06-19T08:27:43Z"
    },
    "categoryName": "Sports & Outdoors"
  },
  "productNavigation": {
    "categoryName": "Sports & Outdoors",
    "pageNumber": 2,
    "items": [],
    "subCategories": [
      {
        "name": "Category name",
        "metadata": {
          "probability": 0.34
        }
      }
    ],
    "metadata": {
      "dateDownloaded": "2019-06-19T08:27:43Z"
    }
  },
  "customAttributes": {
    "values": {},
    "metadata": {
      "inputTokens": 0,
      "outputTokens": 0,
      "textInputTokens": 0,
      "textInputTokensBeforeTruncation": 0,
      "maxInputTokens": 0,
      "excludedPIIAttributes": [
        "string"
      ],
      "error": "string"
    }
  },
  "echoData": {},
  "jobId": "example-job-1",
  "actions": [
    {
      "action": "waitForSelector",
      "elapsedTime": 0,
      "status": "success",
      "error": "Request timeout while waiting for selector '#form-input'"
    }
  ],
  "responseCookies": [
    {
      "name": "string",
      "value": "string",
      "domain": "string",
      "path": "string",
      "expires": 0,
      "httpOnly": true,
      "secure": true,
      "sameSite": "Strict"
    }
  ],
  "networkCapture": [
    {
      "interceptionStatus": {
        "status": "success",
        "error": "string"
      },
      "statusCode": 0,
      "httpResponseBody": "string",
      "headers": {},
      "filter": {
        "filterType": "url",
        "httpResponseBody": false
      },
      "request": {
        "url": "string",
        "headers": {},
        "method": "string",
        "body": "string"
      }
    }
  ],
  "serp": {
    "organicResults": [
      {
        "description": "Squid is a caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. It reduces bandwidth and improves response times by caching and reusing frequently- ...\n",
        "name": "squid-cache.org",
        "rank": 1
      }
    ],
    "pageNumber": 1,
    "metadata": {
      "displayedQuery": "squid proxy",
      "searchedQuery": "squid proxy",
      "totalOrganicResults": 10000,
      "dateDownloaded": "2024-02-29T13:01:54Z"
    }
  }
}
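A response shaped like the 200 sample above could be unpacked along the following lines. This is a minimal Python sketch, not an official client. It assumes httpResponseBody arrives Base64-encoded (the sample only shows "string"), and the helper names headers_to_dict, decode_body and failed_actions are hypothetical.

```python
import base64
from typing import Optional


def headers_to_dict(response: dict) -> dict:
    """Map the httpResponseHeaders list of name/value objects to a dict.

    Keys are lowercased; if a header name repeats, the last value wins.
    """
    headers = response.get("httpResponseHeaders", [])
    return {h["name"].lower(): h["value"] for h in headers}


def decode_body(response: dict) -> Optional[bytes]:
    """Decode httpResponseBody, assuming it is a Base64 string."""
    body = response.get("httpResponseBody")
    return base64.b64decode(body) if body is not None else None


def failed_actions(response: dict) -> list:
    """Return the entries of the actions list whose status is not 'success'."""
    return [a for a in response.get("actions", []) if a.get("status") != "success"]


if __name__ == "__main__":
    # Hand-built stand-in for a parsed JSON response.
    sample = {
        "statusCode": 200,
        "httpResponseBody": base64.b64encode(b"<html></html>").decode("ascii"),
        "httpResponseHeaders": [
            {"name": "Content-Type", "value": "text/html; charset=utf-8"}
        ],
    }
    print(headers_to_dict(sample))
    print(decode_body(sample))
    print(failed_actions(sample))
```

Every helper reads its field with a default because a given response only carries the fields that were requested, so absent keys are expected rather than exceptional.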