Product List Extraction

Request example

If you request a product list extraction and the extraction succeeds, the productList field is available in the query result:

from autoextract.sync import request_raw

query = [{
    'url': 'http://books.toscrape.com/',
    'pageType': 'productList'
}]
results = request_raw(query, api_key='[api key]')
print(results[0]['productList'])
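
Because productList is only available when extraction succeeds, it can be safer to read it with a lookup that tolerates a missing key. The following is a minimal sketch, assuming that a failed extraction simply omits the productList key; the helper name get_product_list and the sample results are hypothetical and hand-written, not real API output.

```python
def get_product_list(result):
    """Return the productList dictionary of one query result, or None if absent."""
    return result.get('productList')


# Hand-written stand-ins for query results (assumption: a failed
# extraction has no 'productList' key).
results = [
    {'productList': {'url': 'http://example.com/list', 'products': []}},
    {'query': {'userQuery': {'url': 'http://example.com/broken'}}},
]

extracted = [get_product_list(result) for result in results]
succeeded = [product_list for product_list in extracted if product_list is not None]
```

Here succeeded keeps only the results that actually carry a product list.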

Available fields

Top-level

The following fields are available for productList:

url: string

URL of the page from which the products were extracted.

products: list of dictionaries

List of products. Individual fields are described below (see Individual products).

breadcrumbs: list of dictionaries with name and link optional string fields

A list of breadcrumbs (a specific navigation element) with optional name and URL. Example:

[
  {"name": "Foo", "link": "http://example.com/foo"},
  {"name": "Bar", "link": "http://example.com/foo/bar"}
]

paginationNext: dictionary

If pagination of a product list page is present, the paginationNext dictionary contains information about the link to the next page. Fields:

  • url is the URL of the next page. It is a required field.

  • text is the text corresponding to the link as it appears on site. Optional.

Example:

{"url": "http://example.com/foo?p=3", "text": "3"}

paginationPrevious: dictionary

If pagination of a product list page is present, the paginationPrevious dictionary contains information about the link to the previous page. Fields:

  • url is the URL of the previous page. It is a required field.

  • text is the text corresponding to the link as it appears on site. Optional.

Example:

{"url": "http://example.com/foo?p=1", "text": "Prev"}
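
The pagination fields can be used to walk through a multi-page listing. The sketch below follows paginationNext links until none is present. The function iter_product_lists, its fetch parameter, and the max_pages limit are hypothetical names introduced for illustration; fetch stands for whatever code requests a URL and returns its productList dictionary, and here it is replaced by a lookup in two hand-written pages.

```python
def iter_product_lists(start_url, fetch, max_pages=10):
    """Yield productList dictionaries, following paginationNext links.

    `fetch` is a caller-supplied function (hypothetical) that maps a URL
    to a productList dictionary, or None if extraction failed.
    """
    seen = set()
    url = start_url
    # Stop on a missing link, a repeated URL, or the page limit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        product_list = fetch(url)
        if not product_list:
            break
        yield product_list
        # paginationNext may be absent; when present, its url is required.
        url = (product_list.get('paginationNext') or {}).get('url')


# Hand-written pages standing in for real extraction results.
fake_pages = {
    'http://example.com/foo?p=1': {
        'url': 'http://example.com/foo?p=1',
        'paginationNext': {'url': 'http://example.com/foo?p=2', 'text': '2'},
    },
    'http://example.com/foo?p=2': {
        'url': 'http://example.com/foo?p=2',
        'paginationPrevious': {'url': 'http://example.com/foo?p=1', 'text': 'Prev'},
    },
}

visited = [
    page['url']
    for page in iter_product_lists('http://example.com/foo?p=1', fake_pages.get)
]
```

The seen set and max_pages guard against sites whose pagination links loop back on themselves.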

The top-level url field is required.

Individual products

Each product in the products field has the following fields:

name: string

The name of the product.

offers: list of dictionaries

Product offers. Each offer may contain price, currency, regularPrice and availability string fields. All fields are optional, but currency is present only if price is also present.

  • price field is a string with a valid number (a dot is used as decimal separator). It is the price a customer has to pay after discounts or special offers.

  • currency is the currency as given on the website, without extra normalization (for example, both “$” and “USD” are possible currencies). It is present only if price is also present.

  • regularPrice is the price before any discount or special offer. It is present only when the price is different from regularPrice.

  • availability is the product availability, as a string. Allowed values:

    • "InStock" - includes limited availability, presale, preorder, and in-store only.

    • "OutOfStock" - includes discontinued and sold out.

Example:

[
  {
    "price": "42",
    "regularPrice": "45.00",
    "currency": "USD",
    "availability": "InStock"
  }
]
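
Since price and regularPrice are strings holding numbers with a dot as the decimal separator, they can be converted before doing arithmetic. The sketch below uses Python's decimal module to compute the saving on an offer; the helper name discount is hypothetical, and the use of Decimal rather than float is a choice made here to avoid rounding surprises, not something the API requires.

```python
from decimal import Decimal


def discount(offer):
    """Return regularPrice minus price as a Decimal, or None if either is missing."""
    price = offer.get('price')
    regular_price = offer.get('regularPrice')
    if price is None or regular_price is None:
        # regularPrice is only present when it differs from price.
        return None
    return Decimal(regular_price) - Decimal(price)


offer = {
    "price": "42",
    "regularPrice": "45.00",
    "currency": "USD",
    "availability": "InStock",
}

saving = discount(offer)
in_stock = offer.get('availability') == 'InStock'
no_saving = discount({"price": "72", "currency": "USD"})
```

Note that currency is not normalized, so amounts from different offers should only be compared when their currency strings match.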

sku: string

Stock Keeping Unit identifier for the product assigned by the seller.

brand: string

Brand or manufacturer of the product.

mainImage: string

A URL or data URL value of the main image of the product.

images: list of strings

A list of URL or data URL values of all images of the product (may include the main image).

description: string

Description of the product.

aggregateRating: dictionary

Aggregate information about the product rating and reviews.

  • ratingValue is the average rating value, as a float.

  • bestRating is the best possible rating value, as a float.

  • reviewCount is the number of reviews or ratings for the product, as int.

Example - 4.5 out of 5, based on 12 reviews:

{
  "ratingValue": 4.5,
  "bestRating": 5,
  "reviewCount": 12
}

All fields are optional, but at least one of reviewCount or ratingValue must be present.
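
Because sites use different rating scales, it can help to express ratingValue as a fraction of bestRating before comparing products. The following is a sketch under the assumption that the scale starts at zero, which the field descriptions above do not state; the helper name rating_fraction is hypothetical.

```python
def rating_fraction(aggregate_rating):
    """Return ratingValue / bestRating, or None when it cannot be computed."""
    rating_value = aggregate_rating.get('ratingValue')
    best_rating = aggregate_rating.get('bestRating')
    # Either field may be absent; also guard against a zero bestRating.
    if rating_value is None or not best_rating:
        return None
    return rating_value / best_rating


full = rating_fraction({"ratingValue": 4.5, "bestRating": 5, "reviewCount": 12})
count_only = rating_fraction({"reviewCount": 12})
```

For a dictionary that only carries reviewCount, the helper returns None instead of raising an error.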

probability: float

Probability that the extracted item is a single product listing.

url: string

URL of the main product page for this product listing.

To get full information about the product, you can make an AutoExtract request to this URL with pageType “product” (see Product Extraction).

All fields are optional, except for probability. Fields without a valid value (null or empty array) are excluded from extraction results.
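
Since only probability is guaranteed to be present, code that consumes products should read every other field defensively. The sketch below filters products by probability and builds a short summary with fallbacks for excluded fields. The helper names and the 0.5 threshold are hypothetical choices for illustration, and the sample data is hand-written.

```python
def confident_products(product_list, threshold=0.5):
    """Return products whose probability is at least `threshold`."""
    return [
        product for product in product_list.get('products', [])
        if product['probability'] >= threshold
    ]


def summarize(product):
    """Build a one-line summary, tolerating excluded optional fields."""
    name = product.get('name', '(no name)')
    brand = product.get('brand', '(no brand)')
    return f"{name} by {brand} (p={product['probability']:.2f})"


# Hand-written sample; only 'probability' is present on every product.
product_list = {
    'url': 'http://example.com/list',
    'products': [
        {'name': 'Product 1', 'brand': 'product1 brand', 'probability': 0.95},
        {'probability': 0.20},
        {'name': 'Product 3', 'probability': 0.60},
    ],
}

summaries = [summarize(product) for product in confident_products(product_list)]
```

A suitable threshold depends on how many false positives the application can tolerate.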

Response example

Below is an example response with all product list fields present:

[
  {
    "productList":{
      "url":"http://example.com/product-list-page-3",
      "breadcrumbs":[
        {
          "name":"Home",
          "link":"http://example.com"
        }
      ],
      "paginationNext":{
        "text":"Next Page",
        "url":"http://example.com/product-list-page-4"
      },
      "paginationPrevious":{
        "text":"Previous Page",
        "url":"http://example.com/product-list-page-2"
      },
      "products":[
        {
          "name":"Product 1",
          "url":"http://example.com/product1",
          "offers":[
            {
              "price":"42",
              "currency":"USD",
              "availability":"InStock",
              "regularPrice":"60"
            }
          ],
          "sku":"product sku",
          "brand":"product1 brand",
          "mainImage":"http://example.com/image.png",
          "images":[
            "http://example.com/image.png"
          ],
          "description":"product1 description",
          "aggregateRating":{
            "ratingValue":4.5,
            "bestRating":5.0,
            "reviewCount":31
          },
          "probability":0.95
        },
        {
          "name":"Product 2",
          "url":"http://example.com/product2",
          "offers":[
            {
              "price":"72",
              "currency":"USD",
              "availability":"OutOfStock"
            }
          ],
          "sku":"product2 sku",
          "brand":"product2 brand",
          "mainImage":"http://example.com/image2.png",
          "images":[
            "http://example.com/image2.png"
          ],
          "description":"product2 description",
          "aggregateRating":{
            "ratingValue":1.5,
            "bestRating":5.0,
            "reviewCount":85
          },
          "probability":0.90
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query":{
      "id":"1564747029122-9e02a1868d70b7a2",
      "domain":"example.com",
      "userQuery":{
        "pageType":"productList",
        "url":"https://example.com/product-list-page"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
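
A response like the one above can be flattened into one row per product, for example before writing it to a CSV file. The following is a minimal sketch, assuming that taking only the first offer of each product is acceptable; the function name product_rows is hypothetical, and the sample response is an abbreviated, hand-written version of the example above.

```python
def product_rows(response):
    """Flatten a response into one dictionary per extracted product."""
    rows = []
    for result in response:
        product_list = result.get('productList')
        if product_list is None:
            continue  # no product list in this result
        for product in product_list.get('products', []):
            # Use the first offer only; fall back to an empty offer.
            offers = product.get('offers') or [{}]
            first_offer = offers[0]
            rows.append({
                'name': product.get('name'),
                'url': product.get('url'),
                'price': first_offer.get('price'),
                'currency': first_offer.get('currency'),
                'availability': first_offer.get('availability'),
            })
    return rows


# Abbreviated, hand-written version of the example response.
response = [{
    'productList': {
        'url': 'http://example.com/product-list-page-3',
        'products': [
            {'name': 'Product 1', 'url': 'http://example.com/product1',
             'offers': [{'price': '42', 'currency': 'USD',
                         'availability': 'InStock', 'regularPrice': '60'}],
             'probability': 0.95},
            {'name': 'Product 2', 'url': 'http://example.com/product2',
             'probability': 0.90},
        ],
    },
}]

rows = product_rows(response)
```

Fields that were excluded from the extraction result appear as None in the rows.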