Zyte Data product schema v1.0

Standard Product Schema (1.0)

The Standard Product Schema used in Zyte's offering. It covers the typical set of product attributes used in common e-commerce data applications.

Responses

Response Schema: application/json
availability
string
Enum: "InStock" "OutOfStock"

The availability status for the product.

color
string

The color of the product.

currency
string

The currency associated with the price, in ISO 4217 standard (e.g. USD).

currencyRaw
string

The currency associated with the price, as appears on the page (no post-processing).

productId
string

Product identifier, unique across the dataset. It may be an SKU, any other identifier, a hash, or even a URL.

gtin
Array of objects [ items ]

List of standardized GTIN product identifiers associated with the product, which are unique for the product across different sellers.

Array
type
required
string
Enum: "gtin13" "gtin8" "gtin14" "isbn10" "isbn13" "ismn" "issn" "upc"

The type of product identifier.

value
required
string

The value of product identifier.

Format: normalized, only numerical characters allowed.
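
The normalization rule above can be sketched as follows. This is a minimal illustration, not Zyte's implementation: the helper name `normalize_gtin` is hypothetical, and the sketch simply drops every non-digit character, so how a check character such as the "X" of some ISBN-10 values should be treated is left open here.

```python
import re

# Identifier types listed in the schema's enum.
GTIN_TYPES = {"gtin13", "gtin8", "gtin14", "isbn10", "isbn13", "ismn", "issn", "upc"}


def normalize_gtin(gtin_type: str, raw_value: str) -> dict:
    """Build a {"type", "value"} item whose value contains digits only.

    Hypothetical helper: the name and error handling are assumptions.
    """
    if gtin_type not in GTIN_TYPES:
        raise ValueError(f"unsupported identifier type: {gtin_type!r}")
    value = re.sub(r"\D", "", raw_value)  # remove hyphens, spaces, etc.
    if not value:
        raise ValueError("identifier contains no digits")
    return {"type": gtin_type, "value": value}


item = normalize_gtin("isbn13", "978-3-16-148410-0")
print(item)
```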

images
Array of objects (Image) [ items ]

A list of URLs of all images of the product, excluding any images in the product description.
Each URL should point to the best-quality image available, under the following conditions:
- if possible without making additional requests,
- if required, with additional parameters that return maximum size image,
- in other cases, the default image served for default resolution.
Should include the main image as the first in the list.
Data URLs are not allowed.

Array
url
required
string (URL)

A URL of an image

mainImage
object (Image)

The details of the main image of the product.
Data URLs are not allowed.

url
required
string (URL)

A URL of an image
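
A consumer might want to verify the image rules described above (no data URLs, main image first in the list). The sketch below assumes the two fields are keyed `images` and `mainImage` in the JSON record; the function name `check_images` and the example URLs are hypothetical.

```python
def check_images(product: dict) -> list:
    """Return a list of rule violations for the image fields (empty if none)."""
    problems = []
    urls = [image["url"] for image in product.get("images", [])]
    main = product.get("mainImage")

    # Collect every URL that must not be a data URL.
    to_check = list(urls)
    if main is not None:
        to_check.append(main["url"])
    for url in to_check:
        if url.lower().startswith("data:"):
            problems.append(f"data URL not allowed: {url[:30]}")

    # The main image is expected to be the first entry of the image list.
    if main is not None and urls and urls[0] != main["url"]:
        problems.append("main image is not first in the image list")
    return problems


good = {
    "images": [{"url": "https://example.com/a.jpg"}, {"url": "https://example.com/b.jpg"}],
    "mainImage": {"url": "https://example.com/a.jpg"},
}
print(check_images(good))
```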

mpn
string

The Manufacturer Part Number (MPN) of the product. The product would have the same MPN across different e-commerce websites.

name
string

The name of the product, as appears on the page (no post-processing).
Format: trimmed.

price
string

The price at which the product is being offered.
The value should be lower than regularPrice.
Format:
- no thousands separator,
- full stop as decimal separator.

regularPrice
string

The price at which the product was being offered and which is presented as a reference to the current price. It may be represented by original price, list price or maximum retail price for which the product is sold. This field is only returned if it is explicitly mentioned in the offer or the product page.
The value should be higher than price.
Format:
- no thousands separator,
- full stop as decimal separator.
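
Because both prices are strings with no thousands separator and a full stop as the decimal separator, they can be parsed directly into exact decimals. The sketch below is one way to do that and to check the stated relationship between the two fields; the helper names `parse_price` and `discount` are hypothetical.

```python
from decimal import Decimal


def parse_price(value: str) -> Decimal:
    """Parse a schema price string such as "1299.99" into a Decimal."""
    if "," in value or " " in value:
        raise ValueError(f"unexpected separator in price: {value!r}")
    return Decimal(value)


def discount(product: dict):
    """Return regularPrice minus price, or None if either field is absent."""
    if "price" not in product or "regularPrice" not in product:
        return None
    price = parse_price(product["price"])
    regular = parse_price(product["regularPrice"])
    if price >= regular:
        raise ValueError("price should be lower than regularPrice")
    return regular - price


print(discount({"price": "1299.99", "regularPrice": "1499.00"}))
```

Using `Decimal` rather than `float` avoids rounding surprises when prices are compared or subtracted.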

size
string

Denotes the size or dimensions of the product. Pertinent to products such as garments, shoes, accessories etc.

sku
string

The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for the product.

style
string

Denotes the style of the product. Pertinent to products such as garments, shoes, accessories etc.

additionalProperties
Array of objects (A generic name:value field) [ items ]

A name-value pair field holding information pertaining to specific features. Usually in the form of a specification table or a freeform specification list.

Array
name
required
string
value
string

url
required
string (URL)

The main URL of the page from which the product was extracted, after any redirects but without canonicalization.

canonicalUrl
string (URL)

The canonical form of the URL, selected by the website.

aggregateRating
object or object

The overall rating, based on a collection of reviews or ratings.

Any of
bestRating
number

The highest value allowed in this rating system. The value should not be lower than ratingValue.

ratingValue
required
number

The rating for the content. The value should not be higher than bestRating.

reviewCount
integer

The total number of reviews.
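
The constraint between `ratingValue` and `bestRating` can be checked before ratings from different sites are compared. The sketch below is an illustration only: the function name `rating_fraction` is hypothetical, and scaling the rating to a 0..1 fraction is a design choice of this example, not part of the schema.

```python
def rating_fraction(aggregate_rating: dict):
    """Return ratingValue / bestRating, or None when bestRating is unusable."""
    rating = aggregate_rating["ratingValue"]  # required by the schema
    best = aggregate_rating.get("bestRating")  # optional
    if not best:
        return None  # missing or zero: the scale is unknown
    if rating > best:
        raise ValueError("ratingValue should not be higher than bestRating")
    return rating / best


print(rating_fraction({"ratingValue": 4.5, "bestRating": 5, "reviewCount": 120}))
```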

brand
object

The details of the brand associated with the product.

name
required
string

Name of the brand

property name*
additional property
any

breadcrumbs
Array of objects or objects [ items ]

The list of breadcrumbs with URL and/or category name.
All levels of breadcrumbs should be included (e.g. "Home" or product name, if they are included in the breadcrumbs).

Array
Any of
string

Breadcrumb name or category name.

string (URL)

Breadcrumb link.

features
Array of strings

The list of product features, usually listed as bullet points.

description
string

The full main description of the product. If several pieces of description can be found on the page, this is the one containing the most useful information. It may contain data found in other fields (features, additionalProperties).
Format:
- trimmed (no whitespace at the beginning or the end of the description string),
- line breaks included,
- no length limit,
- no normalization of Unicode characters,
- no concatenation of description from different parts of the page.

descriptionHtml
string

The normalized HTML code of the product description.
Format:
- HTML string normalized consistently by an internal algorithm.

variants
Array of objects [ items ]

The list of the details of product variants.
A product variant is defined as another product with characteristics very close to the base product, displayed on the same page, which can be chosen from a selection. Variants of a product usually differ in color, size or volume.
The following are not considered variants:
- other products included in the same bundle
- product add-ons, e.g. premium upgrade of the base product
The properties apply to the variant product, not the base product.
Contains all of the attributes available for the base product, except:
- variants
- metadata
- brand
- description
- descriptionHtml
- features
- breadcrumbs
- aggregateRating
Those attributes are either harder to extract or frequently have the same contents across all variants. The data needs to be available and specific to the variant; it is not copied from the base product. Hidden variants (not displayed on the page) are not extracted by default.
Variant data needs to be obtainable with a minimal number of requests, with the first available variant selected as the base product. Otherwise the data is not extracted.

Array
availability
string
Enum: "InStock" "OutOfStock"

The availability status for the product.

color
string

The color of the product.

currency
string

The currency associated with the price, in ISO 4217 standard (e.g. USD).

currencyRaw
string

The currency associated with the price, as appears on the page (no post-processing).

productId
string

Product identifier, unique across the dataset. It may be an SKU, any other identifier, a hash, or even a URL.

gtin
Array of objects [ items ]

List of standardized GTIN product identifiers associated with the product, which are unique for the product across different sellers.

Array
type
required
string
Enum: "gtin13" "gtin8" "gtin14" "isbn10" "isbn13" "ismn" "issn" "upc"

The type of product identifier.

value
required
string

The value of product identifier.
Format: normalized, only numerical characters allowed.

images
Array of objects (Image) [ items ]

A list of URLs of all images of the product, excluding any images in the product description.
Each URL should point to the best-quality image available, under the following conditions:
- if possible without making additional requests,
- if required, with additional parameters that return maximum size image,
- in other cases, the default image served for default resolution.
Should include the main image as the first in the list.
Data URLs are not allowed.

Array
url
required
string (URL)

A URL of an image

mainImage
object (Image)

The details of the main image of the product.
Data URLs are not allowed.

url
required
string (URL)

A URL of an image

mpn
string

The Manufacturer Part Number (MPN) of the product. The product would have the same MPN across different e-commerce websites.

name
string

The name of the product, as appears on the page (no post-processing).
Format: trimmed.

price
string

The price at which the product is being offered.
The value should be lower than regularPrice.
Format:
- no thousands separator,
- full stop as decimal separator.

regularPrice
string

The price at which the product was being offered and which is presented as a reference to the current price. It may be represented by original price, list price or maximum retail price for which the product is sold. This field is only returned if it is explicitly mentioned in the offer or the product page.
The value should be higher than price.
Format:
- no thousands separator,
- full stop as decimal separator.

size
string

Denotes the size or dimensions of the product. Pertinent to products such as garments, shoes, accessories etc.

sku
string

The Stock Keeping Unit (SKU), i.e. a merchant-specific identifier for the product.

style
string

Denotes the style of the product. Pertinent to products such as garments, shoes, accessories etc.

additionalProperties
Array of objects (A generic name:value field) [ items ]

A name-value pair field holding information pertaining to specific features. Usually in the form of a specification table or a freeform specification list.

Array
name
required
string
value
string

url
string (URL)

The main URL of the page from which the product was extracted, after any redirects but without canonicalization.

canonicalUrl
string (URL)

The canonical form of the URL, selected by the website.
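
Since variants omit attributes such as brand, description and breadcrumbs, a consumer who wants one flat record per variant has to take those attributes from the base product. The sketch below shows one possible way to do this; the function name `expand_variants`, the choice of which fields to inherit, and the sample record are all assumptions of this example.

```python
# Attributes that the schema says are not present on variants.
# "variants" itself is deliberately left out so rows stay flat.
INHERITED_FIELDS = (
    "metadata", "brand", "description", "descriptionHtml",
    "features", "breadcrumbs", "aggregateRating",
)


def expand_variants(product: dict) -> list:
    """Return one flat record per variant, inheriting base-only attributes."""
    shared = {key: product[key] for key in INHERITED_FIELDS if key in product}
    # Variant values come last, so variant-specific data is never overwritten.
    return [{**shared, **variant} for variant in product.get("variants", [])]


# Hypothetical record, for illustration only.
product = {
    "name": "T-shirt",
    "brand": {"name": "Acme"},
    "variants": [
        {"color": "red", "sku": "TS-R"},
        {"color": "blue", "sku": "TS-B"},
    ],
}
rows = expand_variants(product)
print(rows)
```

Attributes such as `name` or `price` are intentionally not inherited here, because the schema requires variant data to be specific to the variant rather than copied from the base product.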

metadata
object

Contains metadata about the data extraction process.

dateDownloaded
string

The timestamp at which the product data was downloaded.
Timezone: UTC.
Format: ISO 8601 (YYYY-MM-DDThh:mm:ssZ).

probability
number [ 0 .. 1 ]

The probability that the page belongs to a certain data type.
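
The two metadata fields can be read as sketched below. The function name `parse_metadata` and the 0.5 cut-off are assumptions made for illustration; the schema only states the timestamp format and the 0..1 range.

```python
from datetime import datetime, timezone


def parse_metadata(metadata: dict, min_probability: float = 0.5):
    """Return (download time as an aware UTC datetime, confidence flag)."""
    downloaded = datetime.strptime(
        metadata["dateDownloaded"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)  # the trailing "Z" means UTC

    probability = metadata.get("probability")
    if probability is not None and not 0 <= probability <= 1:
        raise ValueError("probability must be within [0, 1]")
    # min_probability is an arbitrary threshold chosen for this example.
    is_confident = probability is not None and probability >= min_probability
    return downloaded, is_confident


downloaded, is_confident = parse_metadata(
    {"dateDownloaded": "2021-03-04T05:06:07Z", "probability": 0.98}
)
print(downloaded.isoformat(), is_confident)
```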

Example image

Product data annotated with schema properties

Example data