Unified Schema

The Unified Schema project aims to provide standard definitions for the different types of data extracted from websites, such as products, articles, reviews and job postings.

Note: All fields in AutoExtract have exactly the same definitions in the Unified Schema. We aim to maintain backward compatibility when adding new fields, and we try our best to adhere to schema.org, diverging only when there is a reasonable benefit in doing so.

Article

Responses

200

Article

Response Schema: application/json
url
required
string <uri> (URL) ^http[s]{0,1}\:.*

Article page URL

articleBody
string

Text of the article, including sub-headings, with newline separators

articleBodyHtml
string

Simplified HTML of the article, including sub-headings, image captions and embedded content (videos, tweets, etc.)

articleBodyRaw
string

HTML of the article body as it appears in the source page

audioUrls
Array of strings <uri> (URL)

A list of URLs of all audio inside the article body

authors
Array of objects

Authors of the article

breadcrumbs
Array of objects or objects

Article breadcrumbs

canonicalUrl
string <uri> (URL) ^http[s]{0,1}\:.*

Canonical URL of the article page

dateModified
string (date or datetime format)

The date when the article was most recently modified

dateModifiedRaw
string

The date when the article was most recently modified, as it appears on the website before parsing

datePublished
string (date or datetime format)

Publication date in ISO 8601 format

datePublishedRaw
string

Publication date as it appears on the website, before parsing

description
string

A short summary of the article, human-provided if available, or auto-generated

headline
string

Article headline or title

images
Array of strings <uri> (URL)

Image URLs of the article

mainImage
string <uri> (URL) ^http[s]{0,1}\:.*

A URL or data URL value of the main image of the article

videoUrls
Array of strings <uri> (URL)

A list of URLs of all videos inside the article body
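As a rough sketch, a client could check an Article response against the constraints stated above: url is required, and url and canonicalUrl must match the pattern ^http[s]{0,1}\:.* . The helper name validate_article and the plain-dict representation of the response are assumptions for illustration, not part of the schema. mainImage is left out because its description also allows data URLs, which would not match this pattern.

```python
import re
from typing import Any

# Pattern copied from the schema's url and canonicalUrl type definitions.
URL_PATTERN = re.compile(r"^http[s]{0,1}\:.*")


def validate_article(article: dict[str, Any]) -> list[str]:
    """Hypothetical helper: return a list of problems (empty if none)."""
    problems = []
    url = article.get("url")
    if url is None:
        problems.append("missing required field: url")
    elif not URL_PATTERN.match(url):
        problems.append(f"url does not match pattern: {url!r}")
    # canonicalUrl is optional, so it is only checked when present.
    canonical = article.get("canonicalUrl")
    if canonical is not None and not URL_PATTERN.match(canonical):
        problems.append(f"canonicalUrl does not match pattern: {canonical!r}")
    return problems
```

For example, validate_article({"url": "https://example.com/news/1", "headline": "Hello"}) returns an empty list, while an article without url reports the missing required field.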

GET /article

Response samples

Content type
application/json
{}

Comment

Responses

200

Comment

Response Schema: application/json
pageUrl
required
string <uri> (URL) ^http[s]{0,1}\:.*

URL of the page from which the comment was extracted (this may differ from the comment's own URL)

author
object

Author of the comment

dateModified
string (date or datetime format)

The date when the comment was most recently modified

dateModifiedRaw
string

The date when the comment was most recently modified, as it appears on the website before parsing

datePublished
string (date or datetime format)

Publication date in ISO 8601 format

datePublishedRaw
string

Publication date as it appears on the website, before parsing

downvoteCount
number

The number of downvotes this comment received

edited
boolean

Whether the comment was edited

identifier
string

Identifier of the comment; this is the value that the parentIdentifier field of a reply refers to

locationCreated
object (Postal Address)

The location where the comment was created

parentIdentifier
string

Identifier of the parent comment (the value of the parent's identifier field)

replyCount
number

The number of replies this comment has received

text
string

Text (body) of the comment

textHtml
string

Cleaned up HTML of the comment body

textRaw
string

HTML of the comment body

upvoteCount
number

The number of upvotes this comment received

url
string <uri> (URL) ^http[s]{0,1}\:.*

URL of the comment
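The identifier and parentIdentifier fields make it possible to reassemble a flat list of extracted comments into threads. A minimal sketch, assuming each comment is a plain dict and that top-level comments either omit parentIdentifier or point to an identifier that was not extracted; the function name build_threads and the added "replies" key are hypothetical.

```python
from typing import Any


def build_threads(comments: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Nest comments under their parents using identifier/parentIdentifier.

    Returns the top-level comments; every node gets a "replies" list.
    The input dicts are copied, not modified.
    """
    nodes = [dict(c, replies=[]) for c in comments]
    by_id = {n["identifier"]: n for n in nodes if "identifier" in n}
    roots = []
    for node in nodes:
        parent = by_id.get(node.get("parentIdentifier"))
        if parent is None or parent is node:
            # No parent, or the parent was not extracted: treat as top level.
            roots.append(node)
        else:
            parent["replies"].append(node)
    return roots
```

Orphaned replies are kept at the top level rather than dropped, since pagination on the source page can easily leave a parent comment out of the extracted list.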

GET /comment

Response samples

Content type
application/json
{
}

Job Posting

Responses

200

Job Posting

Response Schema: application/json
url
required
string <uri> (URL) ^http[s]{0,1}\:.*

Job Posting page URL

baseSalary
object (Monetary Amount)

The base salary of the job or of an employee

datePosted
string (date or datetime format)

Publication date for the job posting

datePostedRaw
string

Publication date for the job posting as it appears on the website, before parsing

description
string

A description of the job posting

employmentType
string

Type of employment (e.g. full-time, part-time, contract, temporary, seasonal, internship)

hiringOrganization
object (Organization)

Organization offering the job position

jobLocation
object (Postal Address)

A (typically single) geographic location associated with the job position

title
string

The title of the job

validThrough
string (date or datetime format)

The date after which the job posting is no longer valid

validThroughRaw
string

The date after which the job posting is no longer valid, as it appears on the website before parsing
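Since validThrough may hold either a date or a datetime string, a consumer has to handle both before deciding whether a posting has expired. The sketch below is one way to do that; the helper names, treating offset-less values as UTC, and treating a date-only validThrough as valid for that whole day are all assumptions, not rules stated by the schema.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Any, Optional


def parse_schema_date(value: str) -> datetime:
    """Parse a date ("2024-05-01") or ISO 8601 datetime string.

    Assumptions: values without an offset are treated as UTC, and a
    trailing "Z" is rewritten to "+00:00" because datetime.fromisoformat
    only accepts "Z" from Python 3.11 onwards.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    if len(value) == 10:  # date only: YYYY-MM-DD
        d = date.fromisoformat(value)
        parsed = datetime(d.year, d.month, d.day)
    else:
        parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def is_posting_valid(job: dict[str, Any], now: Optional[datetime] = None) -> bool:
    """Return False only if validThrough is present and has already passed."""
    valid_through = job.get("validThrough")
    if valid_through is None:
        return True  # no expiry date was extracted
    if now is None:
        now = datetime.now(timezone.utc)
    deadline = parse_schema_date(valid_through)
    if len(valid_through) == 10:
        # Date-only value: treat the whole day as still valid (assumption).
        return now < deadline + timedelta(days=1)
    return now <= deadline
```

The unparsed validThroughRaw value is not used here; it is the text as shown on the page and is more useful for display or for debugging a parse.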

GET /job-posting

Response samples

Content type
application/json
{
}

Product

Responses

200

Product

Response Schema: application/json
url
required
string <uri> (URL) ^http[s]{0,1}\:.*

The URL of the product

additionalProperty
Array of objects (A generic name:value field)

Name-value pairs holding information about product-specific features that have no matching property in the Product schema.

aggregateRating
object or object or object

The overall rating, based on a collection of reviews or ratings

brand
string

The brand associated with the product

"Samsung"

No brand is returned

breadcrumbs
Array of objects or objects

A list of breadcrumbs with optional name and URL.

color
string

The color of the product

depth
object or object or object (A generic quantitative value)

The depth of the product

description
string

A description of the product

gtin
Array of objects

Standardized Global Trade Item Number (GTIN) identifiers, which are unique to a product across different sellers.

height
object or object or object (A generic quantitative value)

The height of the product

images
Array of strings <uri> (URL)

A list of URL or data URL values of all images of the product (may include the main image).

madeIn
string

The city or country where the product was manufactured. The website should use explicit wording that distinguishes this from the product's location.

mainImage
string <uri> (URL) ^http[s]{0,1}\:.*

A URL or data URL value of the main image of the product.

manufacturer
string

The company that manufactured the product. Because the difference between brand and manufacturer is difficult to establish, this field should only be included when the manufacturer appears explicitly on the website; otherwise, the brand field is preferred over manufacturer.

mpn
string

The Manufacturer Part Number (MPN) of the product. A product has the same MPN across different e-commerce websites.

name
string

The name of the product

nutrition
Array of objects

Nutritional information about the product

offers
Array of objects (Offer)

Detailed information about all the buying options offered for the product.

productionDate
string (date or datetime format)

The date of production of the item

productionDateRaw
string

The date of production of the item as it appears on the website

rankings
Array of objects (Ranking)

Positions of the product in different rankings

ratingHistogram
Array of objects

Distribution of ratings across the entire rating scale

relatedProducts
Array of objects

This field captures all products that the website recommends while browsing the product of interest. Related products can thus be used to gauge customer buying behaviour, sponsored products, as well as best sellers in the same category. The relationshipName field describes the relationship, while the products field contains a list of items that have the same Product schema, with all available fields extracted as defined in this table.

releaseDate
string (date or datetime format)

Date on which the product was released or listed on the website in ISO 8601 date format

releaseDateRaw
string

Release or listing date of the product as it appears on the website, before parsing

reviews
Array of objects (Review)

Product reviews

size
string

Denotes the size of the product. Pertinent to products such as garments, shoes and accessories.

sku
string

The Stock Keeping Unit (SKU) i.e. a merchant-specific identifier for the product

variants
Array of objects

This field returns a list of variants of the product. Each variant has the same schema as the Product schema defined in this table.

volume
object or object or object (A generic quantitative value)

The volume of the product

weight
object or object or object (A generic quantitative value)

The weight of the product

width
object or object or object (A generic quantitative value)

The width of the product
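Several Product fields nest further data: additionalProperty holds generic name:value entries, and relatedProducts groups full Product records under a relationshipName. The sketch below flattens both. It assumes additionalProperty entries use the keys name and value (suggested by "A generic name:value field" but not spelled out above); the function names are hypothetical.

```python
from typing import Any, Iterator


def properties_as_dict(product: dict[str, Any]) -> dict[str, Any]:
    """Collapse additionalProperty [{"name": ..., "value": ...}] into a dict.

    Assumes "name"/"value" keys; a later duplicate name overwrites an earlier one.
    """
    return {
        prop["name"]: prop.get("value")
        for prop in product.get("additionalProperty", [])
        if "name" in prop
    }


def iter_related(product: dict[str, Any]) -> Iterator[tuple[str, dict[str, Any]]]:
    """Yield (relationshipName, related product) pairs from relatedProducts."""
    for group in product.get("relatedProducts", []):
        relationship = group.get("relationshipName", "")
        for related in group.get("products", []):
            yield relationship, related
```

Because each related product (and each entry in variants) shares the Product schema, the same helpers can be applied recursively to those nested records.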

GET /product

Response samples

Content type
application/json
{