Zyte Data article schema v1.0
Standard Article Schema (1.0)
The Standard Article Schema used in the Zyte offering. It covers the typical set of attributes present in articles published online.
Responses
Response Schema: application/json
Field | Type | Description
headline | string | Article headline or title.
datePublished | string | Publication date of the article. Format: ISO 8601, "YYYY-MM-DDThh:mm:ssZ" or "YYYY-MM-DDThh:mm:ss±zz:zz", with timezone if available. If the actual publication date is not found, the "dateModified" value is used.
datePublishedRaw | string | Same date as "datePublished", but before parsing/normalization, i.e. as it appears on the website.
dateModified | string | The date when the article was most recently modified. Format: ISO 8601, "YYYY-MM-DDThh:mm:ssZ" or "YYYY-MM-DDThh:mm:ss±zz:zz", with timezone if available.
dateModifiedRaw | string | Same date as "dateModified", but before parsing/normalization, i.e. as it appears on the website.
authors | Array of objects [ items ] | All authors of the article.
breadcrumbs | Array of objects or objects [ items ] | The list of breadcrumbs with URL and optional category name.
inLanguage | string | Language of the article, as an ISO 639-1 language code. The article language is sometimes different from the overall language of the web page.
mainImage | object (Image) | The details of the main image of the article.
images | Array of objects (Image) [ items ] | A list of URL values of all images of the article.
description | string | A short summary of the article. It can be either human-provided (if available) or auto-generated.
articleBody | string | Clean text of the article, including sub-headings, with newline separators.
articleBodyHtml | string | Simplified and standardized HTML of the article, including sub-headings, image captions and embedded content (videos, tweets, etc.). Format: HTML string normalized consistently by an internal algorithm.
videos | Array of objects [ items ] | A list of all videos inside the article body.
audios | Array of objects [ items ] | A list of all audios inside the article body.
canonicalUrl | string (URL) | The canonical form of the URL, as selected by the website.
url (required) | string (URL) | The main URL of the article page: the URL of the final response, after any redirects. If the page contains no article data or could not be reached, the returned "empty" item still contains the url field and a metadata field with dateDownloaded.
metadata | object | Contains metadata about the data extraction process.
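The date format and the "empty" item rule described above can be handled on the client side roughly as follows. This is a minimal sketch, not part of the Zyte API: the helper names `parse_iso_date` and `is_empty_item` are hypothetical, and the check for an empty item assumes that such an item carries only the url and metadata keys.

```python
from datetime import datetime
from typing import Optional


def parse_iso_date(value: Optional[str]) -> Optional[datetime]:
    """Parse "YYYY-MM-DDThh:mm:ssZ" or "YYYY-MM-DDThh:mm:ss±zz:zz".

    Hypothetical helper. Returns None when the field is missing.
    Values without a timezone produce a naive datetime.
    """
    if not value:
        return None
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset for older versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def is_empty_item(item: dict) -> bool:
    """Assumption: an "empty" item has a url and, at most, metadata."""
    return "url" in item and set(item) <= {"url", "metadata"}
```

Checking for an empty item before reading other fields avoids treating a failed extraction as an article with missing attributes.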
Response samples
- 200
{
  "headline": "Guarantee the best results for product data extraction",
  "datePublished": "2022-08-26T21:56:16+01:00",
  "datePublishedRaw": "August 26, 2022",
  "dateModified": "2022-08-27T22:48:55+01:00",
  "dateModifiedRaw": "August 25, 2022",
  "authors": [
    {
      "name": "Honorable Zytan",
      "nameRaw": "Honorable Zytan and Inquisitive Zytan",
      "email": "honorable.zytan@zyte.com"
    },
    {
      "name": "Inquisitive Zytan",
      "nameRaw": "Honorable Zytan and Inquisitive Zytan",
      "email": "inquisitive.zytan@zyte.com"
    }
  ],
  "breadcrumbs": [
    {
      "name": "Guarantee the best results for product data extraction"
    }
  ],
  "inLanguage": "en",
  "mainImage": {},
  "images": [],
  "description": "Product Data Extraction helps understand how consumers use a specific product, foresee improvements and adjustments that ultimately increase sales and demand.",
  "articleBody": "When businesses operate in a competitive environment it is imperative to know what their competitors are charging in real-time and this can be hard to keep track of. For any data driven organization, implementing a solution that automatically extracts product data from websites in real-time and at scale, is indispensable to stay ahead of the competition. Setting an automatic process for product data extraction can be a powerful tool for data driven businesses of all sizes. You can extract specific products information including offers, price, currency, and availability. Provided that the data extraction process is able to identify the key attributes of a product, it can then use this information to create reports with insights into the behavior of a particular product. For this to work, it is key to have the right procedures in place that guarantee the best results for your automation product extraction projects. By doing so, your organization can better understand how consumers are using a specific product, foresee any necessary adjustments to improve the user experience and as a result, increase sales and demand. In this article, we'll explain not just everything you need to know to get started, but also show you how to guarantee the best results when working with product data extraction.\n\nWhat is automatic product extraction?\n\n...",
  "articleBodyHtml": "<article>\n<p>When businesses operate in a competitive environment it is imperative to know what their competitors are charging in real-time and this can be hard to keep track of. For any data driven organization, implementing a solution that automatically extracts product data from websites in real-time and at scale, is indispensable to stay ahead of the competition.</p> <p>Setting an automatic process for <a href=\"https://docs.zyte.com/automatic-extraction/product.html\">product data extraction</a> can be a powerful tool for data driven businesses of all sizes. You can extract specific products information including offers, price, currency, and availability. Provided that the data extraction process is able to <a href=\"/blog/automatic-extraction-data-extractor-review/\">identify the key attributes of a product</a>, it can then use this information to create reports with insights into the behavior of a particular product.</p>\n<p>For this to work, it is key to have the right procedures in place that guarantee the best results for your automation product extraction projects. By doing so, your organization can better understand how consumers are using a specific product, foresee any necessary adjustments to improve the user experience and as a result, increase sales and demand.</p>\n<p>In this article, we'll explain not just everything you need to know to get started, but also show you how to guarantee the best results when working with product data extraction.</p>\n<h2><strong>What is automatic product extraction?</strong></h2>\n...\n</article>",
  "videos": [],
  "metadata": {
    "dateDownloaded": "2022-12-31T13:01:54Z",
    "probability": 1
  }
}
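A response shaped like the sample above could be consumed along these lines. This is a minimal sketch under stated assumptions: `SAMPLE` is a trimmed copy of the sample with a placeholder first paragraph in articleBody, the helper names `author_names` and `body_blocks` are hypothetical, and splitting on blank lines ("\n\n") follows what the sample shows rather than a documented guarantee.

```python
import json

# Trimmed stand-in for the sample response; the first articleBody
# paragraph is a placeholder, not the real article text.
SAMPLE = r"""
{
  "headline": "Guarantee the best results for product data extraction",
  "authors": [
    {"name": "Honorable Zytan", "nameRaw": "Honorable Zytan and Inquisitive Zytan"},
    {"name": "Inquisitive Zytan", "nameRaw": "Honorable Zytan and Inquisitive Zytan"}
  ],
  "articleBody": "Placeholder paragraph.\n\nWhat is automatic product extraction?\n\n...",
  "metadata": {"dateDownloaded": "2022-12-31T13:01:54Z", "probability": 1}
}
"""


def author_names(item: dict) -> list:
    """Collect the normalized author names, skipping entries without one."""
    return [a["name"] for a in item.get("authors", []) if a.get("name")]


def body_blocks(item: dict) -> list:
    """Split articleBody into paragraphs and sub-headings.

    Assumption: blocks are separated by blank lines, as in the sample.
    """
    body = item.get("articleBody") or ""
    return [block for block in body.split("\n\n") if block.strip()]


item = json.loads(SAMPLE)
names = author_names(item)
blocks = body_blocks(item)
```

Both helpers use `.get()` with defaults because only url is marked as required in the schema, so any other field may be absent.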