Warning

Zyte Automatic Extraction will be discontinued starting April 30th, 2024. It is replaced by Zyte API. See Migrating from Automatic Extraction to Zyte API.

Comment Extraction

Comment extraction supports pages with comments, usually under a blog post or a news article. All comments on a page are returned, including fields such as the comment text and its publication date.

This supports use cases such as news and media monitoring, analytics, brand monitoring, mentions, sentiment analysis, and many others.

A related page type is Forum Post Extraction, which supports extraction of posts made under a single topic on a forum.

Request example

If you request a comment extraction and the extraction succeeds, the comments field is available in the query result:

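from autoextract.sync import request_raw

# A single query: the page URL plus the page type to extract.
query = [{
    'url': 'https://example.com/article-with-comments',
    'pageType': 'comments'
}]
results = request_raw(query, api_key='[api key]')
# The first result corresponds to the first (and only) query.
print(results[0]['comments'])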
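The request example indexes results[0]['comments'] directly, which only works when the extraction succeeds. The following is a minimal sketch of a more defensive lookup. The helper name extract_comment_list is hypothetical, and the sketch assumes that a failed query result carries an "error" key in place of the comments field; check a real result to confirm that shape.

```python
def extract_comment_list(result):
    """Return the list of comment dicts from one query result, or [] if absent.

    `result` is one element of the list returned by request_raw.
    """
    # Assumption: a failed query result has an "error" key and no "comments" field.
    if "error" in result:
        return []
    # The top-level "comments" field is a dict holding "url" and "comments".
    page = result.get("comments") or {}
    # Empty arrays are excluded from results, so the inner list may be missing.
    return page.get("comments") or []


# Usage with hand-written sample data (no network call involved).
sample = {
    "comments": {
        "url": "https://example.com/article-with-comments",
        "comments": [{"text": "A comment on article", "probability": 0.95}],
    }
}
for comment in extract_comment_list(sample):
    print(comment.get("text"))
```

Returning an empty list rather than raising keeps loops over many query results simple; raise an exception instead if a missing field should stop processing.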

Available fields

Top-level

The following fields are available for comments:

url: string

URL of the page from which the comments were extracted.

comments: list of dictionaries

List of comments; fields are described below.

The url field is required.

Individual comments

Each comment inside the comments field has the following fields available:

text: string

Text of the comment.

datePublished: string

Publication date of the comment. ISO-formatted with a ‘T’ separator; may contain a timezone.

datePublishedRaw: string

Same as datePublished, but before parsing/normalization, i.e. as it appeared on the site.

upvoteCount: integer

Number of up-votes received by the comment.

downvoteCount: integer

Number of down-votes received by the comment.

probability: float

Probability that this is a comment.

Comments refer to an article or blog post available on the same page.

All fields are optional, except for probability. Fields without a valid value (null or empty array) are excluded from extraction results.
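Because every field except probability may be missing, client code should not index optional keys directly. The sketch below shows one way to map a raw comment dictionary onto a typed record. The Comment dataclass and the parse_comment and parse_date helpers are hypothetical names, not part of the autoextract client, and the handling of a trailing "Z" is a defensive assumption, since the field description only says that a timezone may be present.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Comment:
    probability: float                      # the only required field
    text: Optional[str] = None
    date_published: Optional[datetime] = None
    date_published_raw: Optional[str] = None
    upvote_count: Optional[int] = None
    downvote_count: Optional[int] = None


def parse_date(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-formatted date string, with or without a timezone."""
    if not value:
        return None
    # Assumption: a UTC timezone might be written as "Z", which
    # datetime.fromisoformat only accepts from Python 3.11 onward.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def parse_comment(raw: dict) -> Comment:
    """Build a Comment, leaving absent optional fields as None."""
    return Comment(
        probability=raw["probability"],
        text=raw.get("text"),
        date_published=parse_date(raw.get("datePublished")),
        date_published_raw=raw.get("datePublishedRaw"),
        upvote_count=raw.get("upvoteCount"),
        downvote_count=raw.get("downvoteCount"),
    )
```

Keeping missing values as None, rather than defaulting counts to 0, preserves the distinction between "no votes" and "vote count not extracted".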

Response example

Below is an example response with all comment fields present:

[
  {
    "comments": {
      "url": "https://example.com/article-with-comments",
      "comments": [
        {
          "text": "A comment on article",
          "datePublished": "2020-01-30T00:00:00",
          "datePublishedRaw": "Jan 30, 2020",
          "upvoteCount": 12,
          "downvoteCount": 1,
          "probability": 0.95
        },
        {
          "text": "Another comment",
          "probability": 0.95
        }
      ]
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "comments",
        "url": "https://example.com/article-with-comments"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
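The response nests the comment list one level down: each result has a comments object, which in turn holds url and the comments array. As an illustration of walking that structure, the sketch below keeps only comments whose probability meets a threshold. The function name confident_comments and the default threshold of 0.5 are arbitrary choices for this example, not recommended values.

```python
def confident_comments(results, min_probability=0.5):
    """Return (page_url, comment_text) pairs for sufficiently probable comments.

    `results` is the parsed JSON response: a list with one dict per query.
    """
    kept = []
    for result in results:
        page = result.get("comments") or {}
        for comment in page.get("comments") or []:
            # probability is the only field guaranteed to be present.
            if comment["probability"] >= min_probability:
                kept.append((page.get("url"), comment.get("text")))
    return kept


# Hand-written data in the shape of the response example.
response = [{
    "comments": {
        "url": "https://example.com/article-with-comments",
        "comments": [
            {"text": "A comment on article", "probability": 0.95},
            {"text": "Unlikely comment", "probability": 0.2},
        ],
    },
}]
for url, text in confident_comments(response):
    print(url, text)
```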