Forum Post Extraction
Forum post extraction supports pages that contain multiple posts from an internet forum thread, i.e. a page where a specific topic is discussed. The API supports both “old-school” forums and more modern discussion platforms. The API response contains a list of all posts on the page, including the post text and publication date.
This supports use cases such as media monitoring, analytics, brand monitoring, mentions, sentiment analysis, and many others.
A related page type is Comment Extraction, which supports comments under a single blog post or article.
Request example
If you request a forum post extraction and the extraction succeeds, the forumPosts field will be available in the query result:
from autoextract.sync import request_raw

query = [{
    'url': 'https://example.com/forum-post-page',
    'pageType': 'forumPosts'
}]
results = request_raw(query, api_key='[api key]')
print(results[0]['forumPosts'])
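The query in the example above is a list of dictionaries, one per page. A minimal sketch of building such a list for several thread URLs follows; it assumes the list may hold more than one entry, and build_queries is a hypothetical helper name, not part of the client library. No request is sent here.

```python
def build_queries(urls):
    """Build one forumPosts query dictionary per URL (hypothetical helper)."""
    return [{"url": url, "pageType": "forumPosts"} for url in urls]


# Placeholder URLs for illustration only.
queries = build_queries([
    "https://example.com/forum-topic-1",
    "https://example.com/forum-topic-2",
])
print(queries)
```

The resulting list has the same shape as the query variable in the request example, so it could be passed to request_raw in its place.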
Available fields
Top-level
The following fields are available for forumPosts:

url: string
    URL of a page where posts were extracted.
topic: dictionary with name string field
    A dictionary with the name of the topic that is discussed on the page. Example: {"name": "How do you cook rice?"}
posts: list of dictionaries
    List of posts; fields are described below.

The url field is required.
Individual posts
Each post inside the posts field has the following fields available:

text: string
    Text of the post.
datePublished: string
    Post date. ISO-formatted with ‘T’ separator, may contain a timezone.
datePublishedRaw: string
    Same as datePublished, but before parsing/normalization, i.e. as it appeared on the site.
replyCount: integer
    Number of replies received by the post.
upvoteCount: integer
    Number of up-votes received by the post.
probability: float
    Probability that this is a post.

Posts refer to the topic extracted from the same page.

All fields are optional, except for probability.
Fields without a valid value (null or empty array) are excluded from extraction results.
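Because every post field except probability is optional and empty fields are left out of the result, client code has to tolerate missing keys. A minimal sketch, assuming each post is a plain dictionary shaped as described above; summarize_post, confident_posts, and the 0.9 threshold are hypothetical choices for illustration, not part of the API:

```python
from datetime import datetime


def summarize_post(post):
    """Flatten one post, tolerating missing optional fields."""
    raw_date = post.get("datePublished")
    return {
        "text": post.get("text", ""),
        # fromisoformat() parses the 'T'-separated form, with or without a
        # UTC offset; a trailing "Z" is only accepted from Python 3.11 on.
        "published": datetime.fromisoformat(raw_date) if raw_date else None,
        "replies": post.get("replyCount"),
        "upvotes": post.get("upvoteCount"),
        # probability is the only field documented as always present.
        "probability": post["probability"],
    }


def confident_posts(forum_posts, min_probability=0.9):
    """Keep posts at or above an (arbitrary) probability threshold."""
    return [
        summarize_post(post)
        for post in forum_posts.get("posts", [])
        if post["probability"] >= min_probability
    ]


# Hand-written sample data, not real API output.
sample = {
    "posts": [
        {"text": "Finland is often considered the best for it.",
         "datePublished": "2020-01-30T00:00:00", "probability": 0.95},
        {"text": "Depends on the person", "replyCount": 1, "probability": 0.80},
    ]
}
kept = confident_posts(sample)
print(len(kept), kept[0]["published"])
```

Missing counts are reported as None rather than 0 here, so that "not extracted" stays distinguishable from "zero replies".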
Response example
Below is an example response with all forum post fields present:
[
    {
        "forumPosts": {
            "url": "https://example.com/forum-topic-1",
            "topic": {
                "name": "Which is the best country to work in?"
            },
            "posts": [
                {
                    "text": "Finland is often considered the best for it.",
                    "datePublished": "2020-01-30T00:00:00",
                    "datePublishedRaw": "Jan 30, 2020",
                    "upvoteCount": 12,
                    "replyCount": 1,
                    "probability": 0.95
                },
                {
                    "text": "Switzerland has good work life balance.",
                    "upvoteCount": 2,
                    "probability": 0.80
                },
                {
                    "text": "Depends on the person",
                    "replyCount": 1,
                    "probability": 0.80
                }
            ]
        },
        "webPage": {
            "inLanguages": [
                {"code": "en"},
                {"code": "es"}
            ]
        },
        "query": {
            "id": "1564747029122-9e02a1868d70b7a3",
            "domain": "example.com",
            "userQuery": {
                "pageType": "forumPosts",
                "url": "https://example.com/forum-topic-1"
            }
        },
        "algorithmVersion": "20.8.1"
    }
]
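A response of this shape could be reduced to a per-page overview roughly as follows. This is a sketch only: overview is a hypothetical helper, and response_text is a trimmed copy of the example above rather than live API output.

```python
import json

# Trimmed copy of the example response, kept to the fields used below.
response_text = """
[
  {
    "forumPosts": {
      "url": "https://example.com/forum-topic-1",
      "topic": {"name": "Which is the best country to work in?"},
      "posts": [
        {"text": "Finland is often considered the best for it.",
         "upvoteCount": 12, "replyCount": 1, "probability": 0.95},
        {"text": "Switzerland has good work life balance.",
         "upvoteCount": 2, "probability": 0.80},
        {"text": "Depends on the person", "replyCount": 1, "probability": 0.80}
      ]
    },
    "webPage": {"inLanguages": [{"code": "en"}, {"code": "es"}]}
  }
]
"""


def overview(result):
    """Summarize one result item; absent fields fall back to empty values."""
    forum_posts = result.get("forumPosts", {})
    posts = forum_posts.get("posts", [])
    languages = result.get("webPage", {}).get("inLanguages", [])
    return {
        "topic": forum_posts.get("topic", {}).get("name"),
        "post_count": len(posts),
        # Posts without an upvoteCount contribute nothing to the total.
        "total_upvotes": sum(post.get("upvoteCount", 0) for post in posts),
        "languages": [language["code"] for language in languages],
    }


summaries = [overview(result) for result in json.loads(response_text)]
print(summaries[0])
```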