Forum Post Extraction (beta)
A forum post is a message posted on an internet forum page where a specific topic is discussed (a thread).
Request example
If you request forum post extraction and the extraction succeeds, the forumPosts field will be available in the query result:
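from autoextract.sync import request_raw

# One query per page; pageType selects forum post extraction.
query = [{
    'url': 'https://example.com/forum-post-page',
    'pageType': 'forumPosts'
}]
# Replace '[api key]' with your own API key.
results = request_raw(query, api_key='[api key]')
print(results[0]['forumPosts'])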
Available fields
Top-level
The following fields are available for forumPosts:
url: string
    URL of the page from which the posts were extracted.
topic: dictionary with a name string field
    A dictionary with the name of the topic that is discussed on the page. Example:
    {"name": "How do you cook rice?"}
posts: list of dictionaries
    List of posts; fields are described below.
The url field is required.
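As a minimal sketch of reading the top-level fields, the snippet below treats url as always present and topic and posts as possibly missing. The helper name summarize_forum_posts and the sample dictionary are hypothetical and only illustrate the field layout described above.

```python
def summarize_forum_posts(forum_posts):
    """Return (url, topic name or None, number of posts) for a forumPosts dict.

    Hypothetical helper, not part of the autoextract client.
    """
    url = forum_posts["url"]  # documented as required
    # topic and posts may be absent, so fall back to empty containers.
    topic_name = forum_posts.get("topic", {}).get("name")
    posts = forum_posts.get("posts", [])
    return url, topic_name, len(posts)


# Hand-written sample data; the topic field is deliberately left out.
sample = {
    "url": "https://example.com/forum-topic-1",
    "posts": [{"text": "Hi", "probability": 0.9}],
}
print(summarize_forum_posts(sample))
```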
Individual posts
Each post inside the posts field has the following fields available:
text: string
    Text of the post.
datePublished: string
    Post date. ISO-formatted with 'T' separator, may contain a timezone.
datePublishedRaw: string
    Same as datePublished, but before parsing/normalization, i.e. as it appeared on the site.
upvoteCount: integer
    Number of up-votes received by the post.
downvoteCount: integer
    Number of down-votes received by the post.
probability: float
    Probability that this is a post.
Posts refer to the topic extracted from the same page.
All fields are optional, except for probability.
Fields without a valid value (null or empty array) are excluded from extraction results.
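Because optional fields are simply missing from a post, client code should not index them directly. The sketch below shows one way to read datePublished safely and turn it into a datetime object with the standard library. The helper name post_datetime is hypothetical, and the sketch assumes the value is accepted by datetime.fromisoformat; note that Python versions before 3.11 do not accept a trailing "Z" there.

```python
from datetime import datetime


def post_datetime(post):
    """Return the post date as a datetime, or None if it was not extracted.

    Hypothetical helper; assumes datePublished parses with fromisoformat.
    """
    raw = post.get("datePublished")  # optional field, may be absent
    if raw is None:
        return None
    return datetime.fromisoformat(raw)


# Hand-written sample posts: one with a date, one without.
with_date = {"datePublished": "2020-01-30T00:00:00", "probability": 0.95}
without_date = {"text": "Depends on the person", "probability": 0.80}

print(post_datetime(with_date))
print(post_datetime(without_date))
```

If a timezone offset is present (for example "+02:00"), the returned datetime is timezone-aware; otherwise it is naive.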
Response example
Below is an example response. Because fields without a valid value are omitted, each post contains only the fields that were extracted:
[
    {
        "forumPosts": {
            "url": "https://example.com/forum-topic-1",
            "topic": {
                "name": "Which is the best country to work in?"
            },
            "posts": [
                {
                    "text": "Finland is often considered the best for it.",
                    "datePublished": "2020-01-30T00:00:00",
                    "datePublishedRaw": "Jan 30, 2020",
                    "upvoteCount": 12,
                    "replyCount": 1,
                    "probability": 0.95
                },
                {
                    "text": "Switzerland has good work life balance.",
                    "upvoteCount": 2,
                    "probability": 0.80
                },
                {
                    "text": "Depends on the person",
                    "replyCount": 1,
                    "probability": 0.80
                }
            ]
        },
        "webPage": {
            "inLanguages": [
                {"code": "en"},
                {"code": "es"}
            ]
        },
        "query": {
            "id": "1564747029122-9e02a1868d70b7a3",
            "domain": "example.com",
            "userQuery": {
                "pageType": "forumPosts",
                "url": "https://example.com/forum-topic-1"
            }
        },
        "algorithmVersion": "20.8.1"
    }
]
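Since every post carries a probability, a client may want to keep only the posts the extractor is confident about. The following is a minimal sketch of such a filter over a response shaped like the example above. The function name confident_posts and the 0.9 threshold are illustrative assumptions, not recommended values, and the results list is a trimmed, hand-written copy of the example response.

```python
def confident_posts(results, threshold=0.9):
    """Collect posts whose probability is at least `threshold`.

    Hypothetical helper; the threshold is an arbitrary illustration.
    """
    kept = []
    for result in results:
        # forumPosts is only present when extraction succeeded.
        forum_posts = result.get("forumPosts", {})
        for post in forum_posts.get("posts", []):
            # probability is the one field every post is documented to have.
            if post["probability"] >= threshold:
                kept.append(post)
    return kept


# Trimmed copy of the example response.
results = [{
    "forumPosts": {
        "url": "https://example.com/forum-topic-1",
        "posts": [
            {"text": "Finland is often considered the best for it.",
             "probability": 0.95},
            {"text": "Switzerland has good work life balance.",
             "probability": 0.80},
            {"text": "Depends on the person", "probability": 0.80},
        ],
    }
}]

print([post["text"] for post in confident_posts(results)])
# → ['Finland is often considered the best for it.']
```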