Real Estate Extraction
Real estate extraction supports pages which contain a single real estate listing for sale or for rent. Many fields are extracted, such as real estate name, address, area, images and price.
A related page type is Product List Extraction, which supports pages with multiple products.
Request example
If you request a real estate extraction and the extraction succeeds, the realEstate field will be available in the query result:
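from autoextract.sync import request_raw

# One query per page; 'pageType' selects real estate extraction.
query = [{
    'url': 'http://example.com/example-real-estate-page',
    'pageType': 'realEstate'
}]
# Replace '[api key]' with your own API key.
results = request_raw(query, api_key='[api key]')
# 'realEstate' is present only if the extraction succeeded.
print(results[0]['realEstate'])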
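Because the realEstate field is only present when the extraction succeeds, client code may want to guard the lookup rather than index the result directly. The following is a minimal sketch, assuming each query result is a plain dictionary as in the request above; the helper name get_real_estate and the sample results are hypothetical and not part of the autoextract library:

```python
def get_real_estate(result):
    """Return the 'realEstate' dictionary of one query result, or None.

    `result` is assumed to be one item of the list returned by request_raw.
    """
    real_estate = result.get('realEstate')
    return real_estate if isinstance(real_estate, dict) else None


# Hypothetical results, for illustration only.
succeeded = {'realEstate': {'url': 'http://example.com/example-real-estate-page',
                            'probability': 0.95}}
failed = {'query': {'userQuery': {'pageType': 'realEstate'}}}

for result in (succeeded, failed):
    real_estate = get_real_estate(result)
    if real_estate is None:
        print('no real estate extracted')
    else:
        print(real_estate['url'])
```

Returning None instead of raising KeyError keeps the calling loop simple when some pages in a batch are not real estate listings.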
Available fields
The following fields are available for real estates:
name: string
    The name of the real estate.
datePublished: string
    Publication date. ISO-formatted with ‘T’ separator, may contain a timezone.
datePublishedRaw: string
    Same date as datePublished, but before parsing, i.e. as it appears on the website.
description: string
    Description of the real estate.
mainImage: string
    A URL or data URL value of the main image of the real estate.
images: list of strings
    A list of URL or data URL values of all images of the real estate (may include the main image).
yearBuilt: integer
    The year the real estate was constructed. Example: 2008.
breadcrumbs: list of dictionaries with name and link optional string fields
    A list of breadcrumbs (a specific navigation element) with optional name and URL. Example:
    [
      {"name": "Foo", "link": "http://example.com/foo"},
      {"name": "Bar", "link": "http://example.com/foo/bar"},
      {"name": "Baz"}
    ]
additionalProperty: list of dictionaries with name and value fields
    A list of real estate properties or characteristics.
    - name field contains the property name
    - value field contains the property value
    Example:
    [
      {"name": "location", "value": "Tivat"},
      {"name": "region", "value": "Tivat and Lustica"},
      {"name": "type", "value": "Apartments / Developments"},
      {"name": "size", "value": "41m2"}
    ]
address: dictionary
    A structured postal address of the real estate. Fields:
    - postalCode - postal code of the address
    - streetAddress - street address
    - addressCountry - country name or a two-letter ISO 3166-1 alpha-2 country code
    - addressLocality - locality in which the street address is, and which is in the region
    - addressRegion - region in which the locality is, and which is in the country
    - raw - complete address information, as it appears on the website
    Example:
    {
      "postalCode": "77701",
      "streetAddress": "3214 Brookview Drive",
      "addressCountry": "US",
      "addressLocality": "Beaumont",
      "addressRegion": "Texas",
      "raw": "3214 Brookview Drive, Beaumont, Texas 77701, US"
    }
    All fields are optional.
area: dictionary
    A structured area of the real estate. Fields:
    - value - a float number with the area of the real estate
    - unitCode - unit of the area. Allowed values:
        - SQMT - square meter
        - SQFT - square foot
        - ACRE - acre
    - raw - area in the raw format, as it appears on the website
    Example:
    { "value": 54.0, "unitCode": "SQMT", "raw": "54 m²" }
    Fields value and unitCode are optional.
numberOfBathroomsTotal: integer
    The total number of bathrooms in the real estate.
numberOfFullBathrooms: integer
    The number of full bathrooms in the real estate.
numberOfPartialBathrooms: integer
    The number of half bathrooms in the real estate.
numberOfBedrooms: integer
    The number of bedrooms in the real estate.
numberOfRooms: integer
    The number of rooms (excluding bathrooms and closets) of the real estate.
identifier: string
    The identifier of the real estate.
tradeActions: list of dictionaries
    A list of structures describing possible trade actions that can be done on the real estate.
    Each dictionary in the list can have the following fields:
    - tradeType - type of a trade action, a string. Allowed values:
        - "BuyAction" - the real estate is for sale
        - "RentAction" - the real estate is for rent
    - price - a string with an offer price of the real estate
    - currency - currency of the price, a string
    Example:
    [ { "tradeType": "RentAction", "price": "1700.0", "currency": "USD" } ]
    All fields are optional, but currency can be present only if price is also present.
probability: float
    Probability that the requested page is a single real estate page.
url: string
    URL of the page from which this real estate was extracted.
All fields are optional, except for url and probability.
Fields without a valid value (null or empty array) are excluded from extraction results.
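Since most fields are optional and some values arrive as strings, it can help to normalize them before use. The following is a minimal sketch under stated assumptions, not part of the API: the helper names (area_in_square_meters, parse_offers, format_address) are hypothetical, prices are assumed to be decimal strings such as "1700.0", and the conversion factors are the international definitions of the square foot and the acre.

```python
from decimal import Decimal, InvalidOperation

# Square meters per unit, keyed by the documented unitCode values.
SQUARE_METERS_PER_UNIT = {
    "SQMT": 1.0,
    "SQFT": 0.09290304,    # international square foot
    "ACRE": 4046.8564224,  # international acre
}


def area_in_square_meters(area):
    """Convert an `area` dictionary to square meters, or return None.

    `value` and `unitCode` are optional, so both must be checked.
    """
    value = area.get("value")
    factor = SQUARE_METERS_PER_UNIT.get(area.get("unitCode"))
    if value is None or factor is None:
        return None
    return value * factor


def parse_offers(real_estate):
    """Return (tradeType, price, currency) tuples from `tradeActions`.

    Assumption: `price` is a decimal string; unparsable prices become None.
    """
    offers = []
    for action in real_estate.get("tradeActions", []):
        price = action.get("price")
        try:
            amount = Decimal(price) if price is not None else None
        except InvalidOperation:
            amount = None
        offers.append((action.get("tradeType"), amount, action.get("currency")))
    return offers


def format_address(address):
    """Join the structured address parts, falling back to `raw`."""
    keys = ("streetAddress", "addressLocality", "addressRegion",
            "postalCode", "addressCountry")
    parts = [address[key] for key in keys if address.get(key)]
    return ", ".join(parts) if parts else address.get("raw")


# Sample data copied from the field examples above.
real_estate = {
    "area": {"value": 54.0, "unitCode": "SQMT", "raw": "54 m²"},
    "tradeActions": [
        {"tradeType": "RentAction", "price": "1700.0", "currency": "USD"}
    ],
}
print(area_in_square_meters(real_estate["area"]))
print(parse_offers(real_estate))
```

Using Decimal rather than float for prices avoids rounding surprises when amounts are later summed or compared.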
Response example
Below is an example response with all real estate fields present:
[
  {
    "realEstate": {
      "name": "Real Estate name",
      "datePublished": "2020-06-18T00:00:00",
      "datePublishedRaw": "June 18, 2020",
      "description": "Real Estate description",
      "mainImage": "http://example.com/image.png",
      "images": [
        "http://example.com/image.png"
      ],
      "yearBuilt": 2018,
      "breadcrumbs": [
        {
          "name": "Level 1",
          "link": "http://example.com"
        }
      ],
      "additionalProperty": [
        {
          "name": "property 1",
          "value": "value of property 1"
        }
      ],
      "address": {
        "postalCode": "77701",
        "streetAddress": "3214 Brookview Drive",
        "addressCountry": "US",
        "addressLocality": "Beaumont",
        "addressRegion": "Texas",
        "raw": "3214 Brookview Drive, Beaumont, Texas 77701, US"
      },
      "area": {
        "value": 54.0,
        "unitCode": "SQMT",
        "raw": "54 m²"
      },
      "numberOfBathroomsTotal": 3,
      "numberOfFullBathrooms": 2,
      "numberOfPartialBathrooms": 1,
      "numberOfBedrooms": 2,
      "numberOfRooms": 3,
      "identifier": "XYZ",
      "tradeActions": [
        {
          "tradeType": "RentAction",
          "price": "1700.0",
          "currency": "USD"
        }
      ],
      "probability": 0.95,
      "url": "http://example.com/example-real-estate-page"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "realEstate",
        "url": "http://example.com/example-real-estate-page"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]
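A response like the one above can be reduced to the fields an application needs. This is a minimal sketch, assuming the response has already been decoded into Python lists and dictionaries; the function names parse_date_published and summarize, and the 0.5 probability threshold, are illustrative assumptions rather than values defined by the API.

```python
from datetime import datetime


def parse_date_published(value):
    """Parse an ISO-formatted datePublished string into a datetime.

    A trailing 'Z' is rewritten to '+00:00' because datetime.fromisoformat
    only accepts it from Python 3.11 on.
    """
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def summarize(results, min_probability=0.5):
    """Keep confident real estate results and pick a few fields from each.

    `min_probability` is an arbitrary illustrative threshold.
    """
    summaries = []
    for result in results:
        real_estate = result.get("realEstate")
        # url and probability are the only fields that are always present.
        if not real_estate or real_estate["probability"] < min_probability:
            continue
        published = real_estate.get("datePublished")
        summaries.append({
            "url": real_estate["url"],
            "name": real_estate.get("name"),
            "published": parse_date_published(published) if published else None,
        })
    return summaries


# A trimmed copy of the response example, plus a low-probability result.
response = [
    {"realEstate": {"name": "Real Estate name",
                    "datePublished": "2020-06-18T00:00:00",
                    "probability": 0.95,
                    "url": "http://example.com/example-real-estate-page"}},
    {"realEstate": {"probability": 0.1,
                    "url": "http://example.com/not-a-listing"}},
]
for summary in summarize(response):
    print(summary["url"], summary["published"])
```

Optional fields are read with get() so that a missing name or datePublished yields None instead of an exception.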