Real Estate Extraction (beta)

Real estate extraction supports pages that contain a single real estate listing, for sale or for rent. Many fields are extracted, such as the real estate name, address, area, images and price.

A related page type is Product List Extraction, which supports pages with multiple products.

Request example

If you request a real estate extraction and the extraction succeeds, the realEstate field will be available in the query result:

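from autoextract.sync import request_raw

# One dictionary per page; pageType selects real estate extraction.
query = [{
    'url': 'http://example.com/example-real-estate-page',
    'pageType': 'realEstate'
}]
results = request_raw(query, api_key='[api key]')
# The realEstate field is present only if the extraction succeeded.
print(results[0]['realEstate'])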

Available fields

The following fields are available for real estates:

name: string

The name of the real estate.

datePublished: string

Publication date. ISO-formatted with ‘T’ separator, may contain a timezone.

datePublishedRaw: string

Same date as datePublished but before parsing, i.e. as it appears on the website.
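
Because datePublished is ISO-formatted, it can usually be parsed with the Python standard library. The following is a minimal sketch, assuming Python 3.7 or later; parse_date_published is a hypothetical helper and not part of the API. Note that datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onwards; whether the API ever emits that form is not stated here.

```python
from datetime import datetime
from typing import Optional


def parse_date_published(real_estate: dict) -> Optional[datetime]:
    """Hypothetical helper: parse the optional datePublished field."""
    value = real_estate.get('datePublished')
    if value is None:
        # The field is optional and is left out when it has no value.
        return None
    # fromisoformat accepts the 'T' separator and an optional UTC offset.
    return datetime.fromisoformat(value)


parsed = parse_date_published({'datePublished': '2020-06-18T00:00:00'})
print(parsed.year, parsed.tzinfo)
```

The result is a naive datetime when the value carries no timezone, so callers that compare dates across listings may want to handle both cases.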

description: string

Description of the real estate.

mainImage: string

A URL or data URL value of the main image of the real estate.

images: list of strings

A list of URL or data URL values of all images of the real estate (may include the main image).

yearBuilt: number

The year the real estate was constructed. Example: 2008.

breadcrumbs: list of dictionaries with name and link optional string fields

A list of breadcrumbs (a specific navigation element) with optional name and URL. Example:

[
  {"name": "Foo", "link": "http://example.com/foo"},
  {"name": "Bar", "link": "http://example.com/foo/bar"},
  {"name": "Baz"}
]

additionalProperty: list of dictionaries with name and value fields

A list of real estate properties or characteristics.

  • name field contains the property name

  • value field contains the property value.

Example:

[
    {"name": "location", "value": "Tivat"},
    {"name": "region", "value": "Tivat and Lustica"},
    {"name": "type", "value": "Apartments / Developments"},
    {"name": "size", "value": "41m2"}
]
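
For lookups by property name, the list of name/value pairs can be folded into a dictionary. This is a sketch only: properties_to_dict is a hypothetical helper, and it assumes that property names are unique within one listing (a later duplicate overwrites an earlier one) and that entries without a name can be skipped.

```python
def properties_to_dict(real_estate: dict) -> dict:
    """Hypothetical helper: map additionalProperty names to values."""
    result = {}
    for prop in real_estate.get('additionalProperty', []):
        name = prop.get('name')
        if name is not None:
            # A later entry with the same name overwrites an earlier one.
            result[name] = prop.get('value')
    return result


example = {
    'additionalProperty': [
        {'name': 'location', 'value': 'Tivat'},
        {'name': 'size', 'value': '41m2'},
    ]
}
props = properties_to_dict(example)
print(props.get('size'))
```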

address: dictionary

A structured postal address of the real estate. Fields:

  • postalCode - postal code of the address

  • streetAddress - street address

  • addressCountry - country name or a two-letter ISO 3166-1 alpha-2 country code

  • addressLocality - locality in which the street address is, and which is in the region

  • addressRegion - region in which the locality is, and which is in the country

  • raw - complete address information, as it appears on the website

Example:

{
    "postalCode": "77701",
    "streetAddress": "3214 Brookview Drive",
    "addressCountry": "US",
    "addressLocality": "Beaumont",
    "addressRegion": "Texas",
    "raw": "3214 Brookview Drive, Beaumont, Texas 77701, US"
}

All fields are optional.

area: dictionary

A structured area of the real estate. Fields:

  • value - a number with the area of the real estate

  • unitCode - unit of the area. Allowed values:

    • SQMT - square meter

    • SQFT - square foot

    • ACRE - acre

  • raw - area in the raw format, as it appears on the website

Example:

{
    "value": 54.0,
    "unitCode": "SQMT",
    "raw": "54 m²"
}

Fields value and unitCode are optional.
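
To compare listings that use different units, the area can be normalized to square meters. The sketch below is one possible approach; area_in_square_meters and SQUARE_METERS_PER_UNIT are hypothetical names, and the conversion factors are the standard international definitions of the square foot and the acre.

```python
from typing import Optional

# Conversion factors to square meters for the allowed unit codes.
SQUARE_METERS_PER_UNIT = {
    'SQMT': 1.0,
    'SQFT': 0.09290304,
    'ACRE': 4046.8564224,
}


def area_in_square_meters(area: dict) -> Optional[float]:
    """Hypothetical helper: return the area in square meters, or None."""
    value = area.get('value')
    factor = SQUARE_METERS_PER_UNIT.get(area.get('unitCode'))
    if value is None or factor is None:
        # value and unitCode are optional, so there may be nothing to convert.
        return None
    return value * factor


print(area_in_square_meters({'value': 54.0, 'unitCode': 'SQMT', 'raw': '54 m²'}))
```

When the helper returns None, the raw string is still available for display or for custom parsing.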

numberOfBathroomsTotal: number

The total number of bathrooms in the real estate.

numberOfFullBathrooms: number

The number of full bathrooms in the real estate.

numberOfPartialBathrooms: number

The number of partial (half) bathrooms in the real estate.

numberOfBedrooms: number

The number of bedrooms in the real estate.

numberOfRooms: number

The number of rooms (excluding bathrooms and closets) of the real estate.

identifier: string

The identifier of the real estate.

tradeActions: list of dictionaries

A list of structures describing the trade actions available for the real estate.

Each dictionary in the list can have the following fields:

  • tradeType - type of a trade action, a string. Allowed values:

    • "BuyAction" - the real estate is for sale

    • "RentAction" - the real estate is for rent

  • price - a string with an offer price of the real estate

  • currency - currency of the price, a string

Example:

[
    {
        "tradeType": "RentAction",
        "price": "1700.0",
        "currency": "USD"
    }
]

All fields are optional, but currency can be present only if price is also present.
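
Since price is a string and every field is optional, client code needs to check for missing values before converting. The following sketch collects rent offers; rent_offers is a hypothetical helper, and it assumes that price is a plain decimal string such as "1700.0" (Decimal raises InvalidOperation for anything else).

```python
from decimal import Decimal


def rent_offers(real_estate: dict) -> list:
    """Hypothetical helper: (price, currency) pairs for rent offers."""
    offers = []
    for action in real_estate.get('tradeActions', []):
        if action.get('tradeType') != 'RentAction':
            continue
        price = action.get('price')
        if price is None:
            # Without a price there is no currency either, so skip the entry.
            continue
        # price is a string; Decimal avoids binary floating-point rounding.
        offers.append((Decimal(price), action.get('currency')))
    return offers


listing = {
    'tradeActions': [
        {'tradeType': 'RentAction', 'price': '1700.0', 'currency': 'USD'},
    ]
}
print(rent_offers(listing))
```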

probability: float

Probability that the requested page is a single real estate page.

url: string

URL of the page from which this real estate was extracted.

All fields are optional, except for url and probability. Fields without a valid value (null or empty array) are excluded from extraction results.
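
Because fields without a value are left out of the result, client code should read optional fields defensively and can index only url and probability directly. A minimal sketch follows; summarize and MIN_PROBABILITY are hypothetical names, and the 0.5 threshold is an arbitrary assumption rather than a recommended value.

```python
from typing import Optional

MIN_PROBABILITY = 0.5  # assumed threshold; tune it for your use case


def summarize(result: dict) -> Optional[dict]:
    """Hypothetical helper: pick a few fields from one query result."""
    real_estate = result.get('realEstate')
    if real_estate is None or real_estate['probability'] < MIN_PROBABILITY:
        # Extraction failed, or the page is unlikely to be a single listing.
        return None
    return {
        'url': real_estate['url'],                 # always present
        'name': real_estate.get('name'),           # None when excluded
        'images': real_estate.get('images', []),   # empty lists are excluded
        'bedrooms': real_estate.get('numberOfBedrooms'),
    }


result = {
    'realEstate': {
        'name': 'Real Estate name',
        'probability': 0.95,
        'url': 'http://example.com/example-real-estate-page',
    }
}
print(summarize(result))
```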

Response example

Below is an example response with all real estate fields present:

[
  {
    "realEstate": {
      "name": "Real Estate name",
      "datePublished": "2020-06-18T00:00:00",
      "datePublishedRaw": "June 18, 2020",
      "description": "Real Estate description",
      "mainImage": "http://example.com/image.png",
      "images": [
        "http://example.com/image.png"
      ],
      "yearBuilt": 2018,
      "breadcrumbs": [
        {
          "name": "Level 1",
          "link": "http://example.com"
        }
      ],
      "additionalProperty": [
        {
          "name": "property 1",
          "value": "value of property 1"
        }
      ],
      "address": {
        "postalCode": "77701",
        "streetAddress": "3214 Brookview Drive",
        "addressCountry": "US",
        "addressLocality": "Beaumont",
        "addressRegion": "Texas",
        "raw": "3214 Brookview Drive, Beaumont, Texas 77701, US"
      },
      "area": {
        "value": 54.0,
        "unitCode": "SQMT",
        "raw": "54 m²"
      },
      "numberOfBathroomsTotal": 3,
      "numberOfFullBathrooms": 2,
      "numberOfPartialBathrooms": 1,
      "numberOfBedrooms": 2,
      "numberOfRooms": 3,
      "identifier": "XYZ",
      "tradeActions": [
        {
          "tradeType": "RentAction",
          "price": "1700.0",
          "currency": "USD"
        }
      ],
      "probability": 0.95,
      "url": "http://example.com/example-real-estate-page"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a3",
      "domain": "example.com",
      "userQuery": {
        "pageType": "realEstate",
        "url": "http://example.com/example-real-estate-page"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]