Warning

Zyte Automatic Extraction will be discontinued starting April 30th, 2024. It is replaced by Zyte API. See Migrating from Automatic Extraction to Zyte API.

Vehicle Extraction

Vehicle extraction supports pages that contain a single vehicle listing for sale. Many fields are extracted, such as the model name and price, as well as vehicle-specific fields such as the VIN, mileage, engine, fuel type, and others.

A related page type is Product List Extraction, which supports pages with multiple products.

Request example

If you request vehicle extraction and the extraction succeeds, the vehicle field is available in the query result:

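from autoextract.sync import request_raw

# One query per URL; pageType selects vehicle extraction.
query = [{
    'url': 'https://example.com/vehicle',
    'pageType': 'vehicle'
}]
# Replace '[api key]' with your own API key.
results = request_raw(query, api_key='[api key]')
# The 'vehicle' key is present only if the extraction succeeded.
print(results[0]['vehicle'])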
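Since query is a list, it can hold more than one URL. The sketch below uses two hypothetical helpers, build_vehicle_queries and vehicles_by_url, which are not part of the autoextract client. The first builds one vehicle query per URL; the second maps each result back to its URL through result["query"]["userQuery"]["url"] (the field shown in the response example below) instead of assuming the results list follows query order. It runs offline on a hand-written, trimmed result list; the second entry, which has no vehicle key, stands in for a failed extraction, and results without a query URL are skipped as a precaution.

```python
from typing import Any, Dict, List, Optional


def build_vehicle_queries(urls: List[str]) -> List[Dict[str, str]]:
    """Build one vehicle-extraction query per URL, in the same shape as the query above."""
    return [{"url": url, "pageType": "vehicle"} for url in urls]


def vehicles_by_url(results: List[Dict[str, Any]]) -> Dict[str, Optional[Dict[str, Any]]]:
    """Map each queried URL to its extracted vehicle, or to None if no vehicle was returned."""
    mapping: Dict[str, Optional[Dict[str, Any]]] = {}
    for result in results:
        url = result.get("query", {}).get("userQuery", {}).get("url")
        if url is None:
            continue  # cannot tell which URL this result belongs to
        mapping[url] = result.get("vehicle")  # absent when extraction did not succeed
    return mapping


# Offline usage: a hand-written, trimmed result list instead of a real API response.
queries = build_vehicle_queries(["https://example.com/vehicle", "https://example.com/other"])
sample_results = [
    {
        "vehicle": {"name": "Vehicle name", "url": "https://example.com/vehicle", "probability": 0.95},
        "query": {"userQuery": {"pageType": "vehicle", "url": "https://example.com/vehicle"}},
    },
    {"query": {"userQuery": {"pageType": "vehicle", "url": "https://example.com/other"}}},
]
found = vehicles_by_url(sample_results)
print(found["https://example.com/vehicle"]["name"])  # Vehicle name
print(found["https://example.com/other"])  # None
```

With a real API key, queries would be passed to request_raw in place of the single-element list above, and its return value fed to vehicles_by_url.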

Available fields

Vehicle is a subclass of product, so it shares some fields with product and also has some fields that are specific to vehicles.

Vehicle-specific fields

The following fields are specific to vehicles:

vehicleIdentificationNumber: string

The Vehicle Identification Number (VIN): a unique identifier that is different for every vehicle.
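An extracted VIN can be sanity-checked on the client side. The sketch below is not part of the extraction API; it implements the North American VIN check-digit test (position 9 is a weighted sum of the other characters modulo 11, with 10 written as "X"). That check digit is mandatory for North American vehicles but not necessarily elsewhere, so a failed check on a VIN from another region does not by itself mean the extracted value is wrong. The function name has_valid_check_digit is hypothetical.

```python
# Letter-to-number transliteration for the North American VIN check digit.
# The letters I, O and Q never appear in a VIN, so they are absent here.
_VALUES = {
    **{str(d): d for d in range(10)},
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Weight of each of the 17 positions; position 9 (the check digit itself) has weight 0.
_WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def has_valid_check_digit(vin: str) -> bool:
    """Return True if a 17-character VIN passes the North American check-digit test."""
    vin = vin.strip().upper()
    if len(vin) != 17 or any(ch not in _VALUES for ch in vin):
        return False
    total = sum(_VALUES[ch] * weight for ch, weight in zip(vin, _WEIGHTS))
    remainder = total % 11
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(has_valid_check_digit("4T1BE32K25U056382"))  # True
```

The VIN used here is the one from the response example on this page.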

mileageFromOdometer: dictionary

A dictionary with the mileage of the vehicle. It may contain two fields:

  • value is an integer indicating the distance travelled by the vehicle

  • unitCode is a string with a unit code, which can be one of:

    • SMI for miles

    • KMT for kilometers

Example:

{"value": 43000, "unitCode": "KMT"}
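Because mileage can be reported in either unit, comparing listings usually means converting to one unit first. A minimal sketch, assuming only the two unit codes listed above; mileage_in_km is a hypothetical helper, and it returns None when value or unitCode is missing (both are optional) or the unit code is unrecognized.

```python
from typing import Any, Dict, Optional

KM_PER_MILE = 1.609344  # exact, by definition of the international mile


def mileage_in_km(mileage: Dict[str, Any]) -> Optional[float]:
    """Convert a mileageFromOdometer dictionary to kilometers, or return None."""
    value = mileage.get("value")
    unit = mileage.get("unitCode")
    if value is None:
        return None
    if unit == "KMT":  # already kilometers
        return float(value)
    if unit == "SMI":  # statute miles
        return value * KM_PER_MILE
    return None  # missing or unknown unit code


print(mileage_in_km({"value": 43000, "unitCode": "KMT"}))  # 43000.0
```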
vehicleTransmission: string

The vehicle transmission, i.e. the type of component used to transmit power from a rotating power source to the wheels or another relevant component.

fuelType: string

The type of fuel suitable for the engine of the vehicle.

vehicleEngine: dictionary

Information about the engine or engines of the vehicle. Currently, it is a dictionary with a "raw" string field, which contains the text as it appears on the site, without any parsing. Example:

{"raw": "4.4L"}
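Since "raw" is unparsed, any structured value has to be derived by the client. The sketch below is a hypothetical heuristic, not something the API provides: it pulls an engine displacement in liters out of strings such as "4.4L", "2.0 L" or "1.6 litre" and returns None when no such pattern is found. Real sites phrase engine data in many ways, so expect misses.

```python
import re
from typing import Any, Dict, Optional

# Hypothetical pattern: a number followed by "l", "liter(s)" or "litre(s)".
_DISPLACEMENT = re.compile(r"(\d+(?:\.\d+)?)\s*(?:liters?|litres?|l)\b", re.IGNORECASE)


def engine_displacement_liters(vehicle_engine: Dict[str, Any]) -> Optional[float]:
    """Extract a displacement in liters from vehicleEngine["raw"], if one is recognizable."""
    raw = vehicle_engine.get("raw") or ""
    match = _DISPLACEMENT.search(raw)
    return float(match.group(1)) if match else None


print(engine_displacement_liters({"raw": "4.4L"}))  # 4.4
```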
color: string

The exterior color of the car.

vehicleInteriorColor: string

The interior color of the car.

availableAtOrFrom: dictionary

The place where the car is located. Currently, it is a dictionary with a "raw" string field, which contains the text as it appears on the site, without any parsing. Example:

{"raw": "New york"}

numberOfDoors: integer

The number of doors in the car.

vehicleSeatingCapacity: integer

Seating capacity of the car.

fuelEfficiency: list of dictionaries

The fuel efficiency of the vehicle. It can be expressed as distance per unit of fuel (e.g. 20 miles per gallon) or as fuel per unit of distance (e.g. 8 liters per 100 km). The raw field contains the text as it appears on the site, without any parsing.

Example:

[
    {"raw": "25 mpg (city)"},
    {"raw": "40 mpg (highway)"}
]
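Because the two representations are reciprocal, mpg values can be converted to liters per 100 km. A minimal sketch, assuming raw strings shaped like the example ("25 mpg (city)") and US gallons; both the regular expression and the helper names are hypothetical, and entries that do not match (for example values already given in L/100 km) are simply skipped.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

LITERS_PER_US_GALLON = 3.785411784
KM_PER_MILE = 1.609344
# Hypothetical pattern: "<number> mpg" optionally followed by a parenthesized label.
_MPG = re.compile(r"(\d+(?:\.\d+)?)\s*mpg(?:\s*\(([^)]*)\))?", re.IGNORECASE)


def mpg_to_l_per_100km(mpg: float) -> float:
    """Convert US miles per gallon to liters per 100 km."""
    return 100 * LITERS_PER_US_GALLON / (mpg * KM_PER_MILE)


def parse_fuel_efficiency(entries: List[Dict[str, Any]]) -> List[Tuple[Optional[str], float]]:
    """Return (label, liters per 100 km) for every entry whose raw text is in mpg."""
    parsed = []
    for entry in entries:
        match = _MPG.search(entry.get("raw", ""))
        if match and float(match.group(1)) > 0:
            label = match.group(2)  # e.g. "city"; None when there is no label
            parsed.append((label, mpg_to_l_per_100km(float(match.group(1)))))
    return parsed


for label, l_per_100km in parse_fuel_efficiency([{"raw": "25 mpg (city)"}, {"raw": "40 mpg (highway)"}]):
    print(label, round(l_per_100km, 1))  # city 9.4, then highway 5.9
```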

General product fields

Many fields from product (see Product Extraction) are also extracted from vehicles:

name: string

The name of the vehicle.

offers: list of dictionaries

Vehicle offers. Each offer may contain price, currency, regularPrice, and availability string fields. All fields are optional, but currency is present only if price is also present.

  • price field is a string with a valid number (a dot is used as the decimal separator). It is the price a customer has to pay after discounts or special offers.

  • currency is the currency as given on the website, without extra normalization (for example, both “$” and “USD” are possible currencies). It is present only if price is also present.

  • regularPrice is the price before any discount or special offer. It is present only when the price is different from regularPrice.

  • availability is the product availability, as a string. Allowed values:

    • "InStock" - includes limited availability, presale, preorder, and in-store only.

    • "OutOfStock" - includes discontinued and sold out.

Example:

[
  {
    "price": "42000",
    "regularPrice": "45000.00",
    "currency": "USD",
    "availability": "InStock"
  }
]
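Since price and regularPrice are strings with a dot as the decimal separator, Python's decimal.Decimal can parse them exactly, avoiding float rounding. A minimal sketch with a hypothetical helper that computes the discount of one offer; it returns None when either price is missing (regularPrice is only present when it differs from price). Currency is not normalized, so this only makes sense within a single offer.

```python
from decimal import Decimal
from typing import Any, Dict, Optional


def offer_discount(offer: Dict[str, Any]) -> Optional[Decimal]:
    """Return regularPrice - price as an exact Decimal, or None if either is absent."""
    price = offer.get("price")
    regular = offer.get("regularPrice")
    if price is None or regular is None:
        return None
    return Decimal(regular) - Decimal(price)


offer = {"price": "42000", "regularPrice": "45000.00", "currency": "USD", "availability": "InStock"}
print(offer_discount(offer))  # 3000.00
```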
sku: string

The Stock Keeping Unit (SKU) identifier assigned to the vehicle by the seller.

mpn: string

The manufacturer part number (MPN) of the vehicle. It is issued by the manufacturer and is the same across different websites for a given vehicle.

brand: string

Brand or manufacturer of the vehicle.

breadcrumbs: list of dictionaries with optional name and link string fields

A list of breadcrumbs (a specific navigation element) with optional name and URL. Example:

[
  {"name": "Foo", "link": "http://example.com/foo"},
  {"name": "Bar", "link": "http://example.com/foo/bar"},
  {"name": "Baz"}
]
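Because both name and link are optional, code that displays breadcrumbs should tolerate missing keys. A minimal sketch with a hypothetical helper that joins the available names into a single path string and skips entries without a name:

```python
from typing import Any, Dict, List


def breadcrumb_path(breadcrumbs: List[Dict[str, Any]], sep: str = " > ") -> str:
    """Join breadcrumb names into one path string, skipping entries that have no name."""
    return sep.join(crumb["name"] for crumb in breadcrumbs if crumb.get("name"))


crumbs = [
    {"name": "Foo", "link": "http://example.com/foo"},
    {"name": "Bar", "link": "http://example.com/foo/bar"},
    {"name": "Baz"},
]
print(breadcrumb_path(crumbs))  # Foo > Bar > Baz
```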
mainImage: string

A URL or data URL value of the main image of the vehicle.

images: list of strings

A list of URL or data URL values of all images of the vehicle (may include the main image).
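Since images may already contain mainImage, merging the two fields naively can produce duplicates. A minimal sketch with a hypothetical helper that puts mainImage first and keeps each URL once, preserving order; either field may be absent.

```python
from typing import Any, Dict, List


def ordered_images(vehicle: Dict[str, Any]) -> List[str]:
    """Return all image URLs with mainImage first and without duplicates."""
    candidates = [vehicle.get("mainImage")] + list(vehicle.get("images", []))
    # dict.fromkeys keeps the first occurrence of each value and preserves order;
    # the filter drops the None placeholder used when mainImage is missing.
    return [url for url in dict.fromkeys(candidates) if url]


vehicle = {
    "mainImage": "http://example.com/a.png",
    "images": ["http://example.com/b.png", "http://example.com/a.png"],
}
print(ordered_images(vehicle))  # ['http://example.com/a.png', 'http://example.com/b.png']
```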

description: string

Description of the vehicle.

descriptionHtml: string

Simplified HTML of the description, including sub-headings, image captions and embedded content.

aggregateRating: dictionary

Aggregate information about the vehicle rating and reviews.

  • ratingValue is the average rating value, as a float.

  • bestRating is the best possible rating value, as a float.

  • reviewCount is the number of reviews or ratings for the vehicle, as an integer.

Example - 4.5 out of 5, based on 12 reviews:

{
  "ratingValue": 4.5,
  "bestRating": 5,
  "reviewCount": 12
}

All fields are optional, but at least one of reviewCount or ratingValue must be present.
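Sites use different rating scales, so ratingValue is only comparable across sites relative to bestRating. A minimal sketch with a hypothetical helper that expresses the rating as a fraction of bestRating; because every field is optional, it returns None when ratingValue or bestRating is missing (for example when only reviewCount is present).

```python
from typing import Any, Dict, Optional


def normalized_rating(rating: Dict[str, Any]) -> Optional[float]:
    """Return ratingValue / bestRating, or None when either value is unavailable."""
    value = rating.get("ratingValue")
    best = rating.get("bestRating")
    if value is None or not best:  # also guards against a zero bestRating
        return None
    return float(value) / float(best)


print(normalized_rating({"ratingValue": 4.5, "bestRating": 5, "reviewCount": 12}))  # 0.9
```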

additionalProperty: list of dictionaries with name and value fields

A list of vehicle properties or characteristics.

  • name field contains the property name,

  • value field contains the property value.

Example:

[
    {"name": "engine", "value": "I4"},
    {"name": "drivetrain", "value": "All-Wheel Drive"},
    {"name": "fuel type", "value": "Gasoline"}
]
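For lookups, the name/value list is often easier to use as a dictionary. A minimal sketch with a hypothetical helper; it lower-cases and strips names so that "Fuel Type" and "fuel type" collide, and when a name repeats, the last value wins. Entries lacking either key are skipped.

```python
from typing import Any, Dict, List


def properties_to_dict(props: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Turn [{"name": ..., "value": ...}, ...] into {normalized name: value}."""
    return {p["name"].strip().lower(): p["value"] for p in props if "name" in p and "value" in p}


props = properties_to_dict([
    {"name": "engine", "value": "I4"},
    {"name": "drivetrain", "value": "All-Wheel Drive"},
    {"name": "fuel type", "value": "Gasoline"},
])
print(props["drivetrain"])  # All-Wheel Drive
```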
probability: float

Probability that the requested page is a single vehicle page.
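The probability can be used to discard pages that are probably not single-vehicle listings. A minimal sketch; the cut-off of 0.5 is a hypothetical default, not a documented recommendation, and should be tuned to how strict the application needs to be. Results without a vehicle key are ignored.

```python
from typing import Any, Dict, List

# Hypothetical cut-off; raise it for precision, lower it for recall.
MIN_PROBABILITY = 0.5


def confident_vehicles(results: List[Dict[str, Any]], threshold: float = MIN_PROBABILITY) -> List[Dict[str, Any]]:
    """Keep only vehicles whose probability of being a single-vehicle page is >= threshold."""
    vehicles = (r["vehicle"] for r in results if "vehicle" in r)
    return [v for v in vehicles if v.get("probability", 0.0) >= threshold]


sample = [
    {"vehicle": {"url": "https://example.com/a", "probability": 0.95}},
    {"vehicle": {"url": "https://example.com/b", "probability": 0.2}},
    {"query": {}},
]
print([v["url"] for v in confident_vehicles(sample)])  # ['https://example.com/a']
```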

canonicalUrl: string

Canonical URL of the vehicle, if available.

url: string

URL of the page from which this vehicle was extracted.

All fields are optional, except for url and probability. Fields without a valid value (null or empty array) are excluded from extraction results.
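In practice this means only url and probability can be read with plain indexing; every other field should be read with a default. A minimal sketch with a hypothetical summary function: since empty arrays are excluded, a vehicle without offers has no offers key at all, so the helper falls back to an empty offer.

```python
from typing import Any, Dict


def vehicle_summary(vehicle: Dict[str, Any]) -> Dict[str, Any]:
    """Build a flat summary; only url and probability are guaranteed to be present."""
    offers = vehicle.get("offers") or [{}]  # missing offers -> one empty offer
    first_offer = offers[0]
    return {
        "url": vehicle["url"],                  # always present
        "probability": vehicle["probability"],  # always present
        "name": vehicle.get("name"),
        "price": first_offer.get("price"),
        "currency": first_offer.get("currency"),
        "vin": vehicle.get("vehicleIdentificationNumber"),
    }


minimal = vehicle_summary({"url": "https://example.com/vehicle", "probability": 0.95})
print(minimal["price"])  # None
```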

Response example

Below is an example response with all vehicle fields present:

[
  {
    "vehicle": {
      "name": "Vehicle name",
      "offers": [
        {
          "price": "42000",
          "currency": "USD",
          "availability": "InStock",
          "regularPrice": "48000"
        }
      ],
      "sku": "Vehicle sku",
      "mpn": "Vehicle model",
      "vehicleIdentificationNumber": "4T1BE32K25U056382",
      "mileageFromOdometer": {
        "value": 25000,
        "unitCode": "KMT"
      },
      "vehicleTransmission": "manual",
      "fuelType": "Petrol",
      "vehicleEngine": {
        "raw": "4.4L "
      },
      "availableAtOrFrom": {
        "raw": "New york"
      },
      "color": "black",
      "vehicleInteriorColor": "Silver",
      "numberOfDoors": 5,
      "vehicleSeatingCapacity": 6,
      "fuelEfficiency": [
        {
          "raw": "45 mpg (city)"
        }
      ],
      "brand": "vehicle brand",
      "breadcrumbs": [
        {
          "name": "Level 1",
          "link": "http://example.com"
        }
      ],
      "mainImage": "http://example.com/image.png",
      "images": [
        "http://example.com/image.png"
      ],
      "description": "vehicle description",
      "descriptionHtml": "<article>HTML description for Vehicle ...",
      "aggregateRating": {
        "ratingValue": 4.5,
        "bestRating": 5.0,
        "reviewCount": 31
      },
      "additionalProperty": [
        {
          "name": "property 1",
          "value": "value of property 1"
        }
      ],
      "probability": 0.95,
      "canonicalUrl": "https://example.com/vehicle/",
      "url": "https://example.com/vehicle"
    },
    "webPage": {
      "inLanguages": [
        {"code": "en"},
        {"code": "es"}
      ]
    },
    "query": {
      "id": "1564747029122-9e02a1868d70b7a2",
      "domain": "example.com",
      "userQuery": {
        "pageType": "vehicle",
        "url": "https://example.com/vehicle"
      }
    },
    "algorithmVersion": "20.8.1"
  }
]