Jobs API

The Jobs API lets you work with your spiders' jobs: schedule, stop, update, and delete them.

Note: Most of the features provided by the API are also available through the python-scrapinghub client library.

run.json

Schedules a job for a given spider.

Parameter      Description                                                  Required
project        Project ID.                                                  Yes
spider         Spider name.                                                 Yes
add_tag        Add the specified tag to the job.                            No
priority       Job priority. Supported values: 0 (lowest) to 4 (highest).   No
               Default: 2.
job_settings   Job settings, represented as a JSON object.                  No
units          Number of units to run the job with. Supported values:       No
               1 to 6.

Note: Any other parameter will be treated as a spider argument.

Method   Description                      Supported parameters
POST     Schedule the specified spider.   project, spider, add_tag, priority, job_settings, units

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/run.json -d project=123 -d spider=somespider -d units=2 -d add_tag=sometag -d spiderarg1=example -d job_settings='{ "setting1": "value1", "setting2": "value2" }'
{"status": "ok", "jobid": "123/1/1"}
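
The same request can be sketched in Python. This is an illustrative helper (build_run_payload is not part of any official client): it assembles the form payload, JSON-encoding job_settings as the API expects, and passes any extra keyword arguments through as spider arguments.

```python
import json

API_URL = "https://app.scrapinghub.com/api/run.json"

def build_run_payload(project, spider, add_tag=None, units=None,
                      job_settings=None, **spider_args):
    """Build the run.json form payload; unrecognized keyword
    arguments are passed through as spider arguments."""
    payload = {"project": project, "spider": spider}
    if add_tag is not None:
        payload["add_tag"] = add_tag
    if units is not None:
        payload["units"] = units
    if job_settings is not None:
        # job_settings must be sent as a JSON-encoded object
        payload["job_settings"] = json.dumps(job_settings)
    payload.update(spider_args)
    return payload

payload = build_run_payload(123, "somespider", units=2, add_tag="sometag",
                            job_settings={"setting1": "value1"},
                            spiderarg1="example")
# To actually schedule the job (requires a valid API key and network access):
# import requests
# resp = requests.post(API_URL, data=payload, auth=("APIKEY", ""))
```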

jobs/list.{json,jl}

Retrieve job information for a given project, spider, or specific job.

Parameter   Description                                Required
project     Project ID.                                Yes
job         Job ID.                                    No
spider      Spider name.                               No
state       Return jobs with the specified state.      No
has_tag     Return jobs with the specified tag.        No
lacks_tag   Return jobs that lack the specified tag.   No
count       Maximum number of jobs to return.          No

Supported state values: pending, running, finished, deleted.

Method   Description                 Supported parameters
GET      Retrieve job information.   project, job, spider, state, has_tag, lacks_tag, count

Examples:

# Retrieve the latest 3 finished jobs
$ curl -u APIKEY: "https://app.scrapinghub.com/api/jobs/list.json?project=123&spider=somespider&state=finished&count=3"
{
  "status": "ok",
  "count": 3,
  "total": 3,
  "jobs": [
    {
      "responses_received": 1,
      "items_scraped": 2,
      "close_reason": "finished",
      "logs": 29,
      "tags": [],
      "spider": "somespider",
      "updated_time": "2015-11-09T15:21:06",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:20:25",
      "id": "123/45/14544",
      "errors_count": 0,
      "elapsed": 138399
    },
    {
      "responses_received": 1,
      "items_scraped": 2,
      "close_reason": "finished",
      "logs": 29,
      "tags": [
        "consumed"
      ],
      "spider": "somespider",
      "updated_time": "2015-11-09T14:21:02",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T14:20:25",
      "id": "123/45/14543",
      "errors_count": 0,
      "elapsed": 3433762
    },
    {
      "responses_received": 1,
      "items_scraped": 2,
      "close_reason": "finished",
      "logs": 29,
      "tags": [
        "consumed"
      ],
      "spider": "somespider",
      "updated_time": "2015-11-09T13:21:08",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T13:20:31",
      "id": "123/45/14542",
      "errors_count": 0,
      "elapsed": 7034158
    }
  ]
}

# Retrieve all running jobs
$ curl -u APIKEY: "https://app.scrapinghub.com/api/jobs/list.json?project=123&state=running"
{
  "status": "ok",
  "count": 2,
  "total": 2,
  "jobs": [
    {
      "responses_received": 483,
      "items_scraped": 22,
      "logs": 20,
      "tags": [],
      "spider": "somespider",
      "elapsed": 17442,
      "priority": 2,
      "state": "running",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:25:07",
      "id": "123/45/13140",
      "errors_count": 0,
      "updated_time": "2015-11-09T15:26:43"
    },
    {
      "responses_received": 207,
      "items_scraped": 207,
      "logs": 468,
      "tags": [],
      "spider": "someotherspider",
      "elapsed": 4085,
      "priority": 3,
      "state": "running",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T13:00:46",
      "id": "123/67/11952",
      "errors_count": 0,
      "updated_time": "2015-11-09T15:26:57"
    }
  ]
}


# Retrieve all jobs that lack the tag "consumed"
$ curl -u APIKEY: "https://app.scrapinghub.com/api/jobs/list.json?project=123&lacks_tag=consumed"
{
  "status": "ok",
  "count": 3,
  "total": 3,
  "jobs": [
    {
      "responses_received": 208,
      "items_scraped": 208,
      "logs": 471,
      "tags": ["sometag"],
      "spider": "somespider",
      "elapsed": 1010,
      "priority": 3,
      "state": "running",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T13:00:46",
      "id": "123/45/11952",
      "errors_count": 0,
      "updated_time": "2015-11-09T15:28:27"
    },
    {
      "responses_received": 619,
      "items_scraped": 22,
      "close_reason": "finished",
      "logs": 29,
      "tags": ["sometag"],
      "spider": "someotherspider",
      "updated_time": "2015-11-09T15:27:20",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:25:07",
      "id": "123/67/13140",
      "errors_count": 0,
      "elapsed": 67409
    },
    {
      "responses_received": 3,
      "items_scraped": 20,
      "close_reason": "finished",
      "logs": 58,
      "tags": ["sometag", "someothertag"],
      "spider": "yetanotherspider",
      "updated_time": "2015-11-09T15:25:28",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:25:07",
      "id": "123/89/1627",
      "errors_count": 0,
      "elapsed": 179211
    }
  ]
}
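
Building the query string for such requests is straightforward in Python. A minimal sketch (build_list_url is illustrative, not an official helper), using only the standard library:

```python
from urllib.parse import urlencode

BASE_URL = "https://app.scrapinghub.com/api/jobs/list.json"

def build_list_url(project, **filters):
    """Build a jobs/list.json URL; filters such as spider, state,
    has_tag or lacks_tag become query parameters."""
    params = {"project": project}
    params.update({k: v for k, v in filters.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)

url = build_list_url(123, spider="somespider", state="finished", count=3)
# The URL can then be fetched with HTTP basic auth, e.g.:
# import requests
# jobs = requests.get(url, auth=("APIKEY", "")).json()["jobs"]
```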

jobs/update.json

Updates information about jobs.

Parameter    Description                              Required
project      Project ID.                              Yes
job          Job ID.                                  Yes
add_tag      Add the specified tag to the job.        No
remove_tag   Remove the specified tag from the job.   No

Method   Description               Supported parameters
POST     Update job information.   project, job, add_tag, remove_tag

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/jobs/update.json -d project=123 -d job=123/1/2 -d add_tag=consumed
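
The equivalent payload in Python might look like the following sketch (build_update_payload is illustrative, not an official helper). List values are what the requests library encodes as repeated form fields, i.e. the equivalent of passing -d add_tag=... more than once:

```python
def build_update_payload(project, job, add_tags=(), remove_tags=()):
    """Build the jobs/update.json form payload; tag lists become
    repeated add_tag / remove_tag form fields."""
    payload = {"project": project, "job": job}
    if add_tags:
        payload["add_tag"] = list(add_tags)
    if remove_tags:
        payload["remove_tag"] = list(remove_tags)
    return payload

payload = build_update_payload(123, "123/1/2", add_tags=["consumed"])
# import requests
# requests.post("https://app.scrapinghub.com/api/jobs/update.json",
#               data=payload, auth=("APIKEY", ""))
```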

jobs/delete.json

Deletes one or more jobs.

Parameter   Description                                             Required
project     Project ID.                                             Yes
job         Job ID. Repeat the parameter to delete multiple jobs.   Yes

Method   Description      Supported parameters
POST     Delete job(s).   project, job

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/jobs/delete.json -d project=123 -d job=123/1/2 -d job=123/1/3
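
To repeat the job field from Python, the payload can be built as a list of pairs, which the requests library encodes as repeated form fields, mirroring the multiple -d job=... options above (build_delete_payload is an illustrative name, not an official helper):

```python
def build_delete_payload(project, jobs):
    """Build the jobs/delete.json form payload as (field, value)
    pairs so the job field repeats once per job ID."""
    return [("project", str(project))] + [("job", job_id) for job_id in jobs]

payload = build_delete_payload(123, ["123/1/2", "123/1/3"])
# import requests
# requests.post("https://app.scrapinghub.com/api/jobs/delete.json",
#               data=payload, auth=("APIKEY", ""))
```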

jobs/stop.json

Stops one running job.

Parameter   Description   Required
project     Project ID.   Yes
job         Job ID.       Yes

Method   Description   Supported parameters
POST     Stop job.     project, job

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/jobs/stop.json -d project=123 -d job=123/1/1