Jobs API

The Jobs API lets you schedule, stop, update, and delete your spiders' jobs.

Note

Most of the features provided by the API are also available through the python-scrapinghub client library.
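When calling the API directly over HTTP, the `-u APIKEY:` option in the curl examples below corresponds to HTTP Basic authentication with the API key as the username and an empty password. A minimal sketch of building that header with Python's standard library (the key shown is a placeholder, not a real credential):

```python
import base64

def basic_auth_header(api_key: str) -> str:
    """Build the Authorization header value for key-as-username Basic auth."""
    token = base64.b64encode(f"{api_key}:".encode("ascii")).decode("ascii")
    return f"Basic {token}"

# "APIKEY" is a placeholder, exactly as in the curl examples.
print(basic_auth_header("APIKEY"))
```

Any HTTP client that accepts a username/password pair (e.g. curl's `-u`) performs this same encoding for you.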

run.json

Schedules a job for a given spider.

| Parameter | Description | Required |
| --- | --- | --- |
| project | Project ID. | Yes |
| spider | Spider name. | Yes |
| jobq_id | Spider ID, i.e. the spider component of the project/spider/job identifier. | No |
| add_tag | Add the specified tag to the job. | No |
| priority | Job priority. Supported values: 0 (lowest) to 4 (highest). Default: 2. | No |
| job_settings | Scrapy settings to override for the job, as a JSON object. | No |
| units | Number of units to run the job with. Supported values: 1 to 6. | No |

Note

Any other parameter will be treated as a spider argument.

Note

If the jobq_id parameter is used, the spider parameter is not required.

| Method | Description | Supported parameters |
| --- | --- | --- |
| POST | Schedule the specified spider. | project, spider, jobq_id, add_tag, priority, job_settings, units |

Example that specifies a spider name:

$ curl \
    -u APIKEY: \
    https://app.scrapinghub.com/api/run.json \
    -d project=123 \
    -d spider=somespider \
    -d units=2 \
    -d add_tag=sometag \
    -d spiderarg1=example \
    -d job_settings='{"CLOSESPIDER_PAGECOUNT": "10"}'
{"status": "ok", "jobid": "123/1/1"}

Example that specifies a spider ID:

$ curl \
    -u APIKEY: \
    https://app.scrapinghub.com/api/run.json \
    -d project=123 \
    -d jobq_id=1 \
    -d units=2 \
    -d add_tag=sometag \
    -d spiderarg1=example \
    -d job_settings='{"CLOSESPIDER_PAGECOUNT": "10"}'
{"status": "ok", "jobid": "123/1/1"}

jobs/list.{json,jl}

Retrieves job information for a given project, spider, or specific job.

| Parameter | Description | Required |
| --- | --- | --- |
| project | Project ID. | Yes |
| job | Job ID. | No |
| spider | Spider name. | No |
| state | Return jobs in the specified state. Supported values: pending, running, finished, deleted. | No |
| has_tag | Return jobs that have the specified tag. | No |
| lacks_tag | Return jobs that lack the specified tag. | No |

| Method | Description | Supported parameters |
| --- | --- | --- |
| GET | Retrieve job information. | project, job, spider, state, has_tag, lacks_tag |

Examples:

# Retrieve the latest 3 finished jobs
$ curl -u APIKEY: "https://app.scrapinghub.com/api/jobs/list.json?project=123&spider=somespider&state=finished&count=3"
{
  "status": "ok",
  "count": 3,
  "total": 3,
  "jobs": [
    {
      "responses_received": 1,
      "items_scraped": 2,
      "close_reason": "finished",
      "logs": 29,
      "tags": [],
      "spider": "somespider",
      "updated_time": "2015-11-09T15:21:06",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:20:25",
      "id": "123/45/14544",
      "errors_count": 0,
      "elapsed": 138399
    },
    {
      "responses_received": 1,
      "items_scraped": 2,
      "close_reason": "finished",
      "logs": 29,
      "tags": [
        "consumed"
      ],
      "spider": "somespider",
      "updated_time": "2015-11-09T14:21:02",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T14:20:25",
      "id": "123/45/14543",
      "errors_count": 0,
      "elapsed": 3433762
    },
    {
      "responses_received": 1,
      "items_scraped": 2,
      "close_reason": "finished",
      "logs": 29,
      "tags": [
        "consumed"
      ],
      "spider": "somespider",
      "updated_time": "2015-11-09T13:21:08",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T13:20:31",
      "id": "123/45/14542",
      "errors_count": 0,
      "elapsed": 7034158
    }
  ]
}

# Retrieve all running jobs
$ curl -u APIKEY: "https://app.scrapinghub.com/api/jobs/list.json?project=123&state=running"
{
  "status": "ok",
  "count": 2,
  "total": 2,
  "jobs": [
    {
      "responses_received": 483,
      "items_scraped": 22,
      "logs": 20,
      "tags": [],
      "spider": "somespider",
      "elapsed": 17442,
      "priority": 2,
      "state": "running",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:25:07",
      "id": "123/45/13140",
      "errors_count": 0,
      "updated_time": "2015-11-09T15:26:43"
    },
    {
      "responses_received": 207,
      "items_scraped": 207,
      "logs": 468,
      "tags": [],
      "spider": "someotherspider",
      "elapsed": 4085,
      "priority": 3,
      "state": "running",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T13:00:46",
      "id": "123/67/11952",
      "errors_count": 0,
      "updated_time": "2015-11-09T15:26:57"
    }
  ]
}


# Retrieve all jobs that lack the tag 'consumed'
$ curl -u APIKEY: "https://app.scrapinghub.com/api/jobs/list.json?project=123&lacks_tag=consumed"
{
  "status": "ok",
  "count": 3,
  "total": 3,
  "jobs": [
    {
      "responses_received": 208,
      "items_scraped": 208,
      "logs": 471,
      "tags": ["sometag"],
      "spider": "somespider",
      "elapsed": 1010,
      "priority": 3,
      "state": "running",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T13:00:46",
      "id": "123/45/11952",
      "errors_count": 0,
      "updated_time": "2015-11-09T15:28:27"
    },
    {
      "responses_received": 619,
      "items_scraped": 22,
      "close_reason": "finished",
      "logs": 29,
      "tags": ["sometag"],
      "spider": "someotherspider",
      "updated_time": "2015-11-09T15:27:20",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:25:07",
      "id": "123/67/13140",
      "errors_count": 0,
      "elapsed": 67409
    },
    {
      "responses_received": 3,
      "items_scraped": 20,
      "close_reason": "finished",
      "logs": 58,
      "tags": ["sometag", "someothertag"],
      "spider": "yetanotherspider",
      "updated_time": "2015-11-09T15:25:28",
      "priority": 2,
      "state": "finished",
      "version": "1447064100",
      "spider_type": "manual",
      "started_time": "2015-11-09T15:25:07",
      "id": "123/89/1627",
      "errors_count": 0,
      "elapsed": 179211
    }
  ]
}
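Responses like the ones above are plain JSON, so additional filtering is easy to do client-side. A sketch that selects job IDs from a response of this shape, mirroring the state and lacks_tag parameters (the data below is a trimmed stand-in, not live API output):

```python
import json

# Trimmed stand-in for a jobs/list.json response; not live API output.
response_text = """
{
  "status": "ok",
  "jobs": [
    {"id": "123/45/11952", "state": "running", "tags": ["sometag"]},
    {"id": "123/67/13140", "state": "finished", "tags": ["sometag", "consumed"]},
    {"id": "123/89/1627", "state": "finished", "tags": []}
  ]
}
"""

def job_ids(response: dict, state=None, lacks_tag=None):
    """Filter jobs client-side, mirroring the state/lacks_tag parameters."""
    ids = []
    for job in response["jobs"]:
        if state is not None and job.get("state") != state:
            continue
        if lacks_tag is not None and lacks_tag in job.get("tags", []):
            continue
        ids.append(job["id"])
    return ids

response = json.loads(response_text)
print(job_ids(response, lacks_tag="consumed"))
```

Prefer passing state and lacks_tag to the API itself where possible; client-side filtering is only needed for criteria the endpoint does not support.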

jobs/update.json

Updates information about jobs.

| Parameter | Description | Required |
| --- | --- | --- |
| project | Project ID. | Yes |
| job | Job ID. | Yes |
| add_tag | Add the specified tag to the job. | No |
| remove_tag | Remove the specified tag from the job. | No |

| Method | Description | Supported parameters |
| --- | --- | --- |
| POST | Update job information. | project, job, add_tag, remove_tag |

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/jobs/update.json -d project=123 -d job=123/1/2 -d add_tag=consumed

jobs/delete.json

Deletes one or more jobs.

| Parameter | Description | Required |
| --- | --- | --- |
| project | Project ID. | Yes |
| job | Job ID. Repeat the parameter to delete multiple jobs. | Yes |

| Method | Description | Supported parameters |
| --- | --- | --- |
| POST | Delete job(s). | project, job |

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/jobs/delete.json -d project=123 -d job=123/1/2 -d job=123/1/3
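Note that the job parameter is repeated once per job to delete. With Python's standard library, `urlencode(..., doseq=True)` produces the same repeated-field encoding as the two `-d job=...` flags above (no request is actually sent here):

```python
from urllib.parse import urlencode

# doseq=True expands the list into one job=... field per entry,
# matching curl's repeated -d job=... flags. Slashes are percent-encoded.
body = urlencode({"project": "123", "job": ["123/1/2", "123/1/3"]}, doseq=True)
print(body)
```

Without `doseq=True`, the whole list would be encoded as a single, meaningless string value instead of repeated fields.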

jobs/stop.json

Stops a running job.

| Parameter | Description | Required |
| --- | --- | --- |
| project | Project ID. | Yes |
| job | Job ID. | Yes |

| Method | Description | Supported parameters |
| --- | --- | --- |
| POST | Stop job. | project, job |

Example:

$ curl -u APIKEY: https://app.scrapinghub.com/api/jobs/stop.json -d project=123 -d job=123/1/1