JobQ API

The JobQ API allows you to retrieve finished jobs from the queue.

Note

Most of the features provided by the API are also available through the python-scrapinghub client library.

jobq/:project_id/count

Count the jobs for the specified project.

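=========  ==============================================================  ========
Parameter  Description                                                     Required
=========  ==============================================================  ========
spider     Filter results by spider name.                                  No
state      Filter results by state (pending, running, finished, deleted).  No
startts    UNIX timestamp at which to begin results, in milliseconds.      No
endts      UNIX timestamp at which to end results, in milliseconds.        No
has_tag    Include only jobs that have this tag.                           No
lacks_tag  Include only jobs that lack this tag.                           No
=========  ==============================================================  ========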

Hint

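You can repeat has_tag and lacks_tag in a single request. Repeated has_tag values are combined with OR: a job matches if it has any of the given tags. Repeated lacks_tag values are combined with AND: a job matches only if it lacks all of the given tags.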

HTTP (assuming the project has only two jobs, the first tagged tagA and the second tagged tagB):

$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count"
2
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count?has_tag=tagA&has_tag=tagB"
2
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count?lacks_tag=tagA&lacks_tag=tagB"
0
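
Repeated query parameters are easy to get wrong when assembled by hand. The following is a minimal Python sketch, using only the standard library, that builds a count URL and repeats has_tag or lacks_tag once per tag. The helper name count_url is hypothetical and is not part of the python-scrapinghub client; the host is taken from the examples above.

```python
from urllib.parse import urlencode

BASE = "https://storage.scrapinghub.com/jobq"  # host as used in the curl examples


def count_url(project_id, has_tag=(), lacks_tag=(), **filters):
    """Build a jobq count URL (hypothetical helper, not an official API)."""
    params = list(filters.items())
    # Repeat the parameter once per tag, as the hint above describes.
    params += [("has_tag", tag) for tag in has_tag]
    params += [("lacks_tag", tag) for tag in lacks_tag]
    url = f"{BASE}/{project_id}/count"
    return f"{url}?{urlencode(params)}" if params else url


print(count_url(53, has_tag=["tagA", "tagB"]))
# https://storage.scrapinghub.com/jobq/53/count?has_tag=tagA&has_tag=tagB
```

The resulting URL can be passed to curl or to any HTTP client, with the API key supplied as in the examples above.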

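======  =====================================  =================================================
Method  Description                            Supported parameters
======  =====================================  =================================================
GET     Count jobs for the specified project.  spider, state, startts, endts, has_tag, lacks_tag
======  =====================================  =================================================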

Examples

Count jobs for a given project

HTTP:

$ curl -u APIKEY: https://storage.scrapinghub.com/jobq/53/count
32110

jobq/:project_id/list

Lists the jobs for the specified project, ordered from most recent to oldest.

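=====  =================================================
Field  Description
=====  =================================================
ts     The time at which the job was added to the queue.
=====  =================================================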

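=========  ==============================================================  ========
Parameter  Description                                                     Required
=========  ==============================================================  ========
spider     Filter results by spider name.                                  No
state      Filter results by state (pending, running, finished, deleted).  No
startts    UNIX timestamp at which to begin results, in milliseconds.      No
endts      UNIX timestamp at which to end results, in milliseconds.        No
count      Limit the results to the given number of jobs.                  No
start      Skip the first N jobs in the results.                           No
stop       The job key at which to stop showing results.                   No
key        Get job data for the given set of job keys.                     No
has_tag    Include only jobs that have this tag.                           No
lacks_tag  Include only jobs that lack this tag.                           No
=========  ==============================================================  ========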

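======  ====================================  ====================
Method  Description                           Supported parameters
======  ====================================  ====================
GET     List jobs for the specified project.  startts, endts, stop
======  ====================================  ====================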

Examples

List jobs for a given project

HTTP:

$ curl -u APIKEY: https://storage.scrapinghub.com/jobq/53/list
{"key":"53/7/81","ts":1397762393489}
{"key":"53/7/80","ts":1395111612849}
{"key":"53/7/78","ts":1393972804722}
{"key":"53/7/77","ts":1393972734215}
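
The response shown above contains one JSON object per line rather than a single JSON array, so it has to be parsed line by line. A minimal Python sketch follows; the helper name parse_jobs is illustrative, and splitting the key into project, spider, and job parts assumes keys keep the three-part form seen in the sample output.

```python
import json

# Sample body copied from the response above (first two lines).
body = """\
{"key":"53/7/81","ts":1397762393489}
{"key":"53/7/80","ts":1395111612849}
"""


def parse_jobs(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


jobs = parse_jobs(body)
for job in jobs:
    # Assumption: keys have the form project/spider/job, as in the sample.
    project_id, spider_id, job_id = job["key"].split("/")
    print(project_id, spider_id, job_id, job["ts"])
```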

List jobs finished between two timestamps

If you pass the startts and endts parameters, the API will return only the jobs finished between them.

HTTP:

$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/list?startts=1359774955431&endts=1359774955440"
{"key":"53/6/7","ts":1359774955439}
{"key":"53/3/3","ts":1359774955437}
{"key":"53/9/1","ts":1359774955431}
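
Because startts and endts are expressed in milliseconds rather than seconds, values taken from most date libraries need converting first. The sketch below shows one way to do this in Python with integer arithmetic, which avoids floating-point rounding; the helper names to_millis and from_millis are illustrative.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt):
    """Convert a timezone-aware datetime to a UNIX timestamp in milliseconds."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_millis(ms):
    """Convert a millisecond UNIX timestamp back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# Round-trip the startts value used in the request above.
startts = 1359774955431
print(from_millis(startts).isoformat())
print(to_millis(from_millis(startts)) == startts)
```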

Retrieve jobs finished after some job

JobQ returns the list of jobs, with the most recently finished first. We recommend associating the key of the most recently finished job with the downloaded data. When you want to update your data later on, you can list the jobs and stop at the previously downloaded job, through the stop parameter.

HTTP:

$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/list?stop=53/7/81"
{"key":"53/7/83","ts":1403610146780}
{"key":"53/7/82","ts":1397827910849}
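
The incremental pattern described above can be sketched in Python as follows. This is only an outline under stated assumptions: list_url and newest_key are hypothetical helpers, not part of any official client, and the response body is the sample shown above rather than a live request. Because results arrive most recent first, the key on the first line is the one to store for the next run.

```python
import json
from urllib.parse import urlencode

BASE = "https://storage.scrapinghub.com/jobq"  # host as used in the curl examples


def list_url(project_id, stop=None):
    """Build a jobq list URL, optionally stopping at a previously seen job key."""
    url = f"{BASE}/{project_id}/list"
    # Keep "/" unescaped so the key appears as in the curl example.
    return f"{url}?{urlencode({'stop': stop}, safe='/')}" if stop else url


def newest_key(body, previous=None):
    """Return the key of the first (most recent) job, or `previous` if none."""
    lines = [line for line in body.splitlines() if line.strip()]
    return json.loads(lines[0])["key"] if lines else previous


last_seen = "53/7/81"  # key stored after the previous download
print(list_url(53, stop=last_seen))

# Sample body copied from the response above.
body = '{"key":"53/7/83","ts":1403610146780}\n{"key":"53/7/82","ts":1397827910849}\n'
last_seen = newest_key(body, previous=last_seen)
print(last_seen)
```

If no new jobs are returned, newest_key keeps the previous key, so the stored position does not move backwards.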