# JobQ API
The JobQ API allows you to retrieve finished jobs from the queue.
Note: Most of the features provided by the API are also available through the python-scrapinghub client library.
## jobq/:project_id/count
Count the jobs for the specified project.
| Parameter | Description | Required |
| --- | --- | --- |
| spider | Filter results by spider name. | No |
| state | Filter results by state (pending, running, finished, deleted). | No |
| startts | UNIX timestamp at which to begin results, in milliseconds. | No |
| endts | UNIX timestamp at which to end results, in milliseconds. | No |
| has_tag | Filter results by existing tags. | No |
| lacks_tag | Filter results by missing tags. | No |
Hint: It's possible to repeat `has_tag` and `lacks_tag` multiple times. In this case, `has_tag` works as an OR operation, while `lacks_tag` works as an AND operation.
HTTP (assuming the project has only 2 jobs, where the first is tagged with `tagA` and the second with `tagB`):
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count"
2
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count?has_tag=tagA&has_tag=tagB"
2
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count?lacks_tag=tagA&lacks_tag=tagB"
0
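Both filters can also be combined in a single request. Under the same assumption of 2 jobs (the first tagged `tagA`, the second `tagB`), only the first job carries `tagA` while lacking `tagB`, so a combined query should count just that one:
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count?has_tag=tagA&lacks_tag=tagB"
1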
| Method | Description | Supported parameters |
| --- | --- | --- |
| GET | Count jobs for the specified project. | spider, state, startts, endts, has_tag, lacks_tag |
### Examples
Count jobs for a given project
HTTP:
$ curl -u APIKEY: https://storage.scrapinghub.com/jobq/53/count
32110
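The filters from the parameter table above can be combined on the same endpoint. For example, a sketch of counting only the finished jobs of a single spider (here `myspider` is a placeholder for one of your own spider names); the response is again a single integer:
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/count?spider=myspider&state=finished"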
## jobq/:project_id/list
Lists the jobs for the specified project, in order from most recent to oldest.
| Field | Description |
| --- | --- |
| ts | The time at which the job was added to the queue. |
| Parameter | Description | Required |
| --- | --- | --- |
| spider | Filter results by spider name. | No |
| state | Filter results by state (pending, running, finished, deleted). | No |
| startts | UNIX timestamp at which to begin results, in milliseconds. | No |
| endts | UNIX timestamp at which to end results, in milliseconds. | No |
| count | Limit results to the given number of jobs. | No |
| start | Skip the first N jobs from the results. | No |
| stop | The job key at which to stop showing results. | No |
| key | Get job data for a given set of job keys. | No |
| has_tag | Filter results by existing tags. | No |
| lacks_tag | Filter results by missing tags. | No |
| Method | Description | Supported parameters |
| --- | --- | --- |
| GET | List jobs for the specified project. | startts, endts, stop |
### Examples
List jobs for a given project
HTTP:
$ curl -u APIKEY: https://storage.scrapinghub.com/jobq/53/list
{"key":"53/7/81","ts":1397762393489}
{"key":"53/7/80","ts":1395111612849}
{"key":"53/7/78","ts":1393972804722}
{"key":"53/7/77","ts":1393972734215}
List jobs finished between two timestamps
If you pass the `startts` and `endts` parameters, the API will return only the jobs finished between them.
HTTP:
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/list?startts=1359774955431&endts=1359774955440"
{"key":"53/6/7","ts":1359774955439}
{"key":"53/3/3","ts":1359774955437}
{"key":"53/9/1","ts":1359774955431}
Retrieve jobs finished after some job
JobQ returns the list of jobs with the most recently finished first. We recommend associating the key of the most recently finished job with the downloaded data. When you want to update your data later on, you can list the jobs and stop at the previously downloaded job, through the `stop` parameter.
Using HTTP:
$ curl -u APIKEY: "https://storage.scrapinghub.com/jobq/53/list?stop=53/7/81"
{"key":"53/7/83","ts":1403610146780}
{"key":"53/7/82","ts":1397827910849}