Job metadata API
The Job metadata API allows you to retrieve metadata for a given job.
Note
Most of the features provided by this API are also available through the python-scrapinghub client library.
jobs/:project_id/:spider_id/:job_id[/:field_name]
Retrieve all metadata for a job, or a single metadata field.
Examples
Get all metadata for a job
HTTP:
$ curl -u APIKEY: https://storage.scrapinghub.com/jobs/1/2/3
{
    "close_reason": "finished",
    "completed_by": "jobrunner",
    "deploy_id": 1,
    "finished_time": 1566311833872,
    "pending_time": 1566311800654,
    "priority": 2,
    "project": 1,
    "running_time": 1566311801163,
    "scheduled_by": "testuser",
    "scrapystats": {
        "downloader/request_bytes": 594,
        "downloader/request_count": 2,
        "downloader/request_method_count/GET": 2,
        "downloader/response_bytes": 1866,
        "downloader/response_count": 2,
        "downloader/response_status_count/200": 1,
        "downloader/response_status_count/404": 1,
        "elapsed_time_seconds": 3.211014,
        "finish_reason": "finished",
        "finish_time": 1566311822568.0,
        "item_scraped_count": 1,
        "log_count/DEBUG": 3,
        "log_count/INFO": 11,
        "log_count/WARNING": 1,
        "memusage/max": 72433664,
        "memusage/startup": 72433664,
        "response_received_count": 2,
        "robotstxt/request_count": 1,
        "robotstxt/response_count": 1,
        "robotstxt/response_status_count/404": 1,
        "scheduler/dequeued": 1,
        "scheduler/dequeued/disk": 1,
        "scheduler/enqueued": 1,
        "scheduler/enqueued/disk": 1,
        "start_time": 1566311819357.0
    },
    "spider": "testspider",
    "spider_args": {"arg1": "val1", "arg2": "val2"},
    "spider_type": "manual",
    "started_by": "jobrunner",
    "state": "finished",
    "tags": [
        "tag1",
        "tag2"
    ],
    "units": 2,
    "version": "6d32f52-master"
}
Warning
Treat the example response above as illustrative only. Some fields appear only under specific conditions, for example after a job is finished, deleted, or restored, and other fields depend heavily on the spider or job configuration. The response may also include additional fields intended for internal use only, which can change at any moment without prior notice.
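The `pending_time`, `running_time`, and `finished_time` values in the example are UNIX epoch timestamps in milliseconds, so job timings can be derived directly from them. A minimal sketch using the sample values above (the variable names are illustrative, not part of the API):

```python
from datetime import datetime, timezone

# A subset of the metadata returned by the jobs endpoint,
# using the values from the example response above.
metadata = {
    "pending_time": 1566311800654,   # job queued
    "running_time": 1566311801163,   # job started running
    "finished_time": 1566311833872,  # job finished
}

def ms_to_datetime(ms):
    """Convert an epoch-milliseconds timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# Time spent waiting in the queue, and total run duration, in seconds.
queued_secs = (metadata["running_time"] - metadata["pending_time"]) / 1000
run_secs = (metadata["finished_time"] - metadata["running_time"]) / 1000
```

Note that `run_secs` measures the whole platform run, so it need not match `scrapystats["elapsed_time_seconds"]`, which only covers the Scrapy crawl itself.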
Get a specific metadata field for a job
HTTP:
$ curl -u APIKEY: https://storage.scrapinghub.com/jobs/1/2/3/tags
[
    "tag1",
    "tag2"
]
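The same requests can be made from Python without any third-party dependencies. This is a stdlib-only sketch of the `curl` calls above, mirroring `curl -u APIKEY:` by sending HTTP Basic auth with the API key as the username and an empty password; the helper names are illustrative, not part of the client library:

```python
import base64
import json
import urllib.request

STORAGE_URL = "https://storage.scrapinghub.com"

def job_url(project_id, spider_id, job_id, field=None):
    """Build the jobs endpoint URL, optionally pointing at a single field."""
    url = f"{STORAGE_URL}/jobs/{project_id}/{spider_id}/{job_id}"
    return f"{url}/{field}" if field else url

def fetch_json(url, apikey):
    """GET the URL with HTTP Basic auth (API key as username, empty password)."""
    request = urllib.request.Request(url)
    token = base64.b64encode(f"{apikey}:".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Example (requires a valid API key and network access):
# tags = fetch_json(job_url(1, 2, 3, "tags"), "APIKEY")
```

For most use cases the python-scrapinghub client mentioned in the Note above is the more convenient option; raw HTTP access like this is mainly useful when adding a client dependency is undesirable.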