Requests API#
The requests API allows you to work with request and response data from your crawls.
Note
Most of the features provided by the API are also available through the python-scrapinghub client library.
Request object#
| Field | Description | Required |
| --- | --- | --- |
| `time` | Request start timestamp in milliseconds. | Yes |
| `method` | HTTP method. Default: GET. | Yes |
| `url` | Request URL. | Yes |
| `status` | HTTP response status code. | Yes |
| `duration` | Request duration in milliseconds. | Yes |
| `rs` | Response size in bytes. | Yes |
| `parent` | The index of the parent request. | No |
| `fp` | Request fingerprint. | No |
Note
Seed requests from start URLs will have no parent field.
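For illustration, a single stored request might look like the following JSON object (the field values, and the `fp` hash in particular, are hypothetical):

```
{"time": 1351521736957, "method": "GET", "url": "http://scrapy.org/", "status": 200, "duration": 12, "rs": 1024, "parent": 0, "fp": "0f76067a0f44f26f76aa2ae06a32e0f02e24b551"}
```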
requests/:project_id[/:spider_id][/:job_id][/:request_no]#
Retrieve or insert request data for a project, spider, or job, where `request_no` is the index of the request.
| Parameter | Description | Required |
| --- | --- | --- |
| `format` | Results format. See Result formats. | No |
| `meta` | Meta keys to show. | No |
| `nodata` | If set, no data is returned other than the specified meta keys. | No |
Note
Pagination and meta parameters are supported; see Pagination and Meta parameters.
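For example, a hypothetical query combining these parameters: `format=json` to get a JSON array, and `nodata=1` with `meta=_key` so that only each request's storage key is returned (assuming `_key` is available as a meta key, as described under Meta parameters):

```
$ curl -u APIKEY: "https://storage.scrapinghub.com/requests/53/34/7?format=json&nodata=1&meta=_key"
```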
requests/:project_id/:spider_id/:job_id#
Examples#
Get the requests from a given job
HTTP:
```
$ curl -u APIKEY: https://storage.scrapinghub.com/requests/53/34/7
{"parent":0,"duration":12,"status":200,"method":"GET","rs":1024,"url":"http://scrapy.org/","time":1351521736957}
```
Adding requests
HTTP:
```
$ curl -u APIKEY: https://storage.scrapinghub.com/requests/53/34/7 -X POST -T requests.jl
```
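The uploaded requests.jl file is expected to contain one JSON-encoded request object per line (JSON Lines), using the fields from the Request object table. A hypothetical two-line file:

```
{"time": 1351521736957, "method": "GET", "url": "http://scrapy.org/", "status": 200, "duration": 12, "rs": 1024}
{"time": 1351521737104, "method": "GET", "url": "http://scrapy.org/download/", "status": 200, "duration": 20, "rs": 2048, "parent": 0}
```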
requests/:project_id/:spider_id/:job_id/stats#
Retrieve request stats for a given job.
| Field | Description |
| --- | --- |
| `counts[field]` | The number of times the field occurs. |
| `totals.input_bytes` | The total size of all requests in bytes. |
| `totals.input_values` | The total number of requests. |
Example#
HTTP:
```
$ curl -u APIKEY: https://storage.scrapinghub.com/requests/53/34/7/stats
{"counts":{"url":21,"parent":19,"status":21,"method":21,"rs":21,"duration":21,"fp":21},"totals":{"input_bytes":2397,"input_values":21}}
```