Code examples
The Zyte API documentation features code examples for many different technologies.
You can find those examples at the end of relevant topics in pages like Zyte API HTTP requests, Zyte API browser automation, Zyte API automatic extraction or Zyte API shared features, or find them all below.
Tip
The right-hand sidebar of the Zyte API reference contains additional examples of Zyte API parameters.
Requirements
Select a technology tab below to learn how to install and configure the requirements to run code examples for that technology:
C# code examples use C# 9.0.
To run C# code examples, install:
.NET SDK 5.x or later
Html Agility Pack, for HTML parsing
CLI client code examples feature the command-line interface of python-zyte-api, the official Python client of Zyte API, along with other command-line tools.
To run CLI client code examples, install:
python-zyte-api, for requests.
Requires installing Python first.
jq, for JSON parsing.
base64, for base64 encoding and decoding.
On Windows, you can use Chocolatey to install GNU Core Utilities, which includes a base64 command-line application.
macOS comes with a base64 command-line application pre-installed.
Most Linux distributions come with GNU Core Utilities pre-installed, or make it easy to install. GNU Core Utilities includes a base64 command-line application.
xmllint, for HTML parsing.
On Windows, install libxml2, which provides xmllint.
macOS comes with xmllint pre-installed.
Most Linux distributions make it easy to install libxml2, which provides xmllint.
xargs, for parallelization.
On Windows, you can use Chocolatey to install GNU findutils, which includes an xargs command-line application.
macOS comes with an xargs command-line application pre-installed.
Most Linux distributions come with GNU findutils pre-installed, or make it easy to install. GNU findutils includes an xargs command-line application.
curl code examples feature curl and other command-line tools.
To run curl code examples, install:
curl, for requests.
Note
curl comes pre-installed in many operating systems.
jq, for JSON parsing.
base64, for base64 encoding and decoding.
On Windows, you can use Chocolatey to install GNU Core Utilities, which includes a base64 command-line application.
macOS comes with a base64 command-line application pre-installed.
Most Linux distributions come with GNU Core Utilities pre-installed, or make it easy to install. GNU Core Utilities includes a base64 command-line application.
xmllint, for HTML parsing.
On Windows, install libxml2, which provides xmllint.
macOS comes with xmllint pre-installed.
Most Linux distributions make it easy to install libxml2, which provides xmllint.
xargs, for parallelization.
On Windows, you can use Chocolatey to install GNU findutils, which includes an xargs command-line application.
macOS comes with an xargs command-line application pre-installed.
Most Linux distributions come with GNU findutils pre-installed, or make it easy to install. GNU findutils includes an xargs command-line application.
Java code examples use Java SE 8.
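To run Java code examples, install:
Apache HttpClient 5, for requests.
Gson, for JSON handling.
Guava, for building request parameter maps.
jsoup, for HTML parsing.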
JS code examples use JavaScript.
To run JS code examples, install:
axios, for requests.
cheerio, for HTML parsing.
https-proxy-agent, for proxy mode.
PHP code examples use PHP 7.4.
To run PHP code examples, install:
Guzzle, for requests.
The PHP DOM extension, for HTML parsing.
Proxy mode code examples use curl with Zyte API as a proxy. See the curl tab for code example requirement details.
See Zyte API proxy mode to learn how to use Zyte API as a proxy with other technologies.
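For other HTTP clients, proxy mode mostly comes down to pointing the client at the proxy endpoint with your API key as the proxy user name and an empty password. As a rough sketch, assuming the api.zyte.com:8011 endpoint used by the curl examples on this page, the helper below (a hypothetical name, not part of any Zyte library) builds a proxies mapping in the shape that the Python Requests library accepts. It makes no network call, and the Zyte CA certificate would still need to be configured before sending HTTPS requests through the proxy.

```python
from urllib.parse import quote


def build_zyte_proxies(api_key: str, endpoint: str = "api.zyte.com:8011") -> dict:
    """Return a Requests-style proxies mapping for Zyte API proxy mode.

    Hypothetical helper for illustration only. The API key is the proxy
    user name and the password is empty, mirroring the curl option
    `--proxy-user YOUR_API_KEY:`.
    """
    # Percent-encode the key in case it contains URL-reserved characters.
    proxy_url = f"http://{quote(api_key, safe='')}:@{endpoint}"
    return {"http": proxy_url, "https": proxy_url}


proxies = build_zyte_proxies("YOUR_API_KEY")
print(proxies["https"])
```

The resulting mapping could then be passed to a call such as `requests.get(url, proxies=proxies)`; the Zyte API proxy mode page remains the reference for the supported setup.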
Python code examples use Python 3.
To run Python code examples, install:
Requests, for requests.
Parsel, for HTML parsing.
Python client code examples feature the asyncio API of python-zyte-api, the official Python client of Zyte API.
To run Python client code examples, install:
python-zyte-api, for requests.
Parsel, for HTML parsing.
Ruby code examples use Ruby 3.x.
Scrapy code examples feature Scrapy with the scrapy-zyte-api plugin configured in transparent mode.
To run Scrapy code examples, install:
Scrapy.
scrapy-zyte-api.
After installing scrapy-zyte-api, you must also configure it in your Scrapy project. If you configure it by enabling its components separately instead of enabling the add-on, you also need to set ZYTE_API_TRANSPARENT_MODE to True.
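As an illustration of the configuration described above, the following settings.py sketch enables the add-on, which is the simpler of the two approaches. The setting names follow the scrapy-zyte-api documentation as best understood here; treat the add-on priority value as an assumption and check the plugin's documentation for your installed version.

```python
# settings.py (a sketch; verify against your scrapy-zyte-api version)

# Placeholder value, replace with your own API key.
ZYTE_API_KEY = "YOUR_API_KEY"

# Enabling the add-on configures the plugin's components together.
# The priority 500 is an assumed, commonly used value.
ADDONS = {
    "scrapy_zyte_api.Addon": 500,
}

# If you enable the components separately instead of using the add-on,
# transparent mode has to be switched on explicitly:
# ZYTE_API_TRANSPARENT_MODE = True
```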
Tip
The web scraping tutorial covers installing and configuring the requirements for Scrapy code examples.
All examples
Running the scrollBottom action
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://quotes.toscrape.com/scroll"},
{"browserHtml", true},
{
"actions",
new List<Dictionary<string, object>>()
{
new Dictionary<string, object>()
{
{"action", "scrollBottom"}
}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var browserHtml = data.RootElement.GetProperty("browserHtml").ToString();
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(browserHtml);
var navigator = htmlDocument.CreateNavigator();
var quoteCount = (double)navigator.Evaluate("count(//*[@class=\"quote\"])");
{"url": "https://quotes.toscrape.com/scroll", "browserHtml": true, "actions": [{"action": "scrollBottom"}]}
zyte-api input.jsonl \
| jq --raw-output .browserHtml \
| xmllint --html --xpath 'count(//*[@class="quote"])' - 2> /dev/null
{
"url": "https://quotes.toscrape.com/scroll",
"browserHtml": true,
"actions": [
{
"action": "scrollBottom"
}
]
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .browserHtml \
| xmllint --html --xpath 'count(//*[@class="quote"])' - 2> /dev/null
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> action = ImmutableMap.of("action", "scrollBottom");
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://quotes.toscrape.com/scroll",
"browserHtml",
true,
"actions",
Collections.singletonList(action));
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String browserHtml = jsonObject.get("browserHtml").getAsString();
Document document = Jsoup.parse(browserHtml);
int quoteCount = document.select(".quote").size();
System.out.println(quoteCount);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const cheerio = require('cheerio')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://quotes.toscrape.com/scroll',
browserHtml: true,
actions: [
{
action: 'scrollBottom'
}
]
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const browserHtml = response.data.browserHtml
const $ = cheerio.load(browserHtml)
const quoteCount = $('.quote').length
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://quotes.toscrape.com/scroll',
'browserHtml' => true,
'actions' => [
['action' => 'scrollBottom'],
],
],
]);
$data = json_decode($response->getBody());
$doc = new DOMDocument();
$doc->loadHTML($data->browserHtml);
$xpath = new DOMXPath($doc);
$quote_count = $xpath->query("//*[@class='quote']")->count();
import requests
from parsel import Selector

api_response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=("YOUR_API_KEY", ""),
    json={
        "url": "https://quotes.toscrape.com/scroll",
        "browserHtml": True,
        "actions": [
            {
                "action": "scrollBottom",
            },
        ],
    },
)
browser_html = api_response.json()["browserHtml"]
quote_count = len(Selector(browser_html).css(".quote"))
import asyncio

from parsel import Selector
from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://quotes.toscrape.com/scroll",
            "browserHtml": True,
            "actions": [
                {
                    "action": "scrollBottom",
                },
            ],
        },
    )
    browser_html = api_response["browserHtml"]
    quote_count = len(Selector(browser_html).css(".quote"))
    print(quote_count)


asyncio.run(main())
from scrapy import Request, Spider


class QuotesToScrapeComSpider(Spider):
    name = "quotes_toscrape_com"

    def start_requests(self):
        yield Request(
            "https://quotes.toscrape.com/scroll",
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                    "actions": [
                        {
                            "action": "scrollBottom",
                        },
                    ],
                },
            },
        )

    def parse(self, response):
        quote_count = len(response.css(".quote"))
Output:
100
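The C# and Java examples above build the Authorization header by hand, while the other clients accept the API key as a user name with an empty password. Both amount to HTTP Basic authentication. A minimal Python sketch of the manual construction, using a hypothetical helper name, might look like this:

```python
import base64


def build_auth_header(api_key: str) -> str:
    """Build an HTTP Basic Authorization header value.

    The API key is the user name and the password is empty, hence the
    trailing colon before base64 encoding.
    """
    credentials = f"{api_key}:".encode("latin-1")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


header = build_auth_header("YOUR_API_KEY")
print(header)
```

Clients with built-in Basic authentication support, such as the `auth` argument in the Python example, perform the same encoding internally.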
Getting an HTTP response body
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
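Each example that follows requests httpResponseBody and then base64-decodes the returned field into bytes. The sketch below isolates that decoding step so that it runs offline, without any of the requirements above. The sample response is made up for illustration (it is not real Zyte API output), and decoding the bytes as UTF-8 is an assumption that only holds for text resources in that encoding.

```python
import base64
import json

# Hypothetical API response, built locally so the sketch runs offline.
sample_body = b'<!DOCTYPE html>\n<html lang="en">\n'
sample_response = json.dumps(
    {
        "url": "https://toscrape.com",
        "httpResponseBody": base64.b64encode(sample_body).decode("ascii"),
    }
)


def decode_http_response_body(api_response_text: str) -> bytes:
    """Extract and base64-decode httpResponseBody from a JSON response."""
    data = json.loads(api_response_text)
    return base64.b64decode(data["httpResponseBody"])


body = decode_http_response_body(sample_response)
# Print only the first line of the decoded document.
print(body.decode("utf-8").splitlines()[0])
```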
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://toscrape.com"},
{"httpResponseBody", true}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
{"url": "https://toscrape.com", "httpResponseBody": true}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
> output.html
{
"url": "https://toscrape.com",
"httpResponseBody": true
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
> output.html
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of("url", "https://toscrape.com", "httpResponseBody", true);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com',
httpResponseBody: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com',
'httpResponseBody' => true,
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
In proxy mode, you always get a response body.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
https://toscrape.com \
> output.html
from base64 import b64decode

import requests

api_response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=("YOUR_API_KEY", ""),
    json={
        "url": "https://toscrape.com",
        "httpResponseBody": True,
    },
)
http_response_body: bytes = b64decode(api_response.json()["httpResponseBody"])
import asyncio
from base64 import b64decode

from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://toscrape.com",
            "httpResponseBody": True,
        }
    )
    http_response_body = b64decode(api_response["httpResponseBody"]).decode()
    print(http_response_body)


asyncio.run(main())
In transparent mode, when you target a text resource (e.g. HTML, JSON), regular Scrapy requests work out of the box:
from scrapy import Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"
    start_urls = ["https://toscrape.com"]

    def parse(self, response):
        http_response_text: str = response.text
While regular Scrapy requests also work for binary responses at the moment, they may stop working in future versions of scrapy-zyte-api, so passing httpResponseBody is recommended when targeting binary resources:
from scrapy import Request, Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"

    def start_requests(self):
        yield Request(
            "https://toscrape.com",
            meta={
                "zyte_api_automap": {
                    "httpResponseBody": True,
                },
            },
        )

    def parse(self, response):
        http_response_body: bytes = response.body
Output (first 5 lines):
<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Scraping Sandbox</title>
Setting a Referer header in a browser request
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
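Because a browser renders the response, the JSON that httpbin.org returns ends up wrapped in HTML, which is why the examples that follow extract the page text before parsing it as JSON. The sketch below reproduces that step offline with only the Python standard library. The browserHtml sample is invented for illustration, and real browser output may differ in structure.

```python
import json
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect all text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical browserHtml value, not real Zyte API output.
browser_html = (
    "<html><head></head><body><pre>"
    '{"headers": {"Referer": "https://example.org/"}}'
    "</pre></body></html>"
)

collector = TextCollector()
collector.feed(browser_html)
collector.close()

# The concatenated page text is the JSON document returned by the site.
headers = json.loads("".join(collector.chunks))["headers"]
print(headers["Referer"])
```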
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Xml.XPath;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/anything"},
{"browserHtml", true},
{
"requestHeaders",
new Dictionary<string, object>()
{
{"referer", "https://example.org/"}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var browserHtml = data.RootElement.GetProperty("browserHtml").ToString();
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(browserHtml);
var navigator = htmlDocument.CreateNavigator();
var nodeIterator = (XPathNodeIterator)navigator.Evaluate("//text()");
nodeIterator.MoveNext();
var responseJson = nodeIterator.Current.ToString();
var responseData = JsonDocument.Parse(responseJson);
var headerEnumerator = responseData.RootElement.GetProperty("headers").EnumerateObject();
var headers = new Dictionary<string, string>();
while (headerEnumerator.MoveNext())
{
headers.Add(
headerEnumerator.Current.Name.ToString(),
headerEnumerator.Current.Value.ToString()
);
}
{"url": "https://httpbin.org/anything", "browserHtml": true, "requestHeaders": {"referer": "https://example.org/"}}
zyte-api input.jsonl \
| jq --raw-output .browserHtml \
| xmllint --html --xpath '//text()' - 2> /dev/null \
| jq .headers
{
"url": "https://httpbin.org/anything",
"browserHtml": true,
"requestHeaders": {
"referer": "https://example.org/"
}
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .browserHtml \
| xmllint --html --xpath '//text()' - 2> /dev/null \
| jq .headers
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> requestHeaders = ImmutableMap.of("referer", "https://example.org/");
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://httpbin.org/anything",
"browserHtml",
true,
"requestHeaders",
requestHeaders);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String browserHtml = jsonObject.get("browserHtml").getAsString();
Document document = Jsoup.parse(browserHtml);
JsonObject data = JsonParser.parseString(document.text()).getAsJsonObject();
JsonObject headers = data.get("headers").getAsJsonObject();
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(headers));
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const cheerio = require('cheerio')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/anything',
browserHtml: true,
requestHeaders: {
referer: 'https://example.org/'
}
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const $ = cheerio.load(response.data.browserHtml)
const data = JSON.parse($.text())
const headers = data.headers
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/anything',
'browserHtml' => true,
'requestHeaders' => [
'referer' => 'https://example.org/',
],
],
]);
$api = json_decode($response->getBody());
$doc = new DOMDocument();
$doc->loadHTML($api->browserHtml);
$data = json_decode($doc->textContent);
$headers = $data->headers;
import json

import requests
from parsel import Selector

api_response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=("YOUR_API_KEY", ""),
    json={
        "url": "https://httpbin.org/anything",
        "browserHtml": True,
        "requestHeaders": {
            "referer": "https://example.org/",
        },
    },
)
browser_html = api_response.json()["browserHtml"]
selector = Selector(browser_html)
response_json = selector.xpath("//text()").get()
response_data = json.loads(response_json)
headers = response_data["headers"]
import asyncio
import json

from parsel import Selector
from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://httpbin.org/anything",
            "browserHtml": True,
            "requestHeaders": {
                "referer": "https://example.org/",
            },
        }
    )
    browser_html = api_response["browserHtml"]
    selector = Selector(browser_html)
    response_json = selector.xpath("//text()").get()
    response_data = json.loads(response_json)
    print(json.dumps(response_data["headers"], indent=2))


asyncio.run(main())
import json

from scrapy import Request, Spider


class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"

    def start_requests(self):
        yield Request(
            "https://httpbin.org/anything",
            headers={"Referer": "https://example.org/"},
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                },
            },
        )

    def parse(self, response):
        response_json = response.xpath("//text()").get()
        response_data = json.loads(response_json)
        headers = response_data["headers"]
Output ("Referer" line):
"Referer": "https://example.org/",
Getting browser HTML
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://toscrape.com"},
{"browserHtml", true}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var browserHtml = data.RootElement.GetProperty("browserHtml").ToString();
{"url": "https://toscrape.com", "browserHtml": true}
zyte-api input.jsonl \
| jq --raw-output .browserHtml
{
"url": "https://toscrape.com",
"browserHtml": true
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .browserHtml
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of("url", "https://toscrape.com", "browserHtml", true);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String browserHtml = jsonObject.get("browserHtml").getAsString();
System.out.println(browserHtml);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com',
browserHtml: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const browserHtml = response.data.browserHtml
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com',
'browserHtml' => true,
],
]);
$api = json_decode($response->getBody());
$browser_html = $api->browserHtml;
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-H "Zyte-Browser-Html: true" \
https://toscrape.com
import requests

api_response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=("YOUR_API_KEY", ""),
    json={
        "url": "https://toscrape.com",
        "browserHtml": True,
    },
)
browser_html: str = api_response.json()["browserHtml"]
import asyncio

from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://toscrape.com",
            "browserHtml": True,
        }
    )
    print(api_response["browserHtml"])


asyncio.run(main())
from scrapy import Request, Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"

    def start_requests(self):
        yield Request(
            "https://toscrape.com",
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                },
            },
        )

    def parse(self, response):
        browser_html: str = response.text
Output (first 5 lines):
<!DOCTYPE html><html lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Scraping Sandbox</title>
<link href="./css/bootstrap.min.css" rel="stylesheet">
<link href="./css/main.css" rel="stylesheet">
Reusing browser cookies on HTTP requests
Send a browser request to the home page of a website, and use its response cookies as request cookies in an HTTP request to a different URL of that website.
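The flow described above can be sketched offline as two request payloads, where the second reuses the cookies returned by the first. In this Python sketch the helper name and the sample cookie are hypothetical, and the second URL is borrowed from the Java example on this page. Sending the payloads to the API is left out.

```python
# First request: a browser request that also asks for response cookies.
browser_request = {
    "url": "https://toscrape.com/",
    "browserHtml": True,
    "responseCookies": True,
}

# Hypothetical API response to the request above; the cookie is made up.
browser_response = {
    "url": "https://toscrape.com/",
    "browserHtml": "<html></html>",
    "responseCookies": [
        {"name": "session", "value": "abc123", "domain": "toscrape.com"},
    ],
}


def build_http_request(browser_response: dict, url: str) -> dict:
    """Build an HTTP request payload that reuses the browser cookies."""
    return {
        "url": url,
        "httpResponseBody": True,
        "requestCookies": browser_response["responseCookies"],
    }


http_request = build_http_request(browser_response, "https://books.toscrape.com/")
print(http_request["requestCookies"])
```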
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var browserInput = new Dictionary<string, object>(){
{"url", "https://toscrape.com/"},
{"browserHtml", true},
{"responseCookies", true}
};
var browserInputJson = JsonSerializer.Serialize(browserInput);
var browserContent = new StringContent(browserInputJson, Encoding.UTF8, "application/json");
HttpResponseMessage browserResponse = await client.PostAsync("https://api.zyte.com/v1/extract", browserContent);
var browserResponseBody = await browserResponse.Content.ReadAsByteArrayAsync();
var browserData = JsonDocument.Parse(browserResponseBody);
var httpInput = new Dictionary<string, object>(){
{"url", "https://books.toscrape.com/"},
{"httpResponseBody", true},
{"requestCookies", browserData.RootElement.GetProperty("responseCookies")}
};
var httpInputJson = JsonSerializer.Serialize(httpInput);
var httpContent = new StringContent(httpInputJson, Encoding.UTF8, "application/json");
HttpResponseMessage httpResponse = await client.PostAsync("https://api.zyte.com/v1/extract", httpContent);
var httpResponseBody = await httpResponse.Content.ReadAsByteArrayAsync();
var httpData = JsonDocument.Parse(httpResponseBody);
var base64HttpResponseBodyField = httpData.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyField = System.Convert.FromBase64String(base64HttpResponseBodyField);
var result = System.Text.Encoding.UTF8.GetString(httpResponseBodyField);
Console.WriteLine(result);
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> browserParameters =
ImmutableMap.of(
"url", "https://toscrape.com/", "browserHtml", true, "responseCookies", true);
String browserRequestBody = new Gson().toJson(browserParameters);
HttpPost browserRequest = new HttpPost("https://api.zyte.com/v1/extract");
browserRequest.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
browserRequest.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
browserRequest.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
browserRequest.setEntity(new StringEntity(browserRequestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
browserRequest,
browserResponse -> {
HttpEntity browserEntity = browserResponse.getEntity();
String browserApiResponse = EntityUtils.toString(browserEntity, StandardCharsets.UTF_8);
JsonObject browserJsonObject =
JsonParser.parseString(browserApiResponse).getAsJsonObject();
Map<String, Object> httpParameters =
ImmutableMap.of(
"url",
"https://books.toscrape.com/",
"httpResponseBody",
true,
"requestCookies",
browserJsonObject.get("responseCookies"));
String httpRequestBody = new Gson().toJson(httpParameters);
HttpPost httpRequest = new HttpPost("https://api.zyte.com/v1/extract");
httpRequest.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
httpRequest.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
httpRequest.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
httpRequest.setEntity(new StringEntity(httpRequestBody));
client.execute(
httpRequest,
httpResponse -> {
HttpEntity httpEntity = httpResponse.getEntity();
String httpApiResponse = EntityUtils.toString(httpEntity, StandardCharsets.UTF_8);
JsonObject httpJsonObject =
JsonParser.parseString(httpApiResponse).getAsJsonObject();
String base64HttpResponseBody =
httpJsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com/',
browserHtml: true,
responseCookies: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((browserResponse) => {
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://books.toscrape.com/',
httpResponseBody: true,
requestCookies: browserResponse.data.responseCookies
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((httpResponse) => {
const httpResponseBody = Buffer.from(
httpResponse.data.httpResponseBody,
'base64'
)
console.log(httpResponseBody.toString())
})
})
<?php
$client = new GuzzleHttp\Client();
$browser_response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com/',
'browserHtml' => true,
'responseCookies' => true,
],
]);
$browser_data = json_decode($browser_response->getBody());
$http_response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://books.toscrape.com/',
'httpResponseBody' => true,
'requestCookies' => $browser_data->responseCookies,
],
]);
$http_data = json_decode($http_response->getBody());
$http_response_body = base64_decode($http_data->httpResponseBody);
echo $http_response_body;
from base64 import b64decode
import requests
browser_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://toscrape.com/",
"browserHtml": True,
"responseCookies": True,
},
)
http_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://books.toscrape.com/",
"httpResponseBody": True,
"requestCookies": browser_response.json()["responseCookies"],
},
)
http_response_body = b64decode(http_response.json()["httpResponseBody"])
print(http_response_body.decode())
import asyncio
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    browser_response = await client.get(
        {
            "url": "https://toscrape.com/",
            "browserHtml": True,
            "responseCookies": True,
        }
    )
    http_response = await client.get(
        {
            "url": "https://books.toscrape.com/",
            "httpResponseBody": True,
            "requestCookies": browser_response["responseCookies"],
        }
    )
    http_response_body = b64decode(http_response["httpResponseBody"]).decode()
    print(http_response_body)
asyncio.run(main())
from scrapy import Request, Spider
class ToScrapeComSpider(Spider):
    name = "toscrape_com"
    def start_requests(self):
        yield Request(
            "https://toscrape.com/",
            callback=self.parse_browser,
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                    "responseCookies": True,
                },
            },
        )
    def parse_browser(self, response):
        yield response.follow(
            "https://books.toscrape.com/",
            callback=self.parse_http,
            meta={
                "zyte_api_automap": {
                    "requestCookies": response.raw_api_response["responseCookies"],
                },
            },
        )
    def parse_http(self, response):
        print(response.text)
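The cookie hand-off in the examples above can also be looked at in isolation, without sending any request. The following is a minimal sketch, not part of the official examples: `build_http_request_with_cookies` is a hypothetical helper name, and the sample API response is made-up data used only for illustration.

```python
from typing import Any


def build_http_request_with_cookies(
    browser_api_response: dict[str, Any], url: str
) -> dict[str, Any]:
    """Build Zyte API parameters for an HTTP request that reuses browser cookies."""
    parameters: dict[str, Any] = {"url": url, "httpResponseBody": True}
    # Only forward cookies when the browser response actually returned some.
    cookies = browser_api_response.get("responseCookies")
    if cookies:
        parameters["requestCookies"] = cookies
    return parameters


# Hypothetical API response, for illustration only.
sample_browser_response = {
    "url": "https://toscrape.com/",
    "browserHtml": "<html></html>",
    "responseCookies": [
        {"name": "session", "value": "abc123", "domain": "toscrape.com"},
    ],
}

http_parameters = build_http_request_with_cookies(
    sample_browser_response, "https://books.toscrape.com/"
)
print(http_parameters["requestCookies"])
```

The resulting dictionary has the same shape as the JSON body of the second request in the examples above.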
Setting a geolocation
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "http://ip-api.com/json"},
{"httpResponseBody", true},
{"geolocation", "AU"}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var responseData = JsonDocument.Parse(httpResponseBody);
var countryCode = responseData.RootElement.GetProperty("countryCode").ToString();
{"url": "http://ip-api.com/json", "httpResponseBody": true, "geolocation": "AU"}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq .countryCode
{
"url": "http://ip-api.com/json",
"httpResponseBody": true,
"geolocation": "AU"
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq .countryCode
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url", "http://ip-api.com/json", "httpResponseBody", true, "geolocation", "AU");
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
String countryCode = data.get("countryCode").getAsString();
System.out.println(countryCode);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'http://ip-api.com/json',
httpResponseBody: true,
geolocation: 'AU'
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const data = JSON.parse(httpResponseBody)
const countryCode = data.countryCode
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'http://ip-api.com/json',
'httpResponseBody' => true,
'geolocation' => 'AU',
],
]);
$api = json_decode($response->getBody());
$http_response_body = base64_decode($api->httpResponseBody);
$data = json_decode($http_response_body);
$country_code = $data->countryCode;
With the proxy mode, use the Zyte-Geolocation header.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-H "Zyte-Geolocation: AU" \
http://ip-api.com/json \
| jq .countryCode
import json
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "http://ip-api.com/json",
"httpResponseBody": True,
"geolocation": "AU",
},
)
http_response_body: bytes = b64decode(api_response.json()["httpResponseBody"])
response_data = json.loads(http_response_body)
country_code = response_data["countryCode"]
import asyncio
import json
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "http://ip-api.com/json",
            "httpResponseBody": True,
            "geolocation": "AU",
        }
    )
    http_response_body: bytes = b64decode(api_response["httpResponseBody"])
    response_data = json.loads(http_response_body)
    print(response_data["countryCode"])
asyncio.run(main())
import json
from scrapy import Request, Spider
class IPAPIComSpider(Spider):
    name = "ip_api_com"
    def start_requests(self):
        yield Request(
            "http://ip-api.com/json",
            meta={
                "zyte_api_automap": {
                    "geolocation": "AU",
                },
            },
        )
    def parse(self, response):
        response_data = json.loads(response.body)
        country_code = response_data["countryCode"]
Output:
AU
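Every example above ends with the same two steps: base64-decode the `httpResponseBody` field, then parse the decoded bytes as JSON. The sketch below isolates those steps so they can be tried without an API key. `extract_country_code` is a hypothetical helper name, and the sample API response is built locally as a stand-in for a real one.

```python
import json
from base64 import b64decode, b64encode


def extract_country_code(api_response: dict) -> str:
    """Decode the base64 body of a Zyte API response and read countryCode."""
    body: bytes = b64decode(api_response["httpResponseBody"])
    return json.loads(body)["countryCode"]


# Locally built stand-in for a real API response (illustrative data only).
sample_body = json.dumps({"countryCode": "AU"}).encode()
sample_api_response = {"httpResponseBody": b64encode(sample_body).decode("ascii")}

print(extract_country_code(sample_api_response))
```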
Making an HTTP request seem like it comes from a mobile device
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/user-agent"},
{"httpResponseBody", true},
{"device", "mobile"}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var responseData = JsonDocument.Parse(httpResponseBody);
var headerEnumerator = responseData.RootElement.EnumerateObject();
while (headerEnumerator.MoveNext())
{
if (headerEnumerator.Current.Name.ToString() == "user-agent")
{
Console.WriteLine(headerEnumerator.Current.Value.ToString());
}
}
{"url": "https://httpbin.org/user-agent", "httpResponseBody": true, "device": "mobile"}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output '.["user-agent"]'
{
"url": "https://httpbin.org/user-agent",
"httpResponseBody": true,
"device": "mobile"
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output '.["user-agent"]'
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url", "https://httpbin.org/user-agent", "httpResponseBody", true, "device", "mobile");
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
String userAgent = data.get("user-agent").getAsString();
System.out.println(userAgent);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/user-agent',
httpResponseBody: true,
device: 'mobile'
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
console.log(JSON.parse(httpResponseBody)['user-agent'])
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/user-agent',
'httpResponseBody' => true,
'device' => 'mobile',
],
]);
$api = json_decode($response->getBody());
$http_response_body = base64_decode($api->httpResponseBody);
$data = json_decode($http_response_body);
echo $data->{'user-agent'}.PHP_EOL;
With the proxy mode, use the Zyte-Device header.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-H "Zyte-Device: mobile" \
https://httpbin.org/user-agent \
| jq --raw-output '.["user-agent"]'
import json
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://httpbin.org/user-agent",
"httpResponseBody": True,
"device": "mobile",
},
)
http_response_body = b64decode(api_response.json()["httpResponseBody"])
user_agent = json.loads(http_response_body)["user-agent"]
print(user_agent)
import asyncio
import json
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://httpbin.org/user-agent",
            "httpResponseBody": True,
            "device": "mobile",
        }
    )
    http_response_body: bytes = b64decode(api_response["httpResponseBody"])
    user_agent = json.loads(http_response_body)["user-agent"]
    print(user_agent)
asyncio.run(main())
import json
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"
    def start_requests(self):
        yield Request(
            "https://httpbin.org/user-agent",
            meta={
                "zyte_api_automap": {
                    "device": "mobile",
                }
            },
        )
    def parse(self, response):
        user_agent = json.loads(response.text)["user-agent"]
        print(user_agent)
Example output (may vary):
Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Mobile Safari/537.36
Getting structured data from a product details page of an e-commerce website
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"},
{"product", true}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var product = data.RootElement.GetProperty("product").ToString();
Console.WriteLine(product);
{"url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html", "product": true}
zyte-api input.jsonl \
| jq --raw-output .product
{
"url": "https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
"product": true
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .product
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html",
"product",
true);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
JsonObject product = jsonObject.get("product").getAsJsonObject();
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(product));
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
product: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const product = response.data.product
console.log(product)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html',
'product' => true,
],
]);
$data = json_decode($response->getBody());
$product = json_encode($data->product);
echo $product.PHP_EOL;
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": (
"https://books.toscrape.com/catalogue"
"/a-light-in-the-attic_1000/index.html"
),
"product": True,
},
)
product = api_response.json()["product"]
print(product)
import asyncio
import json
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": (
                "https://books.toscrape.com/catalogue"
                "/a-light-in-the-attic_1000/index.html"
            ),
            "product": True,
        }
    )
    product = api_response["product"]
    print(json.dumps(product, indent=2, ensure_ascii=False))
asyncio.run(main())
from scrapy import Request, Spider
class BooksToScrapeComSpider(Spider):
    name = "books_toscrape_com"
    def start_requests(self):
        yield Request(
            (
                "https://books.toscrape.com/catalogue"
                "/a-light-in-the-attic_1000/index.html"
            ),
            meta={
                "zyte_api_automap": {
                    "product": True,
                },
            },
        )
    def parse(self, response):
        product = response.raw_api_response["product"]
        print(product)
Output (first 5 lines):
{
"name": "A Light in the Attic",
"price": "51.77",
"currency": "GBP",
"currencyRaw": "£",
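In the output above, `price` is a string rather than a number. A minimal sketch of working with it, assuming you want exact decimal arithmetic: the dictionary below copies only the fields shown in the output, and `format_price` is a hypothetical helper name.

```python
from decimal import Decimal

# Only the fields visible in the truncated output above.
product = {
    "name": "A Light in the Attic",
    "price": "51.77",
    "currency": "GBP",
    "currencyRaw": "£",
}


def format_price(product: dict) -> str:
    """Convert the price string to a Decimal and append the currency code."""
    # Decimal avoids the rounding surprises of float for monetary values.
    amount = Decimal(product["price"])
    return f"{amount:.2f} {product['currency']}"


print(format_price(product))
```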
Submitting an HTML form with an HTTP request
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
The page at https://quotes.toscrape.com/search.aspx contains an HTML form that can be reduced to:
<form action="/filter.aspx" method="post" >
<select name="author">
<option>----------</option>
<option value="Albert Einstein">
Albert Einstein
</option>
<!-- [more options] -->
</select>
<select name="tag">
<option>----------</option>
</select>
<input type="hidden" name="__VIEWSTATE" value="ZTYzZDZ…">
</form>
When you select an Author (for example, Albert Einstein), the form is submitted and the Tag options are populated.
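The body of that form submission can be sketched offline before involving any HTTP client. The snippet below is a minimal illustration, not part of the official examples: it URL-encodes the three form fields and wraps them in the Zyte API parameters that the examples in this section send. `build_form_request` is a hypothetical helper name, and the `__VIEWSTATE` value is a placeholder; a real value must be read from the HTML of search.aspx.

```python
from urllib.parse import urlencode


def build_form_request(view_state: str) -> dict:
    """Build Zyte API parameters that submit the author form as a POST request."""
    form_fields = {
        "author": "Albert Einstein",
        "tag": "----------",
        "__VIEWSTATE": view_state,
    }
    return {
        "url": "https://quotes.toscrape.com/filter.aspx",
        "httpResponseBody": True,
        "httpRequestMethod": "POST",
        "customHttpRequestHeaders": [
            {"name": "Content-Type", "value": "application/x-www-form-urlencoded"},
        ],
        # urlencode produces an application/x-www-form-urlencoded body.
        "httpRequestText": urlencode(form_fields),
    }


# Placeholder value; the real one comes from the hidden __VIEWSTATE input.
parameters = build_form_request("PLACEHOLDER_VIEWSTATE")
print(parameters["httpRequestText"])
```

Printing `httpRequestText` shows the encoded body, with the space in the author name encoded as `+`.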
To reproduce that:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Xml.XPath;
using System.Web;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input1 = new Dictionary<string, object>(){
{"url", "https://quotes.toscrape.com/search.aspx"},
{"httpResponseBody", true}
};
var inputJson1 = JsonSerializer.Serialize(input1);
var content1 = new StringContent(inputJson1, Encoding.UTF8, "application/json");
HttpResponseMessage response1 = await client.PostAsync("https://api.zyte.com/v1/extract", content1);
var body1 = await response1.Content.ReadAsByteArrayAsync();
var data1 = JsonDocument.Parse(body1);
var base64HttpResponseBody1 = data1.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyBytes1 = System.Convert.FromBase64String(base64HttpResponseBody1);
var httpResponseBody1 = System.Text.Encoding.UTF8.GetString(httpResponseBodyBytes1);
var htmlDocument1 = new HtmlDocument();
htmlDocument1.LoadHtml(httpResponseBody1);
var navigator1 = htmlDocument1.CreateNavigator();
var nodeIterator = (XPathNodeIterator)navigator1.Evaluate("//*[@name='__VIEWSTATE']/@value");
nodeIterator.MoveNext();
var viewState = nodeIterator.Current.ToString();
var httpRequestTextParameters = new Dictionary<string, string>
{
{ "author", "Albert Einstein" },
{ "tag", "----------" },
{ "__VIEWSTATE", viewState}
};
var httpRequestText = string.Join("&",
httpRequestTextParameters.Select(kvp => $"{HttpUtility.UrlEncode(kvp.Key)}={HttpUtility.UrlEncode(kvp.Value)}"));
var input2 = new Dictionary<string, object>(){
{"url", "https://quotes.toscrape.com/filter.aspx"},
{"httpResponseBody", true},
{"httpRequestMethod", "POST"},
{
"customHttpRequestHeaders",
new List<Dictionary<string, object>>()
{
new Dictionary<string, object>()
{
{"name", "Content-Type"},
{"value", "application/x-www-form-urlencoded"}
}
}
},
{"httpRequestText", httpRequestText}
};
var inputJson2 = JsonSerializer.Serialize(input2);
var content2 = new StringContent(inputJson2, Encoding.UTF8, "application/json");
HttpResponseMessage response2 = await client.PostAsync("https://api.zyte.com/v1/extract", content2);
var body2 = await response2.Content.ReadAsByteArrayAsync();
var data2 = JsonDocument.Parse(body2);
var base64HttpResponseBody2 = data2.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyBytes2 = System.Convert.FromBase64String(base64HttpResponseBody2);
var httpResponseBody2 = System.Text.Encoding.UTF8.GetString(httpResponseBodyBytes2);
var htmlDocument2 = new HtmlDocument();
htmlDocument2.LoadHtml(httpResponseBody2);
var navigator2 = htmlDocument2.CreateNavigator();
var nodeIterator2 = (XPathNodeIterator)navigator2.Evaluate("//*[@name='tag']//option");
int tagCount = 0;
while (nodeIterator2.MoveNext())
{
tagCount++;
}
Console.WriteLine($"{tagCount}");
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.entity.UrlEncodedFormEntity;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.NameValuePair;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.apache.hc.core5.http.message.BasicNameValuePair;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters1 =
ImmutableMap.of("url", "https://quotes.toscrape.com/search.aspx", "httpResponseBody", true);
String requestBody1 = new Gson().toJson(parameters1);
HttpPost request1 = new HttpPost("https://api.zyte.com/v1/extract");
request1.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request1.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request1.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request1.setEntity(new StringEntity(requestBody1));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request1,
(response1) -> {
HttpEntity httpEntity1 = response1.getEntity();
String httpApiResponse1 = EntityUtils.toString(httpEntity1, StandardCharsets.UTF_8);
JsonObject httpJsonObject1 = JsonParser.parseString(httpApiResponse1).getAsJsonObject();
String base64HttpResponseBody1 = httpJsonObject1.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes1 = Base64.getDecoder().decode(base64HttpResponseBody1);
String httpResponseBody1 = new String(httpResponseBodyBytes1, StandardCharsets.UTF_8);
Document document1 = Jsoup.parse(httpResponseBody1);
String viewState = document1.select("[name='__VIEWSTATE']").attr("value");
Map<String, String> params =
ImmutableMap.of(
"author", "Albert Einstein",
"tag", "----------",
"__VIEWSTATE", viewState);
List<NameValuePair> formParams = new ArrayList<>();
for (Map.Entry<String, String> entry : params.entrySet()) {
formParams.add(new BasicNameValuePair(entry.getKey(), entry.getValue()));
}
UrlEncodedFormEntity entity =
new UrlEncodedFormEntity(formParams, StandardCharsets.UTF_8);
String httpRequestText = EntityUtils.toString(entity);
Map<String, Object> customHttpRequestHeader =
ImmutableMap.of("name", "Content-Type", "value", "application/x-www-form-urlencoded");
Map<String, Object> parameters2 =
ImmutableMap.of(
"url",
"https://quotes.toscrape.com/filter.aspx",
"httpResponseBody",
true,
"httpRequestMethod",
"POST",
"customHttpRequestHeaders",
Collections.singletonList(customHttpRequestHeader),
"httpRequestText",
httpRequestText);
String requestBody2 = new Gson().toJson(parameters2);
HttpPost request2 = new HttpPost("https://api.zyte.com/v1/extract");
request2.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request2.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request2.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request2.setEntity(new StringEntity(requestBody2));
client.execute(
request2,
(response2) -> {
HttpEntity httpEntity2 = response2.getEntity();
String httpApiResponse2 = EntityUtils.toString(httpEntity2, StandardCharsets.UTF_8);
JsonObject httpJsonObject2 =
JsonParser.parseString(httpApiResponse2).getAsJsonObject();
String base64HttpResponseBody2 =
httpJsonObject2.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes2 = Base64.getDecoder().decode(base64HttpResponseBody2);
String httpResponseBody2 =
new String(httpResponseBodyBytes2, StandardCharsets.UTF_8);
Document document2 = Jsoup.parse(httpResponseBody2);
Elements tags = document2.select("select[name='tag'] option");
System.out.println(tags.size());
return null;
});
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const cheerio = require('cheerio')
const querystring = require('querystring')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://quotes.toscrape.com/search.aspx',
httpResponseBody: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const $ = cheerio.load(httpResponseBody)
const viewState = $('[name="__VIEWSTATE"]').get(0).attribs.value
const httpRequestText = querystring.stringify(
{
author: 'Albert Einstein',
tag: '----------',
__VIEWSTATE: viewState
}
)
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://quotes.toscrape.com/filter.aspx',
httpResponseBody: true,
httpRequestMethod: 'POST',
customHttpRequestHeaders: [
{
name: 'Content-Type',
value: 'application/x-www-form-urlencoded'
}
],
httpRequestText
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const $ = cheerio.load(httpResponseBody)
console.log($('select[name="tag"] option').length)
})
})
<?php
$client = new GuzzleHttp\Client();
$response_1 = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://quotes.toscrape.com/search.aspx',
'httpResponseBody' => true,
],
]);
$data = json_decode($response_1->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$doc = new DOMDocument();
$doc->loadHTML($http_response_body);
$xpath_1 = new DOMXPath($doc);
$view_state = $xpath_1->query('//*[@name="__VIEWSTATE"]/@value')->item(0)->nodeValue;
$http_request_text = http_build_query(
[
'author' => 'Albert Einstein',
'tag' => '----------',
'__VIEWSTATE' => $view_state,
]
);
$response_2 = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://quotes.toscrape.com/filter.aspx',
'httpResponseBody' => true,
'httpRequestMethod' => 'POST',
'customHttpRequestHeaders' => [
[
'name' => 'Content-Type',
'value' => 'application/x-www-form-urlencoded',
],
],
'httpRequestText' => $http_request_text,
],
]);
$data = json_decode($response_2->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$doc->loadHTML($http_response_body);
$xpath_2 = new DOMXPath($doc);
$tags = $xpath_2->query('//*[@name="tag"]/option');
echo count($tags).PHP_EOL;
Install form2request, which makes it easier to handle HTML forms in Python.
Then:
from base64 import b64decode
from form2request import form2request
from parsel import Selector
import requests
api_response_1 = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://quotes.toscrape.com/search.aspx",
"httpResponseBody": True,
},
)
api_response_1_data = api_response_1.json()
http_response_body_1 = b64decode(api_response_1_data["httpResponseBody"])
selector_1 = Selector(body=http_response_body_1, base_url=api_response_1_data["url"])
form = selector_1.css("form")
request = form2request(form, {"author": "Albert Einstein"}, click=False)
api_response_2 = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": request.url,
"httpRequestMethod": request.method,
"customHttpRequestHeaders": [
{"name": k, "value": v} for k, v in request.headers
],
"httpRequestText": request.body.decode(),
"httpResponseBody": True,
},
)
http_response_body_2 = b64decode(api_response_2.json()["httpResponseBody"])
selector_2 = Selector(body=http_response_body_2)
print(len(selector_2.css("select[name='tag'] option")))
Install form2request, which makes it easier to handle HTML forms in Python.
Then:
import asyncio
from base64 import b64decode
from form2request import form2request
from parsel import Selector
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response_1 = await client.get(
        {
            "url": "https://quotes.toscrape.com/search.aspx",
            "httpResponseBody": True,
        }
    )
    http_response_body_1 = b64decode(api_response_1["httpResponseBody"])
    selector_1 = Selector(body=http_response_body_1, base_url=api_response_1["url"])
    form = selector_1.css("form")
    request = form2request(form, {"author": "Albert Einstein"}, click=False)
    api_response_2 = await client.get(
        {
            "url": request.url,
            "httpRequestMethod": request.method,
            "customHttpRequestHeaders": [
                {"name": k, "value": v} for k, v in request.headers
            ],
            "httpRequestText": request.body.decode(),
            "httpResponseBody": True,
        }
    )
    http_response_body_2 = b64decode(api_response_2["httpResponseBody"])
    selector_2 = Selector(body=http_response_body_2)
    print(len(selector_2.css("select[name='tag'] option")))
asyncio.run(main())
Install form2request, which makes it easier to handle HTML forms in Scrapy.
Then, use it and let transparent mode take care of the rest:
from form2request import form2request
from scrapy import Spider
class QuotesToScrapeComSpider(Spider):
    name = "quotes_toscrape_com"
    start_urls = ["https://quotes.toscrape.com/search.aspx"]
    def parse(self, response):
        form = response.css("form")
        request = form2request(form, {"author": "Albert Einstein"}, click=False)
        yield request.to_scrapy(callback=self.parse_tags)
    def parse_tags(self, response):
        print(len(response.css("select[name='tag'] option")))
Output (number of Tag options):
25
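For readers who would rather not install form2request, the form body that the JavaScript and PHP examples above build by hand can be sketched in Python with the standard library alone. This is a minimal sketch, not the documented approach: `build_filter_request` is a hypothetical helper name, and the `__VIEWSTATE` value passed at the end is a made-up placeholder rather than a real value from the page.

```python
from urllib.parse import urlencode


def build_filter_request(view_state: str) -> dict:
    """Return Zyte API parameters that submit the search form as a POST."""
    # urlencode() percent-encodes values, so characters that commonly
    # appear in __VIEWSTATE ("+", "/", "=") survive form encoding.
    http_request_text = urlencode(
        {
            "author": "Albert Einstein",
            "tag": "----------",
            "__VIEWSTATE": view_state,
        }
    )
    return {
        "url": "https://quotes.toscrape.com/filter.aspx",
        "httpResponseBody": True,
        "httpRequestMethod": "POST",
        "customHttpRequestHeaders": [
            {
                "name": "Content-Type",
                "value": "application/x-www-form-urlencoded",
            }
        ],
        "httpRequestText": http_request_text,
    }


# "abc+/=" is a placeholder, not a real __VIEWSTATE value.
parameters = build_filter_request("abc+/=")
print(parameters["httpRequestText"])
```

The returned dictionary has the same shape as the second request in the examples above, so it could be passed as the `json` argument of `requests.post` once a real `__VIEWSTATE` value has been read from the first response.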
Decoding HTML from an HTTP response body (i.e. from bytes to text)
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
Use file to find the character encoding of a previously downloaded response based solely on its body (i.e. without following the HTML encoding sniffing algorithm).
file --mime-encoding output.html
Use content-type-parser, html-encoding-sniffer and whatwg-encoding:
const contentTypeParser = require('content-type-parser')
const htmlEncodingSniffer = require('html-encoding-sniffer')
const whatwgEncoding = require('whatwg-encoding')
// …
const httpResponseHeaders = response.data.httpResponseHeaders
let contentTypeCharset
httpResponseHeaders.forEach(function (item) {
if (item.name.toLowerCase() === 'content-type') {
contentTypeCharset = contentTypeParser(item.value).get('charset')
}
})
const httpResponseBody = Buffer.from(response.data.httpResponseBody, 'base64')
const encoding = htmlEncodingSniffer(httpResponseBody, {
transportLayerEncodingLabel: contentTypeCharset
})
const html = whatwgEncoding.decode(httpResponseBody, encoding)
web-poet provides a response wrapper that automatically decodes the response body following an encoding sniffing algorithm similar to the one defined in the HTML standard.
Provided that you have extracted a response with both body and headers, and you have Base64-decoded the response body, you can decode the HTML bytes as follows:
from web_poet import HttpResponse
# …
headers = tuple(
(item['name'], item['value'])
for item in http_response_headers
)
response = HttpResponse(
url='https://example.com',
body=http_response_body,
status=200,
headers=headers,
)
html = response.text
In transparent mode, regular Scrapy requests targeting HTML resources decode them by default. See Zyte API HTTP requests.
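The header-based part of the approaches above can also be sketched with only the Python standard library. This is a simplified illustration, not the HTML encoding sniffing algorithm: it only honors a charset declared in the Content-Type response header and otherwise falls back to UTF-8 (an assumption made for this sketch). The helper names `charset_from_headers` and `decode_html` are hypothetical, and the sample body is hand-made.

```python
from base64 import b64decode
from email.message import Message
from typing import Optional


def charset_from_headers(http_response_headers: list) -> Optional[str]:
    """Return the charset declared in the Content-Type header, if any."""
    for item in http_response_headers:
        if item["name"].lower() == "content-type":
            message = Message()
            message["Content-Type"] = item["value"]
            return message.get_content_charset()
    return None


def decode_html(
    base64_body: str, http_response_headers: list, fallback: str = "utf-8"
) -> str:
    """Base64-decode a response body and decode the bytes into text."""
    body = b64decode(base64_body)
    encoding = charset_from_headers(http_response_headers) or fallback
    try:
        return body.decode(encoding, errors="replace")
    except LookupError:
        # The header named an encoding that Python does not know.
        return body.decode(fallback, errors="replace")


# Hand-made sample: "café" encoded as ISO-8859-1, then Base64.
sample_headers = [{"name": "Content-Type", "value": "text/html; charset=ISO-8859-1"}]
html = decode_html("Y2Fm6Q==", sample_headers)
print(html)
```

Pages that declare their encoding only in a `<meta>` tag or a byte order mark are not handled by this sketch; the libraries named above cover those cases.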
Setting arbitrary headers in HTTP requests
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/anything"},
{"httpResponseBody", true},
{
"customHttpRequestHeaders",
new List<Dictionary<string, object>>()
{
new Dictionary<string, object>()
{
{"name", "Accept-Language"},
{"value", "fa"}
}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var responseData = JsonDocument.Parse(httpResponseBody);
var headerEnumerator = responseData.RootElement.GetProperty("headers").EnumerateObject();
var headers = new Dictionary<string, string>();
while (headerEnumerator.MoveNext())
{
headers.Add(
headerEnumerator.Current.Name.ToString(),
headerEnumerator.Current.Value.ToString()
);
}
{"url": "https://httpbin.org/anything", "httpResponseBody": true, "customHttpRequestHeaders": [{"name": "Accept-Language", "value": "fa"}]}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq .headers
{
"url": "https://httpbin.org/anything",
"httpResponseBody": true,
"customHttpRequestHeaders": [
{
"name": "Accept-Language",
"value": "fa"
}
]
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq .headers
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> customHttpRequestHeader =
ImmutableMap.of("name", "Accept-Language", "value", "fa");
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://httpbin.org/anything",
"httpResponseBody",
true,
"customHttpRequestHeaders",
Collections.singletonList(customHttpRequestHeader));
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
JsonObject headers = data.get("headers").getAsJsonObject();
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(headers));
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/anything',
httpResponseBody: true,
customHttpRequestHeaders: [
{
name: 'Accept-Language',
value: 'fa'
}
]
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const headers = JSON.parse(httpResponseBody).headers
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/anything',
'httpResponseBody' => true,
'customHttpRequestHeaders' => [
[
'name' => 'Accept-Language',
'value' => 'fa',
],
],
],
]);
$api = json_decode($response->getBody());
$http_response_body = base64_decode($api->httpResponseBody);
$data = json_decode($http_response_body);
$headers = $data->headers;
In proxy mode, the headers of your requests are used automatically.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-H "Accept-Language: fa" \
https://httpbin.org/anything \
| jq .headers
import json
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://httpbin.org/anything",
"httpResponseBody": True,
"customHttpRequestHeaders": [
{
"name": "Accept-Language",
"value": "fa",
},
],
},
)
http_response_body = b64decode(api_response.json()["httpResponseBody"])
headers = json.loads(http_response_body)["headers"]
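The request and response handling in the Python example above can be factored into two small helpers, which makes it possible to check the parameter shape without sending a request. This is only a sketch: `to_custom_http_request_headers` and `echoed_headers` are hypothetical names, and `fake_api_response` is a hand-made stand-in for an API response, not real output.

```python
import json
from base64 import b64decode, b64encode


def to_custom_http_request_headers(headers: dict) -> list:
    """Convert a plain header mapping into the list of name/value objects."""
    return [{"name": name, "value": value} for name, value in headers.items()]


def echoed_headers(api_response: dict) -> dict:
    """Return the headers that httpbin.org reports having received."""
    body = b64decode(api_response["httpResponseBody"])
    return json.loads(body)["headers"]


parameters = {
    "url": "https://httpbin.org/anything",
    "httpResponseBody": True,
    "customHttpRequestHeaders": to_custom_http_request_headers(
        {"Accept-Language": "fa"}
    ),
}

# Hand-made stand-in for an API response, used only to exercise the helper.
fake_body = json.dumps({"headers": {"Accept-Language": "fa"}}).encode()
fake_api_response = {"httpResponseBody": b64encode(fake_body).decode()}
print(echoed_headers(fake_api_response))
```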
import asyncio
import json
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://httpbin.org/anything",
            "httpResponseBody": True,
            "customHttpRequestHeaders": [
                {
                    "name": "Accept-Language",
                    "value": "fa",
                },
            ],
        }
    )
    http_response_body: bytes = b64decode(api_response["httpResponseBody"])
    headers = json.loads(http_response_body)["headers"]
    print(json.dumps(headers, indent=2))
asyncio.run(main())
import json
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"
    def start_requests(self):
        yield Request(
            "https://httpbin.org/anything",
            headers={"Accept-Language": "fa"},
        )
    def parse(self, response):
        headers = json.loads(response.text)["headers"]
Output (first 5 lines):
{
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Encoding": "gzip, deflate, br",
"Accept-Language": "fa",
"Host": "httpbin.org",
Forcing data center or residential traffic
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Xml.XPath;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
string[] ipTypes = { "datacenter", "residential" };
for (int i = 0; i < ipTypes.Length; i++)
{
var input = new Dictionary<string, object>(){
{"url", "https://www.whatismyisp.com/"},
{"httpResponseBody", true},
{"ipType", ipTypes[i]}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyBytes = System.Convert.FromBase64String(base64HttpResponseBody);
var httpResponseBody = System.Text.Encoding.UTF8.GetString(httpResponseBodyBytes);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(httpResponseBody);
var navigator = htmlDocument.CreateNavigator();
var nodeIterator = (XPathNodeIterator)navigator.Evaluate("//h1/span/text()");
nodeIterator.MoveNext();
var isp = nodeIterator.Current.ToString();
Console.WriteLine(isp);
}
{"url": "https://www.whatismyisp.com/", "httpResponseBody": true, "ipType": "datacenter"}
{"url": "https://www.whatismyisp.com/", "httpResponseBody": true, "ipType": "residential"}
zyte-api input.jsonl 2> /dev/null \
| xargs -d\\n -n 1 \
bash -c "
jq --raw-output .httpResponseBody <<< \"\$0\" \
| base64 --decode \
| xmllint --html --xpath 'string(//h1/span/text())' --noblanks - 2> /dev/null
"
{"url": "https://www.whatismyisp.com/", "httpResponseBody": true, "ipType": "datacenter"}
{"url": "https://www.whatismyisp.com/", "httpResponseBody": true, "ipType": "residential"}
cat input.jsonl \
| xargs -P 2 -d\\n -n 1 \
bash -c "
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data \"\$0\" \
--compressed \
https://api.zyte.com/v1/extract \
2> /dev/null \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| xmllint --html --xpath 'string(//h1/span/text())' --noblanks - 2> /dev/null
"
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
String[] ipTypes = {"datacenter", "residential"};
for (String ipType : ipTypes) {
Map<String, Object> parameters =
ImmutableMap.of(
"url", "https://www.whatismyisp.com/", "httpResponseBody", true, "ipType", ipType);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
Document document = Jsoup.parse(httpResponseBody);
String isp = document.select("h1 > span:first-of-type").text();
System.out.println(isp);
return null;
});
}
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const cheerio = require('cheerio')
const ipTypes = ['datacenter', 'residential']
for (const ipType of ipTypes) {
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://www.whatismyisp.com/',
httpResponseBody: true,
ipType
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const $ = cheerio.load(httpResponseBody)
const isp = $('h1 > span:first-of-type').text()
console.log(isp)
})
}
<?php
error_reporting(E_ERROR | E_PARSE);
$client = new GuzzleHttp\Client();
$ip_types = ['datacenter', 'residential'];
foreach ($ip_types as &$ip_type) {
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://www.whatismyisp.com/',
'httpResponseBody' => true,
'ipType' => $ip_type,
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$doc = new DOMDocument();
$doc->loadHTML($http_response_body);
$xpath = new DOMXPath($doc);
$isp = $xpath->query('//h1/span/text()')->item(0)->nodeValue;
echo $isp.PHP_EOL;
}
In proxy mode, use the Zyte-IPType header.
for ip_type in datacenter residential
do
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--header "Zyte-IPType: $ip_type" \
--compressed \
https://www.whatismyisp.com/ \
2> /dev/null \
| xmllint --html --xpath 'string(//h1/span/text())' --noblanks - 2> /dev/null
done
from base64 import b64decode
import requests
from parsel import Selector
for ip_type in ("datacenter", "residential"):
    api_response = requests.post(
        "https://api.zyte.com/v1/extract",
        auth=("YOUR_API_KEY", ""),
        json={
            "url": "https://www.whatismyisp.com/",
            "httpResponseBody": True,
            "ipType": ip_type,
        },
    )
    http_response_body_bytes = b64decode(api_response.json()["httpResponseBody"])
    http_response_body = http_response_body_bytes.decode()
    isp = Selector(http_response_body).css("h1 > span::text").get()
    print(isp)
import asyncio
from base64 import b64decode
from parsel import Selector
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    for ip_type in ("datacenter", "residential"):
        api_response = await client.get(
            {
                "url": "https://www.whatismyisp.com/",
                "httpResponseBody": True,
                "ipType": ip_type,
            },
        )
        http_response_body_bytes = b64decode(api_response["httpResponseBody"])
        http_response_body = http_response_body_bytes.decode()
        isp = Selector(http_response_body).css("h1 > span::text").get()
        print(isp)
asyncio.run(main())
from scrapy import Request, Spider
class WhatIsMyIspComSpider(Spider):
    name = "whatismyisp_com"
    def start_requests(self):
        for ip_type in ("datacenter", "residential"):
            yield Request(
                "https://www.whatismyisp.com/",
                meta={
                    "zyte_api_automap": {
                        "ipType": ip_type,
                    },
                },
            )
    def parse(self, response):
        print(response.css("h1 > span::text").get())
Output:
[A web hosting company]
[An Internet service provider]
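The `input.jsonl` file used by the command-line examples above holds one JSON object per line, one per IP type. When the list of IP types or URLs is generated by a script, such a file could be produced as sketched below; `build_jsonl` is a hypothetical helper, not part of any Zyte tool.

```python
import json


def build_jsonl(url: str, ip_types: list) -> str:
    """Return JSON Lines text with one Zyte API request per IP type."""
    lines = [
        json.dumps({"url": url, "httpResponseBody": True, "ipType": ip_type})
        for ip_type in ip_types
    ]
    return "\n".join(lines) + "\n"


jsonl = build_jsonl("https://www.whatismyisp.com/", ["datacenter", "residential"])
print(jsonl, end="")
```

Writing the returned text to a file named `input.jsonl` would give the same input that the command-line examples read.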
Disabling JavaScript in a browser request
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Xml.XPath;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://www.whatismybrowser.com/detect/is-javascript-enabled"},
{"browserHtml", true},
{"javascript", false}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var browserHtml = data.RootElement.GetProperty("browserHtml").ToString();
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(browserHtml);
var navigator = htmlDocument.CreateNavigator();
var nodeIterator = (XPathNodeIterator)navigator.Evaluate("//*[@id=\"detected_value\"]/text()");
nodeIterator.MoveNext();
var isJavaScriptEnabled = nodeIterator.Current.ToString();
{"url": "https://www.whatismybrowser.com/detect/is-javascript-enabled", "browserHtml": true, "javascript": false}
zyte-api input.jsonl \
| jq --raw-output .browserHtml \
| xmllint --html --xpath '//*[@id="detected_value"]/text()' - 2> /dev/null
{
"url": "https://www.whatismybrowser.com/detect/is-javascript-enabled",
"browserHtml": true,
"javascript": false
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .browserHtml \
| xmllint --html --xpath '//*[@id="detected_value"]/text()' - 2> /dev/null
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://www.whatismybrowser.com/detect/is-javascript-enabled",
"browserHtml",
true,
"javascript",
false);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String browserHtml = jsonObject.get("browserHtml").getAsString();
Document document = Jsoup.parse(browserHtml);
String isJavaScriptEnabled = document.select("#detected_value").text();
System.out.println(isJavaScriptEnabled);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const cheerio = require('cheerio')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://www.whatismybrowser.com/detect/is-javascript-enabled',
browserHtml: true,
javascript: false
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const $ = cheerio.load(response.data.browserHtml)
const isJavaScriptEnabled = $('#detected_value').text()
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://www.whatismybrowser.com/detect/is-javascript-enabled',
'browserHtml' => true,
'javascript' => false,
],
]);
$api = json_decode($response->getBody());
$doc = new DOMDocument();
$doc->loadHTML($api->browserHtml);
$xpath = new DOMXPath($doc);
$is_javascript_enabled = $xpath->query("//*[@id='detected_value']")->item(0)->textContent;
import requests
from parsel import Selector
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://www.whatismybrowser.com/detect/is-javascript-enabled",
"browserHtml": True,
"javascript": False,
},
)
browser_html = api_response.json()["browserHtml"]
selector = Selector(browser_html)
is_javascript_enabled: str = selector.css("#detected_value::text").get()
import asyncio
from parsel import Selector
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://www.whatismybrowser.com/detect/is-javascript-enabled",
            "browserHtml": True,
            "javascript": False,
        }
    )
    browser_html = api_response["browserHtml"]
    selector = Selector(browser_html)
    is_javascript_enabled = selector.css("#detected_value::text").get()
    print(is_javascript_enabled)
asyncio.run(main())
from scrapy import Request, Spider
class WhatIsMyBrowserComSpider(Spider):
    name = "whatismybrowser_com"
    def start_requests(self):
        yield Request(
            "https://www.whatismybrowser.com/detect/is-javascript-enabled",
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                    "javascript": False,
                },
            },
        )
    def parse(self, response):
        is_javascript_enabled: str = response.css("#detected_value::text").get()
Output:
No
Appending arbitrary metadata to a request
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
var inputData = new List<List<object>>()
{
new List<object>(){"https://toscrape.com", 1},
new List<object>(){"https://books.toscrape.com", 2},
new List<object>(){"https://quotes.toscrape.com", 3},
};
var output = new List<HttpResponseMessage>();
var handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All,
MaxConnectionsPerServer = 15
};
var client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var responseTasks = new List<Task<HttpResponseMessage>>();
foreach (var entry in inputData)
{
var input = new Dictionary<string, object>(){
{"url", entry[0]},
{"browserHtml", true},
{"echoData", entry[1]}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
var responseTask = client.PostAsync("https://api.zyte.com/v1/extract", content);
responseTasks.Add(responseTask);
}
while (responseTasks.Any())
{
var responseTask = await Task.WhenAny(responseTasks);
responseTasks.Remove(responseTask);
var response = await responseTask;
output.Add(response);
}
{"url": "https://toscrape.com", "browserHtml": true, "echoData": 1}
{"url": "https://books.toscrape.com", "browserHtml": true, "echoData": 2}
{"url": "https://quotes.toscrape.com", "browserHtml": true, "echoData": 3}
zyte-api --n-conn 15 input.jsonl -o output.jsonl
{"url": "https://toscrape.com", "browserHtml": true, "echoData": 1}
{"url": "https://books.toscrape.com", "browserHtml": true, "echoData": 2}
{"url": "https://quotes.toscrape.com", "browserHtml": true, "echoData": 3}
cat input.jsonl \
| xargs -P 15 -d\\n -n 1 \
bash -c "
curl \
--user $ZYTE_API_KEY: \
--header 'Content-Type: application/json' \
--data \"\$0\" \
--compressed \
https://api.zyte.com/v1/extract \
| jq .echoData \
| awk '{print \$1}' \
>> output.jsonl
"
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager;
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManagerBuilder;
import org.apache.hc.client5.http.ssl.ClientTlsStrategyBuilder;
import org.apache.hc.core5.concurrent.FutureCallback;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.nio.ssl.TlsStrategy;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws ExecutionException, InterruptedException, IOException, ParseException {
Object[][] input = {
{"https://toscrape.com", 1},
{"https://books.toscrape.com", 2},
{"https://quotes.toscrape.com", 3}
};
List<Future> futures = new ArrayList<Future>();
List<String> output = new ArrayList<String>();
int concurrency = 15;
// https://issues.apache.org/jira/browse/HTTPCLIENT-2219
final TlsStrategy tlsStrategy = ClientTlsStrategyBuilder.create().useSystemProperties().build();
PoolingAsyncClientConnectionManager connectionManager =
PoolingAsyncClientConnectionManagerBuilder.create().setTlsStrategy(tlsStrategy).build();
connectionManager.setMaxTotal(concurrency);
connectionManager.setDefaultMaxPerRoute(concurrency);
CloseableHttpAsyncClient client =
HttpAsyncClients.custom().setConnectionManager(connectionManager).build();
try {
client.start();
for (int i = 0; i < input.length; i++) {
Map<String, Object> parameters =
ImmutableMap.of("url", input[i][0], "browserHtml", true, "echoData", input[i][1]);
String requestBody = new Gson().toJson(parameters);
SimpleHttpRequest request =
new SimpleHttpRequest("POST", "https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setBody(requestBody, ContentType.APPLICATION_JSON);
final Future<SimpleHttpResponse> future =
client.execute(
request,
new FutureCallback<SimpleHttpResponse>() {
public void completed(final SimpleHttpResponse response) {
String apiResponse = response.getBodyText();
output.add(apiResponse);
}
public void failed(final Exception ex) {}
public void cancelled() {}
});
futures.add(future);
}
for (int i = 0; i < futures.size(); i++) {
futures.get(i).get();
}
} finally {
client.close();
}
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const { ConcurrencyManager } = require('axios-concurrency')
const axios = require('axios')
const urls = [
['https://toscrape.com', 1],
['https://books.toscrape.com', 2],
['https://quotes.toscrape.com', 3]
]
const output = []
const client = axios.create()
ConcurrencyManager(client, 15)
Promise.all(
urls.map((input) =>
client.post(
'https://api.zyte.com/v1/extract',
{ url: input[0], browserHtml: true, echoData: input[1] },
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => output.push(response.data))
)
)
<?php
$input = [
['https://toscrape.com', 1],
['https://books.toscrape.com', 2],
['https://quotes.toscrape.com', 3],
];
$output = [];
$promises = [];
$client = new GuzzleHttp\Client();
foreach ($input as $url_and_index) {
$options = [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => $url_and_index[0],
'browserHtml' => true,
'echoData' => $url_and_index[1],
],
];
$request = new \GuzzleHttp\Psr7\Request('POST', 'https://api.zyte.com/v1/extract');
global $promises;
$promises[] = $client->sendAsync($request, $options)->then(function ($response) {
global $output;
$output[] = json_decode($response->getBody());
});
}
foreach ($promises as $promise) {
$promise->wait();
}
In proxy mode, you cannot set request metadata.
import asyncio
import aiohttp
input_data = [
("https://toscrape.com", 1),
("https://books.toscrape.com", 2),
("https://quotes.toscrape.com", 3),
]
output = []
async def extract(client, url, index):
response = await client.post(
"https://api.zyte.com/v1/extract",
json={"url": url, "browserHtml": True, "echoData": index},
auth=aiohttp.BasicAuth("YOUR_API_KEY"),
)
output.append(await response.json())
async def main():
connector = aiohttp.TCPConnector(limit_per_host=15)
async with aiohttp.ClientSession(connector=connector) as client:
await asyncio.gather(
*[extract(client, url, index) for url, index in input_data]
)
asyncio.run(main())
import asyncio
import json
from zyte_api import AsyncZyteAPI
input_data = [
("https://toscrape.com", 1),
("https://books.toscrape.com", 2),
("https://quotes.toscrape.com", 3),
]
async def main():
client = AsyncZyteAPI(n_conn=15)
queries = [
{"url": url, "browserHtml": True, "echoData": index}
for url, index in input_data
]
async with client.session() as session:
for future in session.iter(queries):
response = await future
print(json.dumps(response))
asyncio.run(main())
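Concurrent requests can finish in any order, so the echoData value returned with each response is what ties it back to its input. The following is a minimal sketch of restoring input order, assuming the responses were saved as JSON Lines; the sample lines are shortened for illustration (real responses also carry fields such as browserHtml), and sort_by_echo_data is a hypothetical helper, not part of any Zyte library.

```python
import json

# Hypothetical, shortened JSON Lines responses in arrival order.
# Real responses also include fields such as "browserHtml".
raw_lines = [
    '{"url": "https://quotes.toscrape.com/", "statusCode": 200, "echoData": 3}',
    '{"url": "https://toscrape.com/", "statusCode": 200, "echoData": 1}',
    '{"url": "https://books.toscrape.com/", "statusCode": 200, "echoData": 2}',
]


def sort_by_echo_data(lines):
    """Parse JSON Lines responses and return them ordered by echoData."""
    responses = [json.loads(line) for line in lines]
    return sorted(responses, key=lambda response: response["echoData"])


ordered = sort_by_echo_data(raw_lines)
ordered_urls = [response["url"] for response in ordered]
```

Sorting works here because the examples use integer indexes as echoData. If echoData holds something that cannot be ordered, such as a dictionary, a lookup keyed on one of its fields would be the more natural choice.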
from scrapy import Request, Spider
input_data = [
("https://toscrape.com", 1),
("https://books.toscrape.com", 2),
("https://quotes.toscrape.com", 3),
]
class ToScrapeSpider(Spider):
name = "toscrape_com"
custom_settings = {
"CONCURRENT_REQUESTS": 15,
"CONCURRENT_REQUESTS_PER_DOMAIN": 15,
}
def start_requests(self):
for url, index in input_data:
yield Request(
url,
meta={
"zyte_api_automap": {
"browserHtml": True,
"echoData": index,
},
},
)
def parse(self, response):
yield {
"index": response.raw_api_response["echoData"],
"html": response.text,
}
Alternatively, you can use Scrapy’s Request.cb_kwargs directly for a similar purpose:
def start_requests(self):
for url, index in input_data:
yield Request(
url,
cb_kwargs={"index": index},
meta={
"zyte_api_automap": {
"browserHtml": True,
},
},
)
def parse(self, response, index):
yield {
"index": index,
"html": response.text,
}
Output:
{"url": "https://quotes.toscrape.com/", "statusCode": 200, "browserHtml": "<!DOCTYPE html><html lang=\"en\"><head>\n\t<meta charset=\"UTF-8\">\n\t<title>Quotes to Scrape</title>\n <link rel=\"stylesheet\" href=\"/static/bootstrap.min.css\">\n <link rel=\"stylesheet\" href=\"/static/main.css\">\n</head>\n<body>\n <div class=\"container\">\n <div class=\"row header-box\">\n <div class=\"col-md-8\">\n <h1>\n <a href=\"/\" style=\"text-decoration: none\">Quotes to Scrape</a>\n </h1>\n </div>\n <div class=\"col-md-4\">\n <p>\n \n <a href=\"/login\">Login</a>\n \n </p>\n </div>\n </div>\n \n\n<div class=\"row\">\n <div class=\"col-md-8\">\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Albert Einstein</small>\n <a href=\"/author/Albert-Einstein\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"change,deep-thoughts,thinking,world\"> \n \n <a class=\"tag\" href=\"/tag/change/page/1/\">change</a>\n \n <a class=\"tag\" href=\"/tag/deep-thoughts/page/1/\">deep-thoughts</a>\n \n <a class=\"tag\" href=\"/tag/thinking/page/1/\">thinking</a>\n \n <a class=\"tag\" href=\"/tag/world/page/1/\">world</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“It is our choices, Harry, that show what we truly are, far more than our abilities.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">J.K. 
Rowling</small>\n <a href=\"/author/J-K-Rowling\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"abilities,choices\"> \n \n <a class=\"tag\" href=\"/tag/abilities/page/1/\">abilities</a>\n \n <a class=\"tag\" href=\"/tag/choices/page/1/\">choices</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Albert Einstein</small>\n <a href=\"/author/Albert-Einstein\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"inspirational,life,live,miracle,miracles\"> \n \n <a class=\"tag\" href=\"/tag/inspirational/page/1/\">inspirational</a>\n \n <a class=\"tag\" href=\"/tag/life/page/1/\">life</a>\n \n <a class=\"tag\" href=\"/tag/live/page/1/\">live</a>\n \n <a class=\"tag\" href=\"/tag/miracle/page/1/\">miracle</a>\n \n <a class=\"tag\" href=\"/tag/miracles/page/1/\">miracles</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Jane Austen</small>\n <a href=\"/author/Jane-Austen\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"aliteracy,books,classic,humor\"> \n \n <a class=\"tag\" href=\"/tag/aliteracy/page/1/\">aliteracy</a>\n \n <a class=\"tag\" href=\"/tag/books/page/1/\">books</a>\n \n <a class=\"tag\" href=\"/tag/classic/page/1/\">classic</a>\n \n <a class=\"tag\" href=\"/tag/humor/page/1/\">humor</a>\n \n </div>\n </div>\n\n <div 
class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Marilyn Monroe</small>\n <a href=\"/author/Marilyn-Monroe\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"be-yourself,inspirational\"> \n \n <a class=\"tag\" href=\"/tag/be-yourself/page/1/\">be-yourself</a>\n \n <a class=\"tag\" href=\"/tag/inspirational/page/1/\">inspirational</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“Try not to become a man of success. Rather become a man of value.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Albert Einstein</small>\n <a href=\"/author/Albert-Einstein\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"adulthood,success,value\"> \n \n <a class=\"tag\" href=\"/tag/adulthood/page/1/\">adulthood</a>\n \n <a class=\"tag\" href=\"/tag/success/page/1/\">success</a>\n \n <a class=\"tag\" href=\"/tag/value/page/1/\">value</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“It is better to be hated for what you are than to be loved for what you are not.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">André Gide</small>\n <a href=\"/author/Andre-Gide\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"life,love\"> \n \n <a class=\"tag\" href=\"/tag/life/page/1/\">life</a>\n \n <a class=\"tag\" href=\"/tag/love/page/1/\">love</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" 
itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“I have not failed. I've just found 10,000 ways that won't work.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Thomas A. Edison</small>\n <a href=\"/author/Thomas-A-Edison\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"edison,failure,inspirational,paraphrased\"> \n \n <a class=\"tag\" href=\"/tag/edison/page/1/\">edison</a>\n \n <a class=\"tag\" href=\"/tag/failure/page/1/\">failure</a>\n \n <a class=\"tag\" href=\"/tag/inspirational/page/1/\">inspirational</a>\n \n <a class=\"tag\" href=\"/tag/paraphrased/page/1/\">paraphrased</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“A woman is like a tea bag; you never know how strong it is until it's in hot water.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Eleanor Roosevelt</small>\n <a href=\"/author/Eleanor-Roosevelt\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"misattributed-eleanor-roosevelt\"> \n \n <a class=\"tag\" href=\"/tag/misattributed-eleanor-roosevelt/page/1/\">misattributed-eleanor-roosevelt</a>\n \n </div>\n </div>\n\n <div class=\"quote\" itemscope=\"\" itemtype=\"http://schema.org/CreativeWork\">\n <span class=\"text\" itemprop=\"text\">“A day without sunshine is like, you know, night.”</span>\n <span>by <small class=\"author\" itemprop=\"author\">Steve Martin</small>\n <a href=\"/author/Steve-Martin\">(about)</a>\n </span>\n <div class=\"tags\">\n Tags:\n <meta class=\"keywords\" itemprop=\"keywords\" content=\"humor,obvious,simile\"> \n \n <a class=\"tag\" href=\"/tag/humor/page/1/\">humor</a>\n \n <a class=\"tag\" href=\"/tag/obvious/page/1/\">obvious</a>\n \n <a class=\"tag\" href=\"/tag/simile/page/1/\">simile</a>\n \n </div>\n </div>\n\n 
<nav>\n <ul class=\"pager\">\n \n \n <li class=\"next\">\n <a href=\"/page/2/\">Next <span aria-hidden=\"true\">→</span></a>\n </li>\n \n </ul>\n </nav>\n </div>\n <div class=\"col-md-4 tags-box\">\n \n <h2>Top Ten tags</h2>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 28px\" href=\"/tag/love/\">love</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 26px\" href=\"/tag/inspirational/\">inspirational</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 26px\" href=\"/tag/life/\">life</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 24px\" href=\"/tag/humor/\">humor</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 22px\" href=\"/tag/books/\">books</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 14px\" href=\"/tag/reading/\">reading</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 10px\" href=\"/tag/friendship/\">friendship</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 8px\" href=\"/tag/friends/\">friends</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 8px\" href=\"/tag/truth/\">truth</a>\n </span>\n \n <span class=\"tag-item\">\n <a class=\"tag\" style=\"font-size: 6px\" href=\"/tag/simile/\">simile</a>\n </span>\n \n \n </div>\n</div>\n\n </div>\n <footer class=\"footer\">\n <div class=\"container\">\n <p class=\"text-muted\">\n Quotes by: <a href=\"https://www.goodreads.com/quotes\">GoodReads.com</a>\n </p>\n <p class=\"copyright\">\n Made with <span class=\"zyte\">❤</span> by <a class=\"zyte\" href=\"https://www.zyte.com\">Zyte</a>\n </p>\n </div>\n </footer>\n\n</body></html>", "echoData": 3}
{"url": "https://books.toscrape.com/", "statusCode": 200, "browserHtml": "<!DOCTYPE html><!--[if lt IE 7]> <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8 lt-ie7\"> <![endif]--><!--[if IE 7]> <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8\"> <![endif]--><!--[if IE 8]> <html lang=\"en-us\" class=\"no-js lt-ie9\"> <![endif]--><!--[if gt IE 8]><!--><html lang=\"en-us\" class=\"no-js\"><!--<![endif]--><head>\n <title>\n All products | Books to Scrape - Sandbox\n</title>\n\n <meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\">\n <meta name=\"created\" content=\"24th Jun 2016 09:29\">\n <meta name=\"description\" content=\"\">\n <meta name=\"viewport\" content=\"width=device-width\">\n <meta name=\"robots\" content=\"NOARCHIVE,NOCACHE\">\n\n <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->\n <!--[if lt IE 9]>\n <script src=\"//html5shim.googlecode.com/svn/trunk/html5.js\"></script>\n <![endif]-->\n\n \n <link rel=\"shortcut icon\" href=\"static/oscar/favicon.ico\">\n \n\n \n \n \n \n <link rel=\"stylesheet\" type=\"text/css\" href=\"static/oscar/css/styles.css\">\n \n <link rel=\"stylesheet\" href=\"static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css\">\n <link rel=\"stylesheet\" type=\"text/css\" href=\"static/oscar/css/datetimepicker.css\">\n\n\n \n \n\n \n\n \n \n \n\n \n </head>\n\n <body id=\"default\" class=\"default\">\n \n \n \n \n <header class=\"header container-fluid\">\n <div class=\"page_inner\">\n <div class=\"row\">\n <div class=\"col-sm-8 h1\"><a href=\"index.html\">Books to Scrape</a><small> We love being scraped!</small>\n</div>\n\n \n </div>\n </div>\n </header>\n\n \n \n<div class=\"container-fluid page\">\n <div class=\"page_inner\">\n \n <ul class=\"breadcrumb\">\n <li>\n <a href=\"index.html\">Home</a>\n </li>\n <li class=\"active\">All products</li>\n </ul>\n\n <div class=\"row\">\n\n <aside class=\"sidebar col-sm-4 col-md-3\">\n \n <div id=\"promotions_left\">\n \n </div>\n \n \n \n \n <div 
class=\"side_categories\">\n <ul class=\"nav nav-list\">\n \n <li>\n <a href=\"catalogue/category/books_1/index.html\">\n \n Books\n \n </a>\n\n <ul>\n \n \n <li>\n <a href=\"catalogue/category/books/travel_2/index.html\">\n \n Travel\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/mystery_3/index.html\">\n \n Mystery\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/historical-fiction_4/index.html\">\n \n Historical Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/sequential-art_5/index.html\">\n \n Sequential Art\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/classics_6/index.html\">\n \n Classics\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/philosophy_7/index.html\">\n \n Philosophy\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/romance_8/index.html\">\n \n Romance\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/womens-fiction_9/index.html\">\n \n Womens Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/fiction_10/index.html\">\n \n Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/childrens_11/index.html\">\n \n Childrens\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/religion_12/index.html\">\n \n Religion\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/nonfiction_13/index.html\">\n \n Nonfiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/music_14/index.html\">\n \n Music\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/default_15/index.html\">\n \n Default\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/science-fiction_16/index.html\">\n \n Science Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/sports-and-games_17/index.html\">\n \n Sports and Games\n \n </a>\n\n </li>\n \n \n <li>\n <a 
href=\"catalogue/category/books/add-a-comment_18/index.html\">\n \n Add a comment\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/fantasy_19/index.html\">\n \n Fantasy\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/new-adult_20/index.html\">\n \n New Adult\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/young-adult_21/index.html\">\n \n Young Adult\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/science_22/index.html\">\n \n Science\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/poetry_23/index.html\">\n \n Poetry\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/paranormal_24/index.html\">\n \n Paranormal\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/art_25/index.html\">\n \n Art\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/psychology_26/index.html\">\n \n Psychology\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/autobiography_27/index.html\">\n \n Autobiography\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/parenting_28/index.html\">\n \n Parenting\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/adult-fiction_29/index.html\">\n \n Adult Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/humor_30/index.html\">\n \n Humor\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/horror_31/index.html\">\n \n Horror\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/history_32/index.html\">\n \n History\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/food-and-drink_33/index.html\">\n \n Food and Drink\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/christian-fiction_34/index.html\">\n \n Christian Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/business_35/index.html\">\n \n Business\n \n 
</a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/biography_36/index.html\">\n \n Biography\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/thriller_37/index.html\">\n \n Thriller\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/contemporary_38/index.html\">\n \n Contemporary\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/spirituality_39/index.html\">\n \n Spirituality\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/academic_40/index.html\">\n \n Academic\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/self-help_41/index.html\">\n \n Self Help\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/historical_42/index.html\">\n \n Historical\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/christian_43/index.html\">\n \n Christian\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/suspense_44/index.html\">\n \n Suspense\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/short-stories_45/index.html\">\n \n Short Stories\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/novels_46/index.html\">\n \n Novels\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/health_47/index.html\">\n \n Health\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/politics_48/index.html\">\n \n Politics\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/cultural_49/index.html\">\n \n Cultural\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/erotica_50/index.html\">\n \n Erotica\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"catalogue/category/books/crime_51/index.html\">\n \n Crime\n \n </a>\n\n </li>\n \n </ul></li>\n \n \n </ul>\n </div>\n \n \n\n </aside>\n\n <div class=\"col-sm-8 col-md-9\">\n \n <div class=\"page-header action\">\n <h1>All products</h1>\n </div>\n \n\n \n\n\n\n<div 
id=\"messages\">\n\n</div>\n\n\n <div id=\"promotions\">\n \n </div>\n\n \n <form method=\"get\" class=\"form-horizontal\">\n \n <div style=\"display:none\">\n \n \n </div>\n\n \n \n \n <strong>1000</strong> results - showing <strong>1</strong> to <strong>20</strong>.\n \n \n \n \n </form>\n \n <section>\n <div class=\"alert alert-warning\" role=\"alert\"><strong>Warning!</strong> This is a demo website for web scraping purposes. Prices and ratings here were randomly assigned and have no real meaning.</div>\n\n <div>\n <ol class=\"row\">\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/a-light-in-the-attic_1000/index.html\"><img src=\"media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg\" alt=\"A Light in the Attic\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/a-light-in-the-attic_1000/index.html\" title=\"A Light in the Attic\">A Light in the ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£51.77</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/tipping-the-velvet_999/index.html\"><img src=\"media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg\" alt=\"Tipping the Velvet\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i 
class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/tipping-the-velvet_999/index.html\" title=\"Tipping the Velvet\">Tipping the Velvet</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£53.74</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/soumission_998/index.html\"><img src=\"media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg\" alt=\"Soumission\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/soumission_998/index.html\" title=\"Soumission\">Soumission</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£50.10</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/sharp-objects_997/index.html\"><img src=\"media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg\" alt=\"Sharp Objects\" 
class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/sharp-objects_997/index.html\" title=\"Sharp Objects\">Sharp Objects</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£47.82</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/sapiens-a-brief-history-of-humankind_996/index.html\"><img src=\"media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg\" alt=\"Sapiens: A Brief History of Humankind\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/sapiens-a-brief-history-of-humankind_996/index.html\" title=\"Sapiens: A Brief History of Humankind\">Sapiens: A Brief History ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£54.23</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div 
class=\"image_container\">\n \n \n <a href=\"catalogue/the-requiem-red_995/index.html\"><img src=\"media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg\" alt=\"The Requiem Red\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/the-requiem-red_995/index.html\" title=\"The Requiem Red\">The Requiem Red</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£22.65</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\"><img src=\"media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg\" alt=\"The Dirty Little Secrets of Getting Your Dream Job\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\" title=\"The Dirty Little Secrets of Getting Your Dream Job\">The Dirty Little Secrets ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£33.34</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn 
btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\"><img src=\"media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg\" alt=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\" title=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\">The Coming Woman: A ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£17.93</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\"><img src=\"media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg\" alt=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n 
<i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\" title=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\">The Boys in the ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£22.60</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/the-black-maria_991/index.html\"><img src=\"media/cache/58/46/5846057e28022268153beff6d352b06c.jpg\" alt=\"The Black Maria\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/the-black-maria_991/index.html\" title=\"The Black Maria\">The Black Maria</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£52.15</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div 
class=\"image_container\">\n \n \n <a href=\"catalogue/starving-hearts-triangular-trade-trilogy-1_990/index.html\"><img src=\"media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg\" alt=\"Starving Hearts (Triangular Trade Trilogy, #1)\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/starving-hearts-triangular-trade-trilogy-1_990/index.html\" title=\"Starving Hearts (Triangular Trade Trilogy, #1)\">Starving Hearts (Triangular Trade ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£13.99</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/shakespeares-sonnets_989/index.html\"><img src=\"media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg\" alt=\"Shakespeare's Sonnets\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/shakespeares-sonnets_989/index.html\" title=\"Shakespeare's Sonnets\">Shakespeare's Sonnets</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£20.66</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" 
class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/set-me-free_988/index.html\"><img src=\"media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg\" alt=\"Set Me Free\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/set-me-free_988/index.html\" title=\"Set Me Free\">Set Me Free</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£17.46</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\"><img src=\"media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg\" alt=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\" title=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\">Scott Pilgrim's Precious Little ...</a></h3>\n \n\n \n 
<div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£52.29</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/rip-it-up-and-start-again_986/index.html\"><img src=\"media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg\" alt=\"Rip it Up and Start Again\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/rip-it-up-and-start-again_986/index.html\" title=\"Rip it Up and Start Again\">Rip it Up and ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£35.02</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\"><img src=\"media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg\" alt=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i 
class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\" title=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\">Our Band Could Be ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£57.25</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/olio_984/index.html\"><img src=\"media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg\" alt=\"Olio\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/olio_984/index.html\" title=\"Olio\">Olio</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£23.88</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a 
href=\"catalogue/mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\"><img src=\"media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg\" alt=\"Mesaerion: The Best Science Fiction Stories 1800-1849\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\" title=\"Mesaerion: The Best Science Fiction Stories 1800-1849\">Mesaerion: The Best Science ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£37.59</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/libertarianism-for-beginners_982/index.html\"><img src=\"media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg\" alt=\"Libertarianism for Beginners\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/libertarianism-for-beginners_982/index.html\" title=\"Libertarianism for Beginners\">Libertarianism for Beginners</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£51.33</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n 
<button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"catalogue/its-only-the-himalayas_981/index.html\"><img src=\"media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg\" alt=\"It's Only the Himalayas\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"catalogue/its-only-the-himalayas_981/index.html\" title=\"It's Only the Himalayas\">It's Only the Himalayas</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£45.17</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n </ol>\n \n\n\n\n <div>\n <ul class=\"pager\">\n \n <li class=\"current\">\n \n Page 1 of 50\n \n </li>\n \n <li class=\"next\"><a href=\"catalogue/page-2.html\">next</a></li>\n \n </ul>\n </div>\n\n\n </div>\n </section>\n \n\n\n </div>\n\n </div><!-- /row -->\n </div><!-- /page_inner -->\n</div><!-- /container-fluid -->\n\n\n \n<footer class=\"footer container-fluid\">\n \n \n \n</footer>\n\n\n \n \n \n <!-- jQuery -->\n <script src=\"http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js\"></script>\n <script>window.jQuery || document.write('<script src=\"static/oscar/js/jquery/jquery-1.9.1.min.js\"><\\/script>')</script><script src=\"static/oscar/js/jquery/jquery-1.9.1.min.js\"></script>\n \n \n\n\n \n \n \n \n <script 
type=\"text/javascript\" src=\"static/oscar/js/bootstrap3/bootstrap.min.js\"></script>\n <!-- Oscar -->\n <script src=\"static/oscar/js/oscar/ui.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n\n <script src=\"static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n <script src=\"static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n\n\n \n \n \n\n \n\n\n \n <script type=\"text/javascript\">\n $(function() {\n \n \n \n oscar.init();\n\n oscar.search.init();\n\n });\n </script>\n\n \n <!-- Version: N/A -->\n \n \n\n</body></html>", "echoData": 2}
{"url": "https://toscrape.com/", "statusCode": 200, "browserHtml": "<!DOCTYPE html><html lang=\"en\"><head>\n <meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n <title>Scraping Sandbox</title>\n <link href=\"./css/bootstrap.min.css\" rel=\"stylesheet\">\n <link href=\"./css/main.css\" rel=\"stylesheet\">\n </head>\n <body>\n <div class=\"container\">\n <div class=\"row\">\n <div class=\"col-md-1\"></div>\n <div class=\"col-md-10 well\">\n <img class=\"logo\" src=\"img/zyte.png\" width=\"200px\">\n <h1 class=\"text-right\">Web Scraping Sandbox</h1>\n </div>\n </div>\n\n <div class=\"row\">\n <div class=\"col-md-1\"></div>\n <div class=\"col-md-10\">\n <h2>Books</h2>\n <p>A <a href=\"http://books.toscrape.com\">fictional bookstore</a> that desperately wants to be scraped. It's a safe place for beginners learning web scraping and for developers validating their scraping technologies as well. Available at: <a href=\"http://books.toscrape.com\">books.toscrape.com</a></p>\n <div class=\"col-md-6\">\n <a href=\"http://books.toscrape.com\"><img src=\"./img/books.png\" class=\"img-thumbnail\"></a>\n </div>\n <div class=\"col-md-6\">\n <table class=\"table table-hover\">\n <tbody><tr><th colspan=\"2\">Details</th></tr>\n <tr><td>Amount of items </td><td>1000</td></tr>\n <tr><td>Pagination </td><td>✔</td></tr>\n <tr><td>Items per page </td><td>max 20</td></tr>\n <tr><td>Requires JavaScript </td><td>✘</td></tr>\n </tbody></table>\n </div>\n </div>\n </div>\n\n <div class=\"row\">\n <div class=\"col-md-1\"></div>\n <div class=\"col-md-10\">\n <h2>Quotes</h2>\n <p><a href=\"http://quotes.toscrape.com/\">A website</a> that lists quotes from famous people. 
It has many endpoints showing the quotes in many different ways, each of them including new scraping challenges for you, as described below.</p>\n <div class=\"col-md-6\">\n <a href=\"http://quotes.toscrape.com\"><img src=\"./img/quotes.png\" class=\"img-thumbnail\"></a>\n </div>\n <div class=\"col-md-6\">\n <table class=\"table table-hover\">\n <tbody><tr><th colspan=\"2\">Endpoints</th></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/\">Default</a></td><td>Microdata and pagination</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/scroll\">Scroll</a> </td><td>infinite scrolling pagination</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/js\">JavaScript</a> </td><td>JavaScript generated content</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/js-delayed\">Delayed</a> </td><td>Same as JavaScript but with a delay (?delay=10000)</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/tableful\">Tableful</a> </td><td>a table based messed-up layout</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/login\">Login</a> </td><td>login with CSRF token (any user/passwd works)</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/search.aspx\">ViewState</a> </td><td>an AJAX based filter form with ViewStates</td></tr>\n <tr><td><a href=\"http://quotes.toscrape.com/random\">Random</a> </td><td>a single random quote</td></tr>\n </tbody></table>\n </div>\n </div>\n </div>\n </div>\n \n\n</body></html>", "echoData": 1}
Sending a POST request
Tip
For a more complete example featuring a request body and headers, see the HTML form example.
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
    {"url", "https://httpbin.org/anything"},
    {"httpResponseBody", true},
    {"httpRequestMethod", "POST"}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var responseData = JsonDocument.Parse(httpResponseBody);
var method = responseData.RootElement.GetProperty("method").ToString();
{"url": "https://httpbin.org/anything", "httpResponseBody": true, "httpRequestMethod": "POST"}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq .method
{
    "url": "https://httpbin.org/anything",
    "httpResponseBody": true,
    "httpRequestMethod": "POST"
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq .method
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
  private static final String API_KEY = "YOUR_API_KEY";

  public static void main(final String[] args)
      throws InterruptedException, IOException, ParseException {
    Map<String, Object> parameters =
        ImmutableMap.of(
            "url",
            "https://httpbin.org/anything",
            "httpResponseBody",
            true,
            "httpRequestMethod",
            "POST");
    String requestBody = new Gson().toJson(parameters);

    HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
    request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
    request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
    request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
    request.setEntity(new StringEntity(requestBody));

    CloseableHttpClient client = HttpClients.createDefault();
    client.execute(
        request,
        response -> {
          HttpEntity entity = response.getEntity();
          String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
          JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
          String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
          byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
          String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
          JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
          String method = data.get("method").getAsString();
          System.out.println(method);
          return null;
        });
  }

  private static String buildAuthHeader() {
    String auth = API_KEY + ":";
    String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
    return "Basic " + encodedAuth;
  }
}
const axios = require('axios')

axios.post(
  'https://api.zyte.com/v1/extract',
  {
    url: 'https://httpbin.org/anything',
    httpResponseBody: true,
    httpRequestMethod: 'POST'
  },
  {
    auth: { username: 'YOUR_API_KEY' }
  }
).then((response) => {
  const httpResponseBody = Buffer.from(
    response.data.httpResponseBody,
    'base64'
  )
  const method = JSON.parse(httpResponseBody).method
})
<?php

$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
    'auth' => ['YOUR_API_KEY', ''],
    'headers' => ['Accept-Encoding' => 'gzip'],
    'json' => [
        'url' => 'https://httpbin.org/anything',
        'httpResponseBody' => true,
        'httpRequestMethod' => 'POST',
    ],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$method = json_decode($http_response_body)->method;
In proxy mode, the HTTP method of your request is used automatically.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-X POST \
https://httpbin.org/anything \
| jq .method
import json
from base64 import b64decode

import requests

api_response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=("YOUR_API_KEY", ""),
    json={
        "url": "https://httpbin.org/anything",
        "httpResponseBody": True,
        "httpRequestMethod": "POST",
    },
)
http_response_body = b64decode(api_response.json()["httpResponseBody"])
method = json.loads(http_response_body)["method"]
import asyncio
import json
from base64 import b64decode

from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://httpbin.org/anything",
            "httpResponseBody": True,
            "httpRequestMethod": "POST",
        }
    )
    http_response_body: bytes = b64decode(api_response["httpResponseBody"])
    method = json.loads(http_response_body)["method"]
    print(method)


asyncio.run(main())
import json

from scrapy import Request, Spider


class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"

    def start_requests(self):
        yield Request(
            "https://httpbin.org/anything",
            method="POST",
        )

    def parse(self, response):
        method = json.loads(response.text)["method"]
Output:
"POST"
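Every HTTP API example above ends with the same two steps: base64-decode the httpResponseBody field of the Zyte API response, then parse the decoded bytes as JSON to read the method that httpbin.org echoed back. The following is a minimal offline sketch of those two steps only. It sends no request: the api_response dictionary is a hypothetical stand-in built locally, and extract_method is an illustrative helper name, not part of any Zyte library.

```python
import json
from base64 import b64decode, b64encode

# Hypothetical stand-in for a Zyte API response (assumption: built locally
# for illustration). A real response carries the target website's body,
# base64-encoded, in the "httpResponseBody" field.
target_body = json.dumps({"method": "POST"}).encode("utf-8")
api_response = {
    "url": "https://httpbin.org/anything",
    "httpResponseBody": b64encode(target_body).decode("ascii"),
}


def extract_method(api_response: dict) -> str:
    """Decode the base64 httpResponseBody and return its "method" field."""
    http_response_body = b64decode(api_response["httpResponseBody"])
    return json.loads(http_response_body)["method"]


print(extract_method(api_response))  # prints: POST
```

With a real response, the dictionary returned by api_response.json() in the requests example could be passed to the same helper.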
Using network capture to intercept background requests sent during browser rendering
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
    {"url", "https://quotes.toscrape.com/scroll"},
    {"browserHtml", true},
    {
        "networkCapture",
        new List<Dictionary<string, object>>()
        {
            new Dictionary<string, object>()
            {
                {"filterType", "url"},
                {"httpResponseBody", true},
                {"value", "/api/"},
                {"matchType", "contains"}
            }
        }
    }
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var apiBody = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(apiBody);
var captureEnumerator = data.RootElement.GetProperty("networkCapture").EnumerateArray();
captureEnumerator.MoveNext();
var capture = captureEnumerator.Current;
var base64Body = capture.GetProperty("httpResponseBody").ToString();
var body = System.Convert.FromBase64String(base64Body);
var captureData = JsonDocument.Parse(body);
var quoteEnumerator = captureData.RootElement.GetProperty("quotes").EnumerateArray();
quoteEnumerator.MoveNext();
var quote = quoteEnumerator.Current;
var authorEnumerator = quote.GetProperty("author").EnumerateObject();
while (authorEnumerator.MoveNext())
{
    if (authorEnumerator.Current.Name.ToString() == "name")
    {
        Console.WriteLine(authorEnumerator.Current.Value.ToString());
        break;
    }
}
{"url": "https://quotes.toscrape.com/scroll", "browserHtml": true, "networkCapture": [{"filterType": "url", "httpResponseBody": true, "value": "/api/", "matchType": "contains"}]}
zyte-api input.jsonl \
| jq --raw-output ".networkCapture[0].httpResponseBody" \
| base64 --decode \
| jq --raw-output ".quotes[0].author.name"
{
    "url": "https://quotes.toscrape.com/scroll",
    "browserHtml": true,
    "networkCapture": [
        {
            "filterType": "url",
            "httpResponseBody": true,
            "value": "/api/",
            "matchType": "contains"
        }
    ]
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output ".networkCapture[0].httpResponseBody" \
| base64 --decode \
| jq --raw-output ".quotes[0].author.name"
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
  private static final String API_KEY = "YOUR_API_KEY";
  public static void main(final String[] args)
      throws InterruptedException, IOException, ParseException {
    Map<String, Object> filter =
        ImmutableMap.of(
            "filterType",
            "url",
            "httpResponseBody",
            true,
            "value",
            "/api/",
            "matchType",
            "contains");
    Map<String, Object> parameters =
        ImmutableMap.of(
            "url",
            "https://quotes.toscrape.com/scroll",
            "browserHtml",
            true,
            "networkCapture",
            Collections.singletonList(filter));
    String requestBody = new Gson().toJson(parameters);
    HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
    request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
    request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
    request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
    request.setEntity(new StringEntity(requestBody));
    CloseableHttpClient client = HttpClients.createDefault();
    client.execute(
        request,
        response -> {
          HttpEntity entity = response.getEntity();
          String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
          JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
          JsonArray captures = jsonObject.get("networkCapture").getAsJsonArray();
          JsonObject capture = captures.get(0).getAsJsonObject();
          byte[] bodyBytes =
              Base64.getDecoder().decode(capture.get("httpResponseBody").getAsString());
          String body = new String(bodyBytes, StandardCharsets.UTF_8);
          JsonObject data = JsonParser.parseString(body).getAsJsonObject();
          JsonObject quote = data.get("quotes").getAsJsonArray().get(0).getAsJsonObject();
          String authorName = quote.get("author").getAsJsonObject().get("name").getAsString();
          System.out.println(authorName);
          return null;
        });
  }
  private static String buildAuthHeader() {
    String auth = API_KEY + ":";
    String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
    return "Basic " + encodedAuth;
  }
}
const axios = require('axios')
axios.post(
  'https://api.zyte.com/v1/extract',
  {
    url: 'https://quotes.toscrape.com/scroll',
    browserHtml: true,
    networkCapture: [
      {
        filterType: 'url',
        httpResponseBody: true,
        value: '/api/',
        matchType: 'contains'
      }
    ]
  },
  {
    auth: { username: 'YOUR_API_KEY' }
  }
).then((response) => {
  const capture = response.data.networkCapture[0]
  const data = JSON.parse(Buffer.from(capture.httpResponseBody, 'base64'))
  console.log(data.quotes[0].author.name)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
    'auth' => ['YOUR_API_KEY', ''],
    'headers' => ['Accept-Encoding' => 'gzip'],
    'json' => [
        'url' => 'https://quotes.toscrape.com/scroll',
        'browserHtml' => true,
        'networkCapture' => [
            [
                'filterType' => 'url',
                'httpResponseBody' => true,
                'value' => '/api/',
                'matchType' => 'contains',
            ],
        ],
    ],
]);
$api_response = json_decode($response->getBody());
$capture = $api_response->networkCapture[0];
$data = json_decode(base64_decode($capture->httpResponseBody));
echo $data->quotes[0]->author->name.PHP_EOL;
import json
from base64 import b64decode
import requests
api_response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=("YOUR_API_KEY", ""),
    json={
        "url": "https://quotes.toscrape.com/scroll",
        "browserHtml": True,
        "networkCapture": [
            {
                "filterType": "url",
                "httpResponseBody": True,
                "value": "/api/",
                "matchType": "contains",
            },
        ],
    },
)
capture = api_response.json()["networkCapture"][0]
data = json.loads(b64decode(capture["httpResponseBody"]).decode())
print(data["quotes"][0]["author"]["name"])
import asyncio
import json
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://quotes.toscrape.com/scroll",
            "browserHtml": True,
            "networkCapture": [
                {
                    "filterType": "url",
                    "httpResponseBody": True,
                    "value": "/api/",
                    "matchType": "contains",
                },
            ],
        },
    )
    capture = api_response["networkCapture"][0]
    data = json.loads(b64decode(capture["httpResponseBody"]).decode())
    print(data["quotes"][0]["author"]["name"])
asyncio.run(main())
import json
from base64 import b64decode
from scrapy import Request, Spider
class QuotesToScrapeComSpider(Spider):
    name = "quotes_toscrape_com"
    def start_requests(self):
        yield Request(
            "https://quotes.toscrape.com/scroll",
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                    "networkCapture": [
                        {
                            "filterType": "url",
                            "httpResponseBody": True,
                            "value": "/api/",
                            "matchType": "contains",
                        },
                    ],
                },
            },
        )
    def parse(self, response):
        capture = response.raw_api_response["networkCapture"][0]
        data = json.loads(b64decode(capture["httpResponseBody"]).decode())
        print(data["quotes"][0]["author"]["name"])
Output:
Albert Einstein
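The examples above index networkCapture[0] directly, which assumes that at least one request matched the filter. The following is a minimal Python sketch of the same decoding step with that assumption made explicit. The first_author_name helper and the sample response are hypothetical: the sample is built by hand to mimic the response structure used above and is not real Zyte API output.

```python
import json
from base64 import b64decode, b64encode


def first_author_name(api_response):
    """Return the author name from the first captured response, or None.

    Hypothetical helper: it only assumes the response structure used in
    the examples above (networkCapture -> httpResponseBody -> quotes).
    """
    captures = api_response.get("networkCapture") or []
    if not captures:
        # No request matched the filter, so there is nothing to decode.
        return None
    # httpResponseBody is base64-encoded; decode it to text, then parse JSON.
    body = b64decode(captures[0]["httpResponseBody"]).decode()
    quotes = json.loads(body).get("quotes") or []
    if not quotes:
        return None
    return quotes[0]["author"]["name"]


# Hand-built stand-in for an API response, for illustration only.
sample_body = json.dumps({"quotes": [{"author": {"name": "Albert Einstein"}}]})
sample_response = {
    "networkCapture": [
        {"httpResponseBody": b64encode(sample_body.encode()).decode()}
    ]
}
print(first_author_name(sample_response))
```

Returning None instead of raising keeps the sketch short; in real code you may prefer to raise an error or log the missing capture.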
Sending multiple requests in parallel
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
var urls = new string[2];
urls[0] = "https://books.toscrape.com/catalogue/page-1.html";
urls[1] = "https://books.toscrape.com/catalogue/page-2.html";
var output = new List<HttpResponseMessage>();
var handler = new HttpClientHandler()
{
    AutomaticDecompression = DecompressionMethods.All,
    MaxConnectionsPerServer = 15
};
var client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var responseTasks = new List<Task<HttpResponseMessage>>();
foreach (var url in urls)
{
    var input = new Dictionary<string, object>(){
        {"url", url},
        {"browserHtml", true}
    };
    var inputJson = JsonSerializer.Serialize(input);
    var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
    var responseTask = client.PostAsync("https://api.zyte.com/v1/extract", content);
    responseTasks.Add(responseTask);
}
while (responseTasks.Any())
{
    var responseTask = await Task.WhenAny(responseTasks);
    responseTasks.Remove(responseTask);
    var response = await responseTask;
    output.Add(response);
}
{"url": "https://books.toscrape.com/catalogue/page-1.html", "browserHtml": true}
{"url": "https://books.toscrape.com/catalogue/page-2.html", "browserHtml": true}
zyte-api --n-conn 15 input.jsonl -o output.jsonl
{"url": "https://books.toscrape.com/catalogue/page-1.html", "browserHtml": true}
{"url": "https://books.toscrape.com/catalogue/page-2.html", "browserHtml": true}
cat input.jsonl \
| xargs -P 15 -d\\n -n 1 \
bash -c "
    curl \
        --user YOUR_API_KEY: \
        --header 'Content-Type: application/json' \
        --data \"\$0\" \
        --compressed \
        https://api.zyte.com/v1/extract \
    | awk '{print \$0}' \
    >> output.jsonl
"
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.hc.client5.http.async.methods.SimpleHttpRequest;
import org.apache.hc.client5.http.async.methods.SimpleHttpResponse;
import org.apache.hc.client5.http.impl.async.CloseableHttpAsyncClient;
import org.apache.hc.client5.http.impl.async.HttpAsyncClients;
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManager;
import org.apache.hc.client5.http.impl.nio.PoolingAsyncClientConnectionManagerBuilder;
import org.apache.hc.client5.http.ssl.ClientTlsStrategyBuilder;
import org.apache.hc.core5.concurrent.FutureCallback;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.nio.ssl.TlsStrategy;
class Example {
  private static final String API_KEY = "YOUR_API_KEY";
  public static void main(final String[] args)
      throws ExecutionException, InterruptedException, IOException, ParseException {
    String[] urls = {
      "https://books.toscrape.com/catalogue/page-1.html",
      "https://books.toscrape.com/catalogue/page-2.html"
    };
    List<Future> futures = new ArrayList<Future>();
    List<String> output = new ArrayList<String>();
    int concurrency = 15;
    // https://issues.apache.org/jira/browse/HTTPCLIENT-2219
    final TlsStrategy tlsStrategy = ClientTlsStrategyBuilder.create().useSystemProperties().build();
    PoolingAsyncClientConnectionManager connectionManager =
        PoolingAsyncClientConnectionManagerBuilder.create().setTlsStrategy(tlsStrategy).build();
    connectionManager.setMaxTotal(concurrency);
    connectionManager.setDefaultMaxPerRoute(concurrency);
    CloseableHttpAsyncClient client =
        HttpAsyncClients.custom().setConnectionManager(connectionManager).build();
    try {
      client.start();
      for (int i = 0; i < urls.length; i++) {
        Map<String, Object> parameters = ImmutableMap.of("url", urls[i], "browserHtml", true);
        String requestBody = new Gson().toJson(parameters);
        SimpleHttpRequest request =
            new SimpleHttpRequest("POST", "https://api.zyte.com/v1/extract");
        request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
        request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
        request.setBody(requestBody, ContentType.APPLICATION_JSON);
        final Future<SimpleHttpResponse> future =
            client.execute(
                request,
                new FutureCallback<SimpleHttpResponse>() {
                  public void completed(final SimpleHttpResponse response) {
                    String apiResponse = response.getBodyText();
                    JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
                    String browserHtml = jsonObject.get("browserHtml").getAsString();
                    output.add(browserHtml);
                  }
                  public void failed(final Exception ex) {}
                  public void cancelled() {}
                });
        futures.add(future);
      }
      for (int i = 0; i < futures.size(); i++) {
        futures.get(i).get();
      }
    } finally {
      client.close();
    }
  }
  private static String buildAuthHeader() {
    String auth = API_KEY + ":";
    String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
    return "Basic " + encodedAuth;
  }
}
const { ConcurrencyManager } = require('axios-concurrency')
const axios = require('axios')
const urls = [
  'https://books.toscrape.com/catalogue/page-1.html',
  'https://books.toscrape.com/catalogue/page-2.html'
]
const output = []
const client = axios.create()
ConcurrencyManager(client, 15)
Promise.all(
  urls.map((url) =>
    client.post(
      'https://api.zyte.com/v1/extract',
      { url, browserHtml: true },
      {
        auth: { username: 'YOUR_API_KEY' }
      }
    ).then((response) => output.push(response.data))
  )
)
<?php
$urls = [
    'https://books.toscrape.com/catalogue/page-1.html',
    'https://books.toscrape.com/catalogue/page-2.html',
];
$output = [];
$promises = [];
$client = new GuzzleHttp\Client();
foreach ($urls as $url) {
    $options = [
        'auth' => ['YOUR_API_KEY', ''],
        'headers' => ['Accept-Encoding' => 'gzip'],
        'json' => [
            'url' => $url,
            'browserHtml' => true,
        ],
    ];
    $request = new \GuzzleHttp\Psr7\Request('POST', 'https://api.zyte.com/v1/extract');
    global $promises;
    $promises[] = $client->sendAsync($request, $options)->then(function ($response) {
        global $output;
        $output[] = json_decode($response->getBody());
    });
}
foreach ($promises as $promise) {
    $promise->wait();
}
import asyncio
import aiohttp
urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
]
output = []
async def extract(client, url):
    response = await client.post(
        "https://api.zyte.com/v1/extract",
        json={"url": url, "browserHtml": True},
        auth=aiohttp.BasicAuth("YOUR_API_KEY"),
    )
    output.append(await response.json())
async def main():
    connector = aiohttp.TCPConnector(limit_per_host=15)
    async with aiohttp.ClientSession(connector=connector) as client:
        await asyncio.gather(*[extract(client, url) for url in urls])
asyncio.run(main())
import asyncio
from zyte_api import AsyncZyteAPI
urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
]
async def main():
    client = AsyncZyteAPI(n_conn=15)
    queries = [{"url": url, "browserHtml": True} for url in urls]
    async with client.session() as session:
        for future in session.iter(queries):
            response = await future
            print(response)
asyncio.run(main())
from scrapy import Request, Spider
urls = [
    "https://books.toscrape.com/catalogue/page-1.html",
    "https://books.toscrape.com/catalogue/page-2.html",
]
class ToScrapeSpider(Spider):
    name = "toscrape_com"
    custom_settings = {
        "CONCURRENT_REQUESTS": 15,
        "CONCURRENT_REQUESTS_PER_DOMAIN": 15,
    }
    def start_requests(self):
        for url in urls:
            yield Request(
                url,
                meta={
                    "zyte_api_automap": {
                        "browserHtml": True,
                    },
                },
            )
    def parse(self, response):
        yield {
            "url": response.url,
            "browserHtml": response.text,
        }
Output:
{"url": "https://books.toscrape.com/catalogue/page-1.html", "statusCode": 200, "browserHtml": "<!DOCTYPE html><!--[if lt IE 7]> <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8 lt-ie7\"> <![endif]--><!--[if IE 7]> <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8\"> <![endif]--><!--[if IE 8]> <html lang=\"en-us\" class=\"no-js lt-ie9\"> <![endif]--><!--[if gt IE 8]><!--><html lang=\"en-us\" class=\"no-js\"><!--<![endif]--><head>\n <title>\n All products | Books to Scrape - Sandbox\n</title>\n\n <meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\">\n <meta name=\"created\" content=\"24th Jun 2016 09:30\">\n <meta name=\"description\" content=\"\">\n <meta name=\"viewport\" content=\"width=device-width\">\n <meta name=\"robots\" content=\"NOARCHIVE,NOCACHE\">\n\n <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->\n <!--[if lt IE 9]>\n <script src=\"//html5shim.googlecode.com/svn/trunk/html5.js\"></script>\n <![endif]-->\n\n \n <link rel=\"shortcut icon\" href=\"../static/oscar/favicon.ico\">\n \n\n \n \n \n \n <link rel=\"stylesheet\" type=\"text/css\" href=\"../static/oscar/css/styles.css\">\n \n <link rel=\"stylesheet\" href=\"../static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css\">\n <link rel=\"stylesheet\" type=\"text/css\" href=\"../static/oscar/css/datetimepicker.css\">\n\n\n \n \n\n \n\n \n \n \n\n \n </head>\n\n <body id=\"default\" class=\"default\">\n \n \n \n \n <header class=\"header container-fluid\">\n <div class=\"page_inner\">\n <div class=\"row\">\n <div class=\"col-sm-8 h1\"><a href=\"../index.html\">Books to Scrape</a><small> We love being scraped!</small>\n</div>\n\n \n </div>\n </div>\n </header>\n\n \n \n<div class=\"container-fluid page\">\n <div class=\"page_inner\">\n \n <ul class=\"breadcrumb\">\n <li>\n <a href=\"../index.html\">Home</a>\n </li>\n <li class=\"active\">All products</li>\n </ul>\n\n <div class=\"row\">\n\n <aside class=\"sidebar col-sm-4 col-md-3\">\n \n <div 
id=\"promotions_left\">\n \n </div>\n \n \n \n \n <div class=\"side_categories\">\n <ul class=\"nav nav-list\">\n \n <li>\n <a href=\"category/books_1/index.html\">\n \n Books\n \n </a>\n\n <ul>\n \n \n <li>\n <a href=\"category/books/travel_2/index.html\">\n \n Travel\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/mystery_3/index.html\">\n \n Mystery\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/historical-fiction_4/index.html\">\n \n Historical Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/sequential-art_5/index.html\">\n \n Sequential Art\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/classics_6/index.html\">\n \n Classics\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/philosophy_7/index.html\">\n \n Philosophy\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/romance_8/index.html\">\n \n Romance\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/womens-fiction_9/index.html\">\n \n Womens Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/fiction_10/index.html\">\n \n Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/childrens_11/index.html\">\n \n Childrens\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/religion_12/index.html\">\n \n Religion\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/nonfiction_13/index.html\">\n \n Nonfiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/music_14/index.html\">\n \n Music\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/default_15/index.html\">\n \n Default\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/science-fiction_16/index.html\">\n \n Science Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/sports-and-games_17/index.html\">\n \n Sports and Games\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/add-a-comment_18/index.html\">\n \n Add a comment\n \n </a>\n\n </li>\n \n \n <li>\n <a 
href=\"category/books/fantasy_19/index.html\">\n \n Fantasy\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/new-adult_20/index.html\">\n \n New Adult\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/young-adult_21/index.html\">\n \n Young Adult\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/science_22/index.html\">\n \n Science\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/poetry_23/index.html\">\n \n Poetry\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/paranormal_24/index.html\">\n \n Paranormal\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/art_25/index.html\">\n \n Art\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/psychology_26/index.html\">\n \n Psychology\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/autobiography_27/index.html\">\n \n Autobiography\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/parenting_28/index.html\">\n \n Parenting\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/adult-fiction_29/index.html\">\n \n Adult Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/humor_30/index.html\">\n \n Humor\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/horror_31/index.html\">\n \n Horror\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/history_32/index.html\">\n \n History\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/food-and-drink_33/index.html\">\n \n Food and Drink\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/christian-fiction_34/index.html\">\n \n Christian Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/business_35/index.html\">\n \n Business\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/biography_36/index.html\">\n \n Biography\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/thriller_37/index.html\">\n \n Thriller\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/contemporary_38/index.html\">\n \n 
Contemporary\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/spirituality_39/index.html\">\n \n Spirituality\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/academic_40/index.html\">\n \n Academic\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/self-help_41/index.html\">\n \n Self Help\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/historical_42/index.html\">\n \n Historical\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/christian_43/index.html\">\n \n Christian\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/suspense_44/index.html\">\n \n Suspense\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/short-stories_45/index.html\">\n \n Short Stories\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/novels_46/index.html\">\n \n Novels\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/health_47/index.html\">\n \n Health\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/politics_48/index.html\">\n \n Politics\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/cultural_49/index.html\">\n \n Cultural\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/erotica_50/index.html\">\n \n Erotica\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/crime_51/index.html\">\n \n Crime\n \n </a>\n\n </li>\n \n </ul></li>\n \n \n </ul>\n </div>\n \n \n\n </aside>\n\n <div class=\"col-sm-8 col-md-9\">\n \n <div class=\"page-header action\">\n <h1>All products</h1>\n </div>\n \n\n \n\n\n\n<div id=\"messages\">\n\n</div>\n\n\n <div id=\"promotions\">\n \n </div>\n\n \n <form method=\"get\" class=\"form-horizontal\">\n \n <div style=\"display:none\">\n \n \n </div>\n\n \n \n \n <strong>1000</strong> results - showing <strong>1</strong> to <strong>20</strong>.\n \n \n \n \n </form>\n \n <section>\n <div class=\"alert alert-warning\" role=\"alert\"><strong>Warning!</strong> This is a demo website for web scraping purposes. 
Prices and ratings here were randomly assigned and have no real meaning.</div>\n\n <div>\n <ol class=\"row\">\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"a-light-in-the-attic_1000/index.html\"><img src=\"../media/cache/2c/da/2cdad67c44b002e7ead0cc35693c0e8b.jpg\" alt=\"A Light in the Attic\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"a-light-in-the-attic_1000/index.html\" title=\"A Light in the Attic\">A Light in the ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£51.77</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"tipping-the-velvet_999/index.html\"><img src=\"../media/cache/26/0c/260c6ae16bce31c8f8c95daddd9f4a1c.jpg\" alt=\"Tipping the Velvet\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"tipping-the-velvet_999/index.html\" title=\"Tipping the Velvet\">Tipping the Velvet</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£53.74</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n 
\n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"soumission_998/index.html\"><img src=\"../media/cache/3e/ef/3eef99c9d9adef34639f510662022830.jpg\" alt=\"Soumission\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"soumission_998/index.html\" title=\"Soumission\">Soumission</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£50.10</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"sharp-objects_997/index.html\"><img src=\"../media/cache/32/51/3251cf3a3412f53f339e42cac2134093.jpg\" alt=\"Sharp Objects\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"sharp-objects_997/index.html\" title=\"Sharp Objects\">Sharp Objects</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£47.82</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n 
\n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"sapiens-a-brief-history-of-humankind_996/index.html\"><img src=\"../media/cache/be/a5/bea5697f2534a2f86a3ef27b5a8c12a6.jpg\" alt=\"Sapiens: A Brief History of Humankind\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"sapiens-a-brief-history-of-humankind_996/index.html\" title=\"Sapiens: A Brief History of Humankind\">Sapiens: A Brief History ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£54.23</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-requiem-red_995/index.html\"><img src=\"../media/cache/68/33/68339b4c9bc034267e1da611ab3b34f8.jpg\" alt=\"The Requiem Red\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-requiem-red_995/index.html\" title=\"The Requiem Red\">The Requiem Red</a></h3>\n \n\n \n <div 
class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£22.65</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\"><img src=\"../media/cache/92/27/92274a95b7c251fea59a2b8a78275ab4.jpg\" alt=\"The Dirty Little Secrets of Getting Your Dream Job\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-dirty-little-secrets-of-getting-your-dream-job_994/index.html\" title=\"The Dirty Little Secrets of Getting Your Dream Job\">The Dirty Little Secrets ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£33.34</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\"><img src=\"../media/cache/3d/54/3d54940e57e662c4dd1f3ff00c78cc64.jpg\" alt=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\" 
class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-coming-woman-a-novel-based-on-the-life-of-the-infamous-feminist-victoria-woodhull_993/index.html\" title=\"The Coming Woman: A Novel Based on the Life of the Infamous Feminist, Victoria Woodhull\">The Coming Woman: A ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£17.93</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\"><img src=\"../media/cache/66/88/66883b91f6804b2323c8369331cb7dd1.jpg\" alt=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-boys-in-the-boat-nine-americans-and-their-epic-quest-for-gold-at-the-1936-berlin-olympics_992/index.html\" title=\"The Boys in the Boat: Nine Americans and Their Epic Quest for Gold at the 1936 Berlin Olympics\">The Boys in the ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£22.60</p>\n \n\n<p class=\"instock availability\">\n <i 
class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-black-maria_991/index.html\"><img src=\"../media/cache/58/46/5846057e28022268153beff6d352b06c.jpg\" alt=\"The Black Maria\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-black-maria_991/index.html\" title=\"The Black Maria\">The Black Maria</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£52.15</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"starving-hearts-triangular-trade-trilogy-1_990/index.html\"><img src=\"../media/cache/be/f4/bef44da28c98f905a3ebec0b87be8530.jpg\" alt=\"Starving Hearts (Triangular Trade Trilogy, #1)\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"starving-hearts-triangular-trade-trilogy-1_990/index.html\" title=\"Starving Hearts (Triangular Trade Trilogy, 
#1)\">Starving Hearts (Triangular Trade ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£13.99</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"shakespeares-sonnets_989/index.html\"><img src=\"../media/cache/10/48/1048f63d3b5061cd2f424d20b3f9b666.jpg\" alt=\"Shakespeare's Sonnets\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"shakespeares-sonnets_989/index.html\" title=\"Shakespeare's Sonnets\">Shakespeare's Sonnets</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£20.66</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"set-me-free_988/index.html\"><img src=\"../media/cache/5b/88/5b88c52633f53cacf162c15f4f823153.jpg\" alt=\"Set Me Free\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i 
class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"set-me-free_988/index.html\" title=\"Set Me Free\">Set Me Free</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£17.46</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\"><img src=\"../media/cache/94/b1/94b1b8b244bce9677c2f29ccc890d4d2.jpg\" alt=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"scott-pilgrims-precious-little-life-scott-pilgrim-1_987/index.html\" title=\"Scott Pilgrim's Precious Little Life (Scott Pilgrim #1)\">Scott Pilgrim's Precious Little ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£52.29</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"rip-it-up-and-start-again_986/index.html\"><img src=\"../media/cache/81/c4/81c4a973364e17d01f217e1188253d5e.jpg\" alt=\"Rip it 
Up and Start Again\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"rip-it-up-and-start-again_986/index.html\" title=\"Rip it Up and Start Again\">Rip it Up and ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£35.02</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\"><img src=\"../media/cache/54/60/54607fe8945897cdcced0044103b10b6.jpg\" alt=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"our-band-could-be-your-life-scenes-from-the-american-indie-underground-1981-1991_985/index.html\" title=\"Our Band Could Be Your Life: Scenes from the American Indie Underground, 1981-1991\">Our Band Could Be ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£57.25</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" 
data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"olio_984/index.html\"><img src=\"../media/cache/55/33/553310a7162dfbc2c6d19a84da0df9e1.jpg\" alt=\"Olio\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"olio_984/index.html\" title=\"Olio\">Olio</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£23.88</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\"><img src=\"../media/cache/09/a3/09a3aef48557576e1a85ba7efea8ecb7.jpg\" alt=\"Mesaerion: The Best Science Fiction Stories 1800-1849\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"mesaerion-the-best-science-fiction-stories-1800-1849_983/index.html\" title=\"Mesaerion: The Best Science Fiction Stories 1800-1849\">Mesaerion: The Best Science ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£37.59</p>\n \n\n<p class=\"instock 
availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"libertarianism-for-beginners_982/index.html\"><img src=\"../media/cache/0b/bc/0bbcd0a6f4bcd81ccb1049a52736406e.jpg\" alt=\"Libertarianism for Beginners\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"libertarianism-for-beginners_982/index.html\" title=\"Libertarianism for Beginners\">Libertarianism for Beginners</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£51.33</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"its-only-the-himalayas_981/index.html\"><img src=\"../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg\" alt=\"It's Only the Himalayas\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"its-only-the-himalayas_981/index.html\" title=\"It's Only the 
Himalayas\">It's Only the Himalayas</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£45.17</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n </ol>\n \n\n\n\n <div>\n <ul class=\"pager\">\n \n <li class=\"current\">\n \n Page 1 of 50\n \n </li>\n \n <li class=\"next\"><a href=\"page-2.html\">next</a></li>\n \n </ul>\n </div>\n\n\n </div>\n </section>\n \n\n\n </div>\n\n </div><!-- /row -->\n </div><!-- /page_inner -->\n</div><!-- /container-fluid -->\n\n\n \n<footer class=\"footer container-fluid\">\n \n \n \n</footer>\n\n\n \n \n \n <!-- jQuery -->\n <script src=\"http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js\"></script>\n <script>window.jQuery || document.write('<script src=\"../static/oscar/js/jquery/jquery-1.9.1.min.js\"><\\/script>')</script><script src=\"../static/oscar/js/jquery/jquery-1.9.1.min.js\"></script>\n \n \n\n\n \n \n \n \n <script type=\"text/javascript\" src=\"../static/oscar/js/bootstrap3/bootstrap.min.js\"></script>\n <!-- Oscar -->\n <script src=\"../static/oscar/js/oscar/ui.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n\n <script src=\"../static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n <script src=\"../static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n\n\n \n \n \n\n \n\n\n \n <script type=\"text/javascript\">\n $(function() {\n \n \n \n oscar.init();\n\n oscar.search.init();\n\n });\n </script>\n\n \n <!-- Version: N/A -->\n \n \n\n</body></html>", "echoData": "https://books.toscrape.com/catalogue/page-1.html"}
{"url": "https://books.toscrape.com/catalogue/page-2.html", "statusCode": 200, "browserHtml": "<!DOCTYPE html><!--[if lt IE 7]> <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8 lt-ie7\"> <![endif]--><!--[if IE 7]> <html lang=\"en-us\" class=\"no-js lt-ie9 lt-ie8\"> <![endif]--><!--[if IE 8]> <html lang=\"en-us\" class=\"no-js lt-ie9\"> <![endif]--><!--[if gt IE 8]><!--><html lang=\"en-us\" class=\"no-js\"><!--<![endif]--><head>\n <title>\n All products | Books to Scrape - Sandbox\n</title>\n\n <meta http-equiv=\"content-type\" content=\"text/html; charset=UTF-8\">\n <meta name=\"created\" content=\"24th Jun 2016 09:29\">\n <meta name=\"description\" content=\"\">\n <meta name=\"viewport\" content=\"width=device-width\">\n <meta name=\"robots\" content=\"NOARCHIVE,NOCACHE\">\n\n <!-- Le HTML5 shim, for IE6-8 support of HTML elements -->\n <!--[if lt IE 9]>\n <script src=\"//html5shim.googlecode.com/svn/trunk/html5.js\"></script>\n <![endif]-->\n\n \n <link rel=\"shortcut icon\" href=\"../static/oscar/favicon.ico\">\n \n\n \n \n \n \n <link rel=\"stylesheet\" type=\"text/css\" href=\"../static/oscar/css/styles.css\">\n \n <link rel=\"stylesheet\" href=\"../static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.css\">\n <link rel=\"stylesheet\" type=\"text/css\" href=\"../static/oscar/css/datetimepicker.css\">\n\n\n \n \n\n \n\n \n \n \n\n \n </head>\n\n <body id=\"default\" class=\"default\">\n \n \n \n \n <header class=\"header container-fluid\">\n <div class=\"page_inner\">\n <div class=\"row\">\n <div class=\"col-sm-8 h1\"><a href=\"../index.html\">Books to Scrape</a><small> We love being scraped!</small>\n</div>\n\n \n </div>\n </div>\n </header>\n\n \n \n<div class=\"container-fluid page\">\n <div class=\"page_inner\">\n \n <ul class=\"breadcrumb\">\n <li>\n <a href=\"../index.html\">Home</a>\n </li>\n <li class=\"active\">All products</li>\n </ul>\n\n <div class=\"row\">\n\n <aside class=\"sidebar col-sm-4 col-md-3\">\n \n <div 
id=\"promotions_left\">\n \n </div>\n \n \n \n \n <div class=\"side_categories\">\n <ul class=\"nav nav-list\">\n \n <li>\n <a href=\"category/books_1/index.html\">\n \n Books\n \n </a>\n\n <ul>\n \n \n <li>\n <a href=\"category/books/travel_2/index.html\">\n \n Travel\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/mystery_3/index.html\">\n \n Mystery\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/historical-fiction_4/index.html\">\n \n Historical Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/sequential-art_5/index.html\">\n \n Sequential Art\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/classics_6/index.html\">\n \n Classics\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/philosophy_7/index.html\">\n \n Philosophy\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/romance_8/index.html\">\n \n Romance\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/womens-fiction_9/index.html\">\n \n Womens Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/fiction_10/index.html\">\n \n Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/childrens_11/index.html\">\n \n Childrens\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/religion_12/index.html\">\n \n Religion\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/nonfiction_13/index.html\">\n \n Nonfiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/music_14/index.html\">\n \n Music\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/default_15/index.html\">\n \n Default\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/science-fiction_16/index.html\">\n \n Science Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/sports-and-games_17/index.html\">\n \n Sports and Games\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/add-a-comment_18/index.html\">\n \n Add a comment\n \n </a>\n\n </li>\n \n \n <li>\n <a 
href=\"category/books/fantasy_19/index.html\">\n \n Fantasy\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/new-adult_20/index.html\">\n \n New Adult\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/young-adult_21/index.html\">\n \n Young Adult\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/science_22/index.html\">\n \n Science\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/poetry_23/index.html\">\n \n Poetry\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/paranormal_24/index.html\">\n \n Paranormal\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/art_25/index.html\">\n \n Art\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/psychology_26/index.html\">\n \n Psychology\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/autobiography_27/index.html\">\n \n Autobiography\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/parenting_28/index.html\">\n \n Parenting\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/adult-fiction_29/index.html\">\n \n Adult Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/humor_30/index.html\">\n \n Humor\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/horror_31/index.html\">\n \n Horror\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/history_32/index.html\">\n \n History\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/food-and-drink_33/index.html\">\n \n Food and Drink\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/christian-fiction_34/index.html\">\n \n Christian Fiction\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/business_35/index.html\">\n \n Business\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/biography_36/index.html\">\n \n Biography\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/thriller_37/index.html\">\n \n Thriller\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/contemporary_38/index.html\">\n \n 
Contemporary\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/spirituality_39/index.html\">\n \n Spirituality\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/academic_40/index.html\">\n \n Academic\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/self-help_41/index.html\">\n \n Self Help\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/historical_42/index.html\">\n \n Historical\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/christian_43/index.html\">\n \n Christian\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/suspense_44/index.html\">\n \n Suspense\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/short-stories_45/index.html\">\n \n Short Stories\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/novels_46/index.html\">\n \n Novels\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/health_47/index.html\">\n \n Health\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/politics_48/index.html\">\n \n Politics\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/cultural_49/index.html\">\n \n Cultural\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/erotica_50/index.html\">\n \n Erotica\n \n </a>\n\n </li>\n \n \n <li>\n <a href=\"category/books/crime_51/index.html\">\n \n Crime\n \n </a>\n\n </li>\n \n </ul></li>\n \n \n </ul>\n </div>\n \n \n\n </aside>\n\n <div class=\"col-sm-8 col-md-9\">\n \n <div class=\"page-header action\">\n <h1>All products</h1>\n </div>\n \n\n \n\n\n\n<div id=\"messages\">\n\n</div>\n\n\n <div id=\"promotions\">\n \n </div>\n\n \n <form method=\"get\" class=\"form-horizontal\">\n \n <div style=\"display:none\">\n \n \n </div>\n\n \n \n \n <strong>1000</strong> results - showing <strong>21</strong> to <strong>40</strong>.\n \n \n \n \n </form>\n \n <section>\n <div class=\"alert alert-warning\" role=\"alert\"><strong>Warning!</strong> This is a demo website for web scraping purposes. 
Prices and ratings here were randomly assigned and have no real meaning.</div>\n\n <div>\n <ol class=\"row\">\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"in-her-wake_980/index.html\"><img src=\"../media/cache/5d/72/5d72709c6a7a9584a4d1cf07648bfce1.jpg\" alt=\"In Her Wake\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"in-her-wake_980/index.html\" title=\"In Her Wake\">In Her Wake</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£12.84</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"how-music-works_979/index.html\"><img src=\"../media/cache/5c/c8/5cc8e107246cb478960d4f0aba1e1c8e.jpg\" alt=\"How Music Works\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"how-music-works_979/index.html\" title=\"How Music Works\">How Music Works</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£37.32</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" 
class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"foolproof-preserving-a-guide-to-small-batch-jams-jellies-pickles-condiments-and-more-a-foolproof-guide-to-making-small-batch-jams-jellies-pickles-condiments-and-more_978/index.html\"><img src=\"../media/cache/9f/59/9f59f01fa916a7bb8f0b28a4012179a4.jpg\" alt=\"Foolproof Preserving: A Guide to Small Batch Jams, Jellies, Pickles, Condiments, and More: A Foolproof Guide to Making Small Batch Jams, Jellies, Pickles, Condiments, and More\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"foolproof-preserving-a-guide-to-small-batch-jams-jellies-pickles-condiments-and-more-a-foolproof-guide-to-making-small-batch-jams-jellies-pickles-condiments-and-more_978/index.html\" title=\"Foolproof Preserving: A Guide to Small Batch Jams, Jellies, Pickles, Condiments, and More: A Foolproof Guide to Making Small Batch Jams, Jellies, Pickles, Condiments, and More\">Foolproof Preserving: A Guide ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£30.52</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a 
href=\"chase-me-paris-nights-2_977/index.html\"><img src=\"../media/cache/9c/2e/9c2e0eb8866b8e3f3b768994fd3d1c1a.jpg\" alt=\"Chase Me (Paris Nights #2)\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"chase-me-paris-nights-2_977/index.html\" title=\"Chase Me (Paris Nights #2)\">Chase Me (Paris Nights ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£25.27</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"black-dust_976/index.html\"><img src=\"../media/cache/44/cc/44ccc99c8f82c33d4f9d2afa4ef25787.jpg\" alt=\"Black Dust\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"black-dust_976/index.html\" title=\"Black Dust\">Black Dust</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£34.53</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n 
<article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"birdsong-a-story-in-pictures_975/index.html\"><img src=\"../media/cache/af/6e/af6e796160fe63e0cf19d44395c7ddf2.jpg\" alt=\"Birdsong: A Story in Pictures\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"birdsong-a-story-in-pictures_975/index.html\" title=\"Birdsong: A Story in Pictures\">Birdsong: A Story in ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£54.64</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"americas-cradle-of-quarterbacks-western-pennsylvanias-football-factory-from-johnny-unitas-to-joe-montana_974/index.html\"><img src=\"../media/cache/ef/0b/ef0bed08de4e083dba5e20fdb98d9c36.jpg\" alt=\"America's Cradle of Quarterbacks: Western Pennsylvania's Football Factory from Johnny Unitas to Joe Montana\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"americas-cradle-of-quarterbacks-western-pennsylvanias-football-factory-from-johnny-unitas-to-joe-montana_974/index.html\" title=\"America's Cradle of Quarterbacks: Western Pennsylvania's Football Factory from Johnny Unitas to Joe 
Montana\">America's Cradle of Quarterbacks: ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£22.50</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"aladdin-and-his-wonderful-lamp_973/index.html\"><img src=\"../media/cache/d6/da/d6da0371958068bbaf39ea9c174275cd.jpg\" alt=\"Aladdin and His Wonderful Lamp\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"aladdin-and-his-wonderful-lamp_973/index.html\" title=\"Aladdin and His Wonderful Lamp\">Aladdin and His Wonderful ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£53.13</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"worlds-elsewhere-journeys-around-shakespeares-globe_972/index.html\"><img src=\"../media/cache/2e/98/2e98c332bf8563b584784971541c4445.jpg\" alt=\"Worlds Elsewhere: Journeys Around Shakespeare’s Globe\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i 
class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"worlds-elsewhere-journeys-around-shakespeares-globe_972/index.html\" title=\"Worlds Elsewhere: Journeys Around Shakespeare’s Globe\">Worlds Elsewhere: Journeys Around ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£40.30</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"wall-and-piece_971/index.html\"><img src=\"../media/cache/a5/41/a5416b9646aaa7287baa287ec2590270.jpg\" alt=\"Wall and Piece\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"wall-and-piece_971/index.html\" title=\"Wall and Piece\">Wall and Piece</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£44.18</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a 
href=\"the-four-agreements-a-practical-guide-to-personal-freedom_970/index.html\"><img src=\"../media/cache/0f/7e/0f7ee69495c0df1d35723f012624a9f8.jpg\" alt=\"The Four Agreements: A Practical Guide to Personal Freedom\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-four-agreements-a-practical-guide-to-personal-freedom_970/index.html\" title=\"The Four Agreements: A Practical Guide to Personal Freedom\">The Four Agreements: A ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£17.66</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-five-love-languages-how-to-express-heartfelt-commitment-to-your-mate_969/index.html\"><img src=\"../media/cache/38/c5/38c56fba316c07305643a8065269594e.jpg\" alt=\"The Five Love Languages: How to Express Heartfelt Commitment to Your Mate\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-five-love-languages-how-to-express-heartfelt-commitment-to-your-mate_969/index.html\" title=\"The Five Love Languages: How to Express Heartfelt Commitment to Your Mate\">The Five Love Languages: ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p 
class=\"price_color\">£31.05</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-elephant-tree_968/index.html\"><img src=\"../media/cache/5d/7e/5d7ecde8e81513eba8a64c9fe000744b.jpg\" alt=\"The Elephant Tree\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-elephant-tree_968/index.html\" title=\"The Elephant Tree\">The Elephant Tree</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£23.82</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"the-bear-and-the-piano_967/index.html\"><img src=\"../media/cache/cf/bb/cfbb5e62715c6d888fd07794c9bab5d6.jpg\" alt=\"The Bear and the Piano\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"the-bear-and-the-piano_967/index.html\" title=\"The Bear and the 
Piano\">The Bear and the ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£36.89</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"sophies-world_966/index.html\"><img src=\"../media/cache/65/71/6571919836ec51ed54f0050c31d8a0cd.jpg\" alt=\"Sophie's World\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Five\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"sophies-world_966/index.html\" title=\"Sophie's World\">Sophie's World</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£15.94</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"penny-maybe_965/index.html\"><img src=\"../media/cache/12/53/1253c21c5ef3c6d075c5fa3f5fecee6a.jpg\" alt=\"Penny Maybe\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Three\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a 
href=\"penny-maybe_965/index.html\" title=\"Penny Maybe\">Penny Maybe</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£33.29</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"maude-1883-1993she-grew-up-with-the-country_964/index.html\"><img src=\"../media/cache/f5/88/f5889d038f5d8e949b494d147c2dcf54.jpg\" alt=\"Maude (1883-1993):She Grew Up with the country\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"maude-1883-1993she-grew-up-with-the-country_964/index.html\" title=\"Maude (1883-1993):She Grew Up with the country\">Maude (1883-1993):She Grew Up ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£18.02</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"in-a-dark-dark-wood_963/index.html\"><img src=\"../media/cache/23/85/238570a1c284e730dbc737a7e631ae2b.jpg\" alt=\"In a Dark, Dark Wood\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating 
One\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"in-a-dark-dark-wood_963/index.html\" title=\"In a Dark, Dark Wood\">In a Dark, Dark ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£19.63</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"behind-closed-doors_962/index.html\"><img src=\"../media/cache/e1/5c/e15c289ba58cea38519e1281e859f0c1.jpg\" alt=\"Behind Closed Doors\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Four\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"behind-closed-doors_962/index.html\" title=\"Behind Closed Doors\">Behind Closed Doors</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£52.22</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n <li class=\"col-xs-6 col-sm-4 col-md-3 col-lg-3\">\n\n\n\n\n\n\n <article class=\"product_pod\">\n \n <div class=\"image_container\">\n \n \n <a href=\"you-cant-bury-them-all-poems_961/index.html\"><img src=\"../media/cache/e9/20/e9203b733126c4a0832a1c7885dc27cf.jpg\" 
alt=\"You can't bury them all: Poems\" class=\"thumbnail\"></a>\n \n \n </div>\n \n\n \n \n <p class=\"star-rating Two\">\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n <i class=\"icon-star\"></i>\n </p>\n \n \n\n \n <h3><a href=\"you-cant-bury-them-all-poems_961/index.html\" title=\"You can't bury them all: Poems\">You can't bury them ...</a></h3>\n \n\n \n <div class=\"product_price\">\n \n\n\n\n\n\n\n \n <p class=\"price_color\">£33.63</p>\n \n\n<p class=\"instock availability\">\n <i class=\"icon-ok\"></i>\n \n In stock\n \n</p>\n\n \n \n\n\n\n\n\n\n \n <form>\n <button type=\"submit\" class=\"btn btn-primary btn-block\" data-loading-text=\"Adding...\">Add to basket</button>\n </form>\n\n\n \n </div>\n \n </article>\n\n</li>\n \n </ol>\n \n\n\n\n <div>\n <ul class=\"pager\">\n \n <li class=\"previous\"><a href=\"page-1.html\">previous</a></li>\n \n <li class=\"current\">\n \n Page 2 of 50\n \n </li>\n \n <li class=\"next\"><a href=\"page-3.html\">next</a></li>\n \n </ul>\n </div>\n\n\n </div>\n </section>\n \n\n\n </div>\n\n </div><!-- /row -->\n </div><!-- /page_inner -->\n</div><!-- /container-fluid -->\n\n\n \n<footer class=\"footer container-fluid\">\n \n \n \n</footer>\n\n\n \n \n \n <!-- jQuery -->\n <script src=\"http://ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js\"></script>\n <script>window.jQuery || document.write('<script src=\"../static/oscar/js/jquery/jquery-1.9.1.min.js\"><\\/script>')</script><script src=\"../static/oscar/js/jquery/jquery-1.9.1.min.js\"></script>\n \n \n\n\n \n \n \n \n <script type=\"text/javascript\" src=\"../static/oscar/js/bootstrap3/bootstrap.min.js\"></script>\n <!-- Oscar -->\n <script src=\"../static/oscar/js/oscar/ui.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n\n <script src=\"../static/oscar/js/bootstrap-datetimepicker/bootstrap-datetimepicker.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n <script 
src=\"../static/oscar/js/bootstrap-datetimepicker/locales/bootstrap-datetimepicker.all.js\" type=\"text/javascript\" charset=\"utf-8\"></script>\n\n\n \n \n \n\n \n\n\n \n <script type=\"text/javascript\">\n $(function() {\n \n \n \n oscar.init();\n\n oscar.search.init();\n\n });\n </script>\n\n \n <!-- Version: N/A -->\n \n \n\n</body></html>", "echoData": "https://books.toscrape.com/catalogue/page-2.html"}
Getting browser HTML in proxy mode
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-H "Zyte-Browser-Html: true" \
https://toscrape.com
using System;
using System.Net;
using System.Net.Http;
var proxy = new WebProxy("http://api.zyte.com:8011", true);
proxy.Credentials = new NetworkCredential("YOUR_API_KEY", "");
var httpClientHandler = new HttpClientHandler
{
Proxy = proxy,
};
var client = new HttpClient(handler: httpClientHandler, disposeHandler: true);
client.DefaultRequestHeaders.Add("Zyte-Browser-Html", "true");
var message = new HttpRequestMessage(HttpMethod.Get, "https://toscrape.com");
var response = client.Send(message);
var body = await response.Content.ReadAsStringAsync();
Console.WriteLine(body);
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hc.client5.http.auth.AuthCache;
import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.CredentialsProvider;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.auth.BasicAuthCache;
import org.apache.hc.client5.http.impl.auth.BasicScheme;
import org.apache.hc.client5.http.impl.auth.CredentialsProviderBuilder;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.routing.DefaultProxyRoutePlanner;
import org.apache.hc.client5.http.protocol.HttpClientContext;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHost;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
class Example {
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
HttpHost proxy = new HttpHost("api.zyte.com", 8011);
DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
CredentialsProvider credentialsProvider =
CredentialsProviderBuilder.create()
.add(new AuthScope(proxy), "YOUR_API_KEY", "".toCharArray())
.build();
AuthCache authCache = new BasicAuthCache();
BasicScheme basicAuth = new BasicScheme();
authCache.put(proxy, basicAuth);
HttpClientContext context = HttpClientContext.create();
context.setCredentialsProvider(credentialsProvider);
context.setAuthCache(authCache);
CloseableHttpClient client =
HttpClients.custom()
.setRoutePlanner(routePlanner)
.setDefaultCredentialsProvider(credentialsProvider)
.build();
HttpGet request = new HttpGet("https://toscrape.com");
request.setHeader("Zyte-Browser-Html", "true");
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String httpResponseBody = EntityUtils.toString(entity, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
}
}
const axios = require('axios')
axios
.get(
'https://toscrape.com',
{
headers: {
'Zyte-Browser-Html': 'true'
},
proxy: {
protocol: 'http',
host: 'api.zyte.com',
port: 8011,
auth: {
username: 'YOUR_API_KEY',
password: ''
}
}
}
)
.then((response) => {
const httpResponseBody = response.data
console.log(httpResponseBody)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('GET', 'https://toscrape.com', [
'headers' => [
'Zyte-Browser-Html' => 'true',
],
'proxy' => 'http://YOUR_API_KEY:@api.zyte.com:8011',
]);
$http_response_body = (string) $response->getBody();
fwrite(STDOUT, $http_response_body);
import requests
response = requests.get(
"https://toscrape.com",
headers={
"Zyte-Browser-Html": "true",
},
proxies={
scheme: "http://YOUR_API_KEY:@api.zyte.com:8011" for scheme in ("http", "https")
},
)
http_response_body: bytes = response.content
print(http_response_body.decode())
# frozen_string_literal: true
require 'net/http'
url = URI('https://toscrape.com/')
proxy_host = 'api.zyte.com'
proxy_port = '8011'
http = Net::HTTP.new(url.host, url.port, proxy_host, proxy_port, 'YOUR_API_KEY', '')
http.use_ssl = true
request = Net::HTTP::Get.new(url)
request['Zyte-Browser-Html'] = 'true'
r = http.start do |h|
h.request(request)
end
puts r.body
from scrapy import Request, Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"

    def start_requests(self):
        yield Request("https://toscrape.com", headers={"Zyte-Browser-Html": "true"})

    def parse(self, response):
        print(response.text)
Output (first 5 lines):
<!DOCTYPE html><html lang="en"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Scraping Sandbox</title>
<link href="./css/bootstrap.min.css" rel="stylesheet">
<link href="./css/main.css" rel="stylesheet">
Using proxy mode
using System;
using System.Net;
using System.Net.Http;
var proxy = new WebProxy("http://api.zyte.com:8011", true);
proxy.Credentials = new NetworkCredential("YOUR_API_KEY", "");
var httpClientHandler = new HttpClientHandler
{
Proxy = proxy,
};
var client = new HttpClient(handler: httpClientHandler, disposeHandler: true);
var message = new HttpRequestMessage(HttpMethod.Get, "https://toscrape.com");
var response = client.Send(message);
var body = await response.Content.ReadAsStringAsync();
Console.WriteLine(body);
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
https://toscrape.com
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hc.client5.http.auth.AuthCache;
import org.apache.hc.client5.http.auth.AuthScope;
import org.apache.hc.client5.http.auth.CredentialsProvider;
import org.apache.hc.client5.http.classic.methods.HttpGet;
import org.apache.hc.client5.http.impl.auth.BasicAuthCache;
import org.apache.hc.client5.http.impl.auth.BasicScheme;
import org.apache.hc.client5.http.impl.auth.CredentialsProviderBuilder;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.client5.http.impl.routing.DefaultProxyRoutePlanner;
import org.apache.hc.client5.http.protocol.HttpClientContext;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHost;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
class Example {
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
HttpHost proxy = new HttpHost("api.zyte.com", 8011);
DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(proxy);
CredentialsProvider credentialsProvider =
CredentialsProviderBuilder.create()
.add(new AuthScope(proxy), "YOUR_API_KEY", "".toCharArray())
.build();
AuthCache authCache = new BasicAuthCache();
BasicScheme basicAuth = new BasicScheme();
authCache.put(proxy, basicAuth);
HttpClientContext context = HttpClientContext.create();
context.setCredentialsProvider(credentialsProvider);
context.setAuthCache(authCache);
CloseableHttpClient client =
HttpClients.custom()
.setRoutePlanner(routePlanner)
.setDefaultCredentialsProvider(credentialsProvider)
.build();
HttpGet request = new HttpGet("https://toscrape.com");
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String httpResponseBody = EntityUtils.toString(entity, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
}
}
const axios = require('axios')
axios
.get(
'https://toscrape.com',
{
proxy: {
protocol: 'http',
host: 'api.zyte.com',
port: 8011,
auth: {
username: 'YOUR_API_KEY',
password: ''
}
}
}
)
.then((response) => {
const httpResponseBody = response.data
console.log(httpResponseBody)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('GET', 'https://toscrape.com', [
'proxy' => 'http://YOUR_API_KEY:@api.zyte.com:8011',
]);
$http_response_body = (string) $response->getBody();
fwrite(STDOUT, $http_response_body);
Note
You need to install and configure our CA certificate for the requests library.
import requests
response = requests.get(
"https://toscrape.com",
proxies={
scheme: "http://YOUR_API_KEY:@api.zyte.com:8011" for scheme in ("http", "https")
},
)
http_response_body: bytes = response.content
print(http_response_body.decode())
# frozen_string_literal: true
require 'net/http'
url = URI('https://toscrape.com/')
proxy_host = 'api.zyte.com'
proxy_port = '8011'
http = Net::HTTP.new(url.host, url.port, proxy_host, proxy_port, 'YOUR_API_KEY', '')
http.use_ssl = true
r = http.start do |h|
h.request(Net::HTTP::Get.new(url))
end
puts r.body
When using scrapy-zyte-smartproxy, set the ZYTE_SMARTPROXY_URL setting to "http://api.zyte.com:8011" and the ZYTE_SMARTPROXY_APIKEY setting to your API key for Zyte API.
Then you can continue using Scrapy as usual and all requests will be proxied through Zyte API automatically.
from scrapy import Spider


class ToScrapeSpider(Spider):
    name = "toscrape_com"
    start_urls = ["https://toscrape.com"]

    def parse(self, response):
        print(response.text)
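As a rough illustration of the scrapy-zyte-smartproxy setup described above, the settings could look like the sketch below. ZYTE_SMARTPROXY_URL and ZYTE_SMARTPROXY_APIKEY come from the text above; ZYTE_SMARTPROXY_ENABLED and the DOWNLOADER_MIDDLEWARES entry are assumptions based on the usual setup of that plugin, so verify them against its own documentation.

```python
# settings.py (sketch): route Scrapy requests through Zyte API proxy mode.

# Assumption: the plugin is switched on with this flag.
ZYTE_SMARTPROXY_ENABLED = True

# From the text above: proxy mode endpoint and your Zyte API key.
ZYTE_SMARTPROXY_URL = "http://api.zyte.com:8011"
ZYTE_SMARTPROXY_APIKEY = "YOUR_API_KEY"

# Assumption: middleware path and priority as commonly documented
# for scrapy-zyte-smartproxy.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_zyte_smartproxy.ZyteSmartProxyMiddleware": 610,
}
```

With settings along these lines, a spider needs no proxy-specific code.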
Using the HTTPS endpoint of proxy mode
curl \
--proxy https://api.zyte.com:8014 \
--proxy-user YOUR_API_KEY: \
--compressed \
https://toscrape.com
const HttpsProxyAgent = require('https-proxy-agent')
const httpsAgent = new HttpsProxyAgent.HttpsProxyAgent('https://YOUR_API_KEY:@api.zyte.com:8014')
const axiosDefaultConfig = { httpsAgent }
const axios = require('axios').create(axiosDefaultConfig)
axios
.get('https://toscrape.com')
.then((response) => {
const httpResponseBody = response.data
console.log(httpResponseBody)
})
import requests
response = requests.get(
"https://toscrape.com",
proxies={
scheme: "https://YOUR_API_KEY:@api.zyte.com:8014"
for scheme in ("http", "https")
},
)
http_response_body: bytes = response.content
print(http_response_body.decode())
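The HTTP and HTTPS proxy endpoints shown above differ only in scheme and port (http on 8011, https on 8014). A minimal sketch of a helper that builds the proxies mapping for either endpoint follows; the function name zyte_proxies and its parameters are hypothetical, not part of any Zyte library.

```python
def zyte_proxies(api_key: str, use_https_endpoint: bool = False) -> dict:
    """Build a requests-style proxies mapping for Zyte API proxy mode.

    Hypothetical helper: the host and ports are taken from the examples
    above; the API key is the proxy username and the password is empty.
    """
    scheme, port = ("https", 8014) if use_https_endpoint else ("http", 8011)
    proxy_url = f"{scheme}://{api_key}:@api.zyte.com:{port}"
    # Use the same proxy for both http:// and https:// target URLs.
    return {target_scheme: proxy_url for target_scheme in ("http", "https")}


if __name__ == "__main__":
    print(zyte_proxies("YOUR_API_KEY"))
    print(zyte_proxies("YOUR_API_KEY", use_https_endpoint=True))
```

The returned dictionary can be passed as the proxies argument of requests.get, as in the examples above.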
Sending arbitrary bytes in an HTTP request
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/anything"},
{"httpResponseBody", true},
{"httpRequestMethod", "POST"},
{"httpRequestBody", "Zm9v"}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var responseData = JsonDocument.Parse(httpResponseBody);
var requestBody = responseData.RootElement.GetProperty("data").ToString();
Console.WriteLine(requestBody);
{"url": "https://httpbin.org/anything", "httpResponseBody": true, "httpRequestMethod": "POST", "httpRequestBody": "Zm9v"}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output .data
{
"url": "https://httpbin.org/anything",
"httpResponseBody": true,
"httpRequestMethod": "POST",
"httpRequestBody": "Zm9v"
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output .data
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://httpbin.org/anything",
"httpResponseBody",
true,
"httpRequestMethod",
"POST",
"httpRequestBody",
"Zm9v");
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
String body = data.get("data").getAsString();
System.out.println(body);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/anything',
httpResponseBody: true,
httpRequestMethod: 'POST',
httpRequestBody: 'Zm9v'
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const body = JSON.parse(httpResponseBody).data
console.log(body)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/anything',
'httpResponseBody' => true,
'httpRequestMethod' => 'POST',
'httpRequestBody' => 'Zm9v',
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$body = json_decode($http_response_body)->data;
echo $body.PHP_EOL;
In proxy mode, the body of your request is used automatically, whether it is plain text or binary.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-X POST \
-H "Content-Type: application/octet-stream" \
--data foo \
https://httpbin.org/anything \
| jq .data
import json
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://httpbin.org/anything",
"httpResponseBody": True,
"httpRequestMethod": "POST",
"httpRequestBody": "Zm9v",
},
)
http_response_body = b64decode(api_response.json()["httpResponseBody"])
body: str = json.loads(http_response_body)["data"]
print(body)
import asyncio
import json
from base64 import b64decode

from zyte_api import AsyncZyteAPI


async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://httpbin.org/anything",
            "httpResponseBody": True,
            "httpRequestMethod": "POST",
            "httpRequestBody": "Zm9v",
        }
    )
    http_response_body: bytes = b64decode(api_response["httpResponseBody"])
    body = json.loads(http_response_body)["data"]
    print(body)


asyncio.run(main())
import json

from scrapy import Request, Spider


class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"

    def start_requests(self):
        yield Request(
            "https://httpbin.org/anything",
            method="POST",
            body=b"foo",
        )

    def parse(self, response):
        body = json.loads(response.body)["data"]
        print(body)
Output:
foo
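In the HTTP API examples above, the httpRequestBody value "Zm9v" is the Base64 encoding of the bytes foo, and httpResponseBody comes back Base64-encoded as well. The sketch below shows that round trip offline with the standard library only; the helper names build_request and read_echoed_data are hypothetical, and the API response is simulated rather than fetched.

```python
import json
from base64 import b64decode, b64encode


def build_request(url: str, payload: bytes) -> dict:
    """Build Zyte API parameters that POST arbitrary bytes (hypothetical helper)."""
    return {
        "url": url,
        "httpResponseBody": True,
        "httpRequestMethod": "POST",
        # The request body must be sent as a Base64 string.
        "httpRequestBody": b64encode(payload).decode("ascii"),
    }


def read_echoed_data(api_response: dict) -> str:
    """Decode httpResponseBody and return the "data" field echoed by httpbin.org."""
    http_response_body = b64decode(api_response["httpResponseBody"])
    return json.loads(http_response_body)["data"]


if __name__ == "__main__":
    request = build_request("https://httpbin.org/anything", b"foo")
    print(request["httpRequestBody"])

    # Simulated API response, standing in for a real call to Zyte API.
    simulated_response = {
        "httpResponseBody": b64encode(json.dumps({"data": "foo"}).encode()).decode("ascii")
    }
    print(read_echoed_data(simulated_response))
```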
Sending cookies
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
The following code example sends a cookie to httpbin.org and prints the cookies that httpbin.org reports having received:
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/cookies"},
{"httpResponseBody", true},
{
"requestCookies",
new List<Dictionary<string, string>>()
{
new Dictionary<string, string>()
{
{"name", "foo"},
{"value", "bar"},
{"domain", "httpbin.org"}
}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var result = System.Text.Encoding.UTF8.GetString(httpResponseBody);
Console.WriteLine(result);
{"url": "https://httpbin.org/cookies", "httpResponseBody": true, "requestCookies": [{"name": "foo", "value": "bar", "domain": "httpbin.org"}]}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode
{
"url": "https://httpbin.org/cookies",
"httpResponseBody": true,
"requestCookies": [
{
"name": "foo",
"value": "bar",
"domain": "httpbin.org"
}
]
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, String> cookies =
ImmutableMap.of("name", "foo", "value", "bar", "domain", "httpbin.org");
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://httpbin.org/cookies",
"httpResponseBody",
true,
"requestCookies",
Collections.singletonList(cookies));
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/cookies',
httpResponseBody: true,
requestCookies: [
{
name: 'foo',
value: 'bar',
domain: 'httpbin.org'
}
]
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
console.log(httpResponseBody.toString())
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/cookies',
'httpResponseBody' => true,
'requestCookies' => [
[
'name' => 'foo',
'value' => 'bar',
'domain' => 'httpbin.org',
],
],
],
]);
$api = json_decode($response->getBody());
$http_response_body = base64_decode($api->httpResponseBody);
echo $http_response_body;
With the proxy mode, the Cookie header of your requests is used automatically to set cookies for the domain of the target URL.
Note
Setting cookies for additional domains is not supported.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-H "Cookie: foo=bar" \
https://httpbin.org/cookies
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://httpbin.org/cookies",
"httpResponseBody": True,
"requestCookies": [
{
"name": "foo",
"value": "bar",
"domain": "httpbin.org",
},
],
},
)
http_response_body = b64decode(api_response.json()["httpResponseBody"])
print(http_response_body.decode())
import asyncio
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
client = AsyncZyteAPI()
api_response = await client.get(
{
"url": "https://httpbin.org/cookies",
"httpResponseBody": True,
"requestCookies": [
{
"name": "foo",
"value": "bar",
"domain": "httpbin.org",
},
],
}
)
http_response_body = b64decode(api_response["httpResponseBody"]).decode()
print(http_response_body)
asyncio.run(main())
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
name = "httpbin_org"
def start_requests(self):
yield Request(
"https://httpbin.org/cookies",
meta={
"zyte_api_automap": {
"requestCookies": [
{
"name": "foo",
"value": "bar",
"domain": "httpbin.org",
},
],
},
},
)
def parse(self, response):
print(response.text)
Output:
{
"cookies": {
"foo": "bar"
}
}
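The `requestCookies` value used in the API examples above is a list of objects with `name`, `value`, and `domain` keys. As a minimal sketch, that list could be built from a plain name-to-value mapping. `build_request_cookies` is a hypothetical helper written for illustration and is not part of any Zyte library.

```python
from typing import Dict, List


def build_request_cookies(cookies: Dict[str, str], domain: str) -> List[Dict[str, str]]:
    """Turn a name -> value mapping into a requestCookies-style list.

    Hypothetical helper: every cookie is assigned the same domain.
    """
    return [
        {"name": name, "value": value, "domain": domain}
        for name, value in cookies.items()
    ]


request_cookies = build_request_cookies({"foo": "bar"}, "httpbin.org")
print(request_cookies)
```

The resulting list can be passed as the `requestCookies` parameter in any of the request bodies shown above.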
Sending text (Unicode) in an HTTP request
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/anything"},
{"httpResponseBody", true},
{"httpRequestMethod", "POST"},
{"httpRequestText", "{\"foo\": \"bar\"}"}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBody = System.Convert.FromBase64String(base64HttpResponseBody);
var responseData = JsonDocument.Parse(httpResponseBody);
var requestBody = responseData.RootElement.GetProperty("data").ToString();
Console.WriteLine(requestBody);
{"url": "https://httpbin.org/anything", "httpResponseBody": true, "httpRequestMethod": "POST", "httpRequestText": "{\"foo\": \"bar\"}"}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output .data
{
"url": "https://httpbin.org/anything",
"httpResponseBody": true,
"httpRequestMethod": "POST",
"httpRequestText": "{\"foo\": \"bar\"}"
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output .data
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://httpbin.org/anything",
"httpResponseBody",
true,
"httpRequestMethod",
"POST",
"httpRequestText",
"{\"foo\": \"bar\"}");
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
String body = data.get("data").getAsString();
System.out.println(body);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/anything',
httpResponseBody: true,
httpRequestMethod: 'POST',
httpRequestText: '{"foo": "bar"}'
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const body = JSON.parse(httpResponseBody).data
console.log(body)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/anything',
'httpResponseBody' => true,
'httpRequestMethod' => 'POST',
'httpRequestText' => '{"foo": "bar"}',
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$body = json_decode($http_response_body)->data;
echo $body.PHP_EOL;
With the proxy mode, the body of your request is used automatically, whether it is plain text or binary.
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--compressed \
-X POST \
-H "Content-Type: application/json" \
--data '{"foo": "bar"}' \
https://httpbin.org/anything \
| jq .data
import json
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://httpbin.org/anything",
"httpResponseBody": True,
"httpRequestMethod": "POST",
"httpRequestText": '{"foo": "bar"}',
},
)
http_response_body = b64decode(api_response.json()["httpResponseBody"])
body: str = json.loads(http_response_body)["data"]
print(body)
import asyncio
import json
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
client = AsyncZyteAPI()
api_response = await client.get(
{
"url": "https://httpbin.org/anything",
"httpResponseBody": True,
"httpRequestMethod": "POST",
"httpRequestText": '{"foo": "bar"}',
}
)
http_response_body = b64decode(api_response["httpResponseBody"])
body = json.loads(http_response_body)["data"]
print(body)
asyncio.run(main())
import json
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
name = "httpbin_org"
def start_requests(self):
yield Request(
"https://httpbin.org/anything",
method="POST",
body='{"foo": "bar"}',
)
def parse(self, response):
body = json.loads(response.body)["data"]
print(body)
Output:
{"foo": "bar"}
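In the API examples above, `httpRequestText` is a string, so a JSON payload has to be serialized before it is sent, and the echoed `data` field comes back inside the base64-encoded `httpResponseBody`. The offline sketch below walks through both steps with a simulated API response. `build_text_request` and `extract_echoed_data` are hypothetical helper names, and the simulated response only mimics the shape seen in the examples.

```python
import json
from base64 import b64decode, b64encode


def build_text_request(url: str, payload: dict) -> dict:
    """Build Zyte API parameters that POST a JSON payload as text."""
    return {
        "url": url,
        "httpResponseBody": True,
        "httpRequestMethod": "POST",
        # httpRequestText is a string, so serialize the payload first.
        "httpRequestText": json.dumps(payload),
    }


def extract_echoed_data(api_response: dict) -> str:
    """Decode httpResponseBody and return the "data" field of its JSON."""
    body = b64decode(api_response["httpResponseBody"])
    return json.loads(body)["data"]


params = build_text_request("https://httpbin.org/anything", {"foo": "bar"})

# Simulated API response (assumption: the target echoes the request text
# in a "data" field, as httpbin.org/anything does in the examples above).
echoed = json.dumps({"data": params["httpRequestText"]}).encode()
fake_api_response = {"httpResponseBody": b64encode(echoed).decode()}

print(extract_echoed_data(fake_api_response))
```

No network request is made here; `params` is what you would send as the JSON body of the API request.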
Getting response headers
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://toscrape.com"},
{"httpResponseHeaders", true}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var headerEnumerator = data.RootElement.GetProperty("httpResponseHeaders").EnumerateArray();
var headers = new Dictionary<string, string>();
while (headerEnumerator.MoveNext())
{
headers.Add(
headerEnumerator.Current.GetProperty("name").ToString(),
headerEnumerator.Current.GetProperty("value").ToString()
);
}
{"url": "https://toscrape.com", "httpResponseHeaders": true}
zyte-api input.jsonl \
| jq .httpResponseHeaders
{
"url": "https://toscrape.com",
"httpResponseHeaders": true
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq .httpResponseHeaders
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url", "https://toscrape.com", "httpResponseHeaders", true);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
JsonArray httpResponseHeaders = jsonObject.get("httpResponseHeaders").getAsJsonArray();
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(httpResponseHeaders));
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com',
httpResponseHeaders: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseHeaders = response.data.httpResponseHeaders
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com',
'httpResponseHeaders' => true,
],
]);
$api = json_decode($response->getBody());
$http_response_headers = $api->httpResponseHeaders;
With the proxy mode, response headers are always included in the HTTP response, so you do not need to request them explicitly.
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://toscrape.com",
"httpResponseHeaders": True,
},
)
http_response_headers = api_response.json()["httpResponseHeaders"]
import asyncio
import json
from zyte_api import AsyncZyteAPI
async def main():
client = AsyncZyteAPI()
api_response = await client.get(
{
"url": "https://toscrape.com",
"httpResponseHeaders": True,
}
)
http_response_headers = api_response["httpResponseHeaders"]
print(json.dumps(http_response_headers, indent=2))
asyncio.run(main())
from scrapy import Request, Spider
class ToScrapeComSpider(Spider):
name = "toscrape_com"
def start_requests(self):
yield Request(
"https://toscrape.com",
meta={
"zyte_api_automap": {
"httpResponseBody": False,
"httpResponseHeaders": True,
},
},
)
def parse(self, response):
headers = response.headers
Note
In transparent mode, httpResponseHeaders is sent by default for httpResponseBody requests, but sending it explicitly is still recommended, as future versions of scrapy-zyte-api may stop sending it by default.
Output (first 5 lines):
[
{
"name": "date",
"value": "Fri, 25 Aug 2023 07:08:05 GMT"
},
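The `httpResponseHeaders` field is a list of `name`/`value` objects, as the output above shows. A minimal sketch of turning that list into a dictionary for lookups follows. `headers_to_dict` is a hypothetical helper, and joining repeated header names with a comma is an assumption that does not suit every header (for example `Set-Cookie`).

```python
from typing import Dict, List


def headers_to_dict(headers: List[Dict[str, str]]) -> Dict[str, str]:
    """Map lowercased header names to values, joining repeats with ", "."""
    result: Dict[str, str] = {}
    for header in headers:
        name = header["name"].lower()
        if name in result:
            result[name] = f"{result[name]}, {header['value']}"
        else:
            result[name] = header["value"]
    return result


# The first entry comes from the output above; the second is made up
# for illustration.
sample = [
    {"name": "date", "value": "Fri, 25 Aug 2023 07:08:05 GMT"},
    {"name": "Content-Type", "value": "text/html"},
]
print(headers_to_dict(sample)["content-type"])
```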
Taking a screenshot
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://toscrape.com"},
{"screenshot", true}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64Screenshot = data.RootElement.GetProperty("screenshot").ToString();
var screenshot = System.Convert.FromBase64String(base64Screenshot);
{"url": "https://toscrape.com", "screenshot": true}
zyte-api input.jsonl \
| jq --raw-output .screenshot \
| base64 --decode \
> screenshot.jpg
{
"url": "https://toscrape.com",
"screenshot": true
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .screenshot \
| base64 --decode \
> screenshot.jpg
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of("url", "https://toscrape.com", "screenshot", true);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64Screenshot = jsonObject.get("screenshot").getAsString();
byte[] screenshot = Base64.getDecoder().decode(base64Screenshot);
try (FileOutputStream fos = new FileOutputStream("screenshot.jpg")) {
fos.write(screenshot);
}
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com',
screenshot: true
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const screenshot = Buffer.from(response.data.screenshot, 'base64')
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com',
'screenshot' => true,
],
]);
$api = json_decode($response->getBody());
$screenshot = base64_decode($api->screenshot);
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://toscrape.com",
"screenshot": True,
},
)
screenshot: bytes = b64decode(api_response.json()["screenshot"])
import asyncio
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
client = AsyncZyteAPI()
api_response = await client.get(
{
"url": "https://toscrape.com",
"screenshot": True,
}
)
screenshot = b64decode(api_response["screenshot"])
with open("screenshot.jpg", "wb") as f:
f.write(screenshot)
asyncio.run(main())
from base64 import b64decode
from scrapy import Request, Spider
class ToScrapeComSpider(Spider):
name = "toscrape_com"
def start_requests(self):
yield Request(
"https://toscrape.com",
meta={
"zyte_api_automap": {
"screenshot": True,
},
},
)
def parse(self, response):
screenshot: bytes = b64decode(response.raw_api_response["screenshot"])
Output: a screenshot image of https://toscrape.com.
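Several of the examples above decode the `screenshot` field without saving it. A minimal sketch of writing it to disk follows. `save_screenshot` is a hypothetical helper, the API response is simulated, and the `.jpg` file name follows the other examples on this page rather than any check of the actual image format.

```python
import tempfile
from base64 import b64decode, b64encode
from pathlib import Path


def save_screenshot(api_response: dict, path: Path) -> int:
    """Decode the base64 screenshot field and write it to path.

    Returns the number of bytes written.
    """
    screenshot = b64decode(api_response["screenshot"])
    return path.write_bytes(screenshot)


# Simulated API response with placeholder bytes instead of a real image.
fake_api_response = {"screenshot": b64encode(b"not a real image").decode()}
target = Path(tempfile.gettempdir()) / "screenshot.jpg"
print(save_screenshot(fake_api_response, target))
```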
Start a client-managed session with a browser request and reuse it in an HTTP request
Start a session with a browser request to the home page of a website, and reuse that session for an HTTP request to a different URL of that website.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var sessionId = Guid.NewGuid().ToString();
var browserInput = new Dictionary<string, object>(){
{"url", "https://toscrape.com/"},
{"browserHtml", true},
{
"session",
new Dictionary<string, string>()
{
{"id", sessionId}
}
}
};
var browserInputJson = JsonSerializer.Serialize(browserInput);
var browserContent = new StringContent(browserInputJson, Encoding.UTF8, "application/json");
await client.PostAsync("https://api.zyte.com/v1/extract", browserContent);
var httpInput = new Dictionary<string, object>(){
{"url", "https://books.toscrape.com/"},
{"httpResponseBody", true},
{
"session",
new Dictionary<string, string>()
{
{"id", sessionId}
}
}
};
var httpInputJson = JsonSerializer.Serialize(httpInput);
var httpContent = new StringContent(httpInputJson, Encoding.UTF8, "application/json");
HttpResponseMessage httpResponse = await client.PostAsync("https://api.zyte.com/v1/extract", httpContent);
var httpResponseBody = await httpResponse.Content.ReadAsByteArrayAsync();
var httpData = JsonDocument.Parse(httpResponseBody);
var base64HttpResponseBodyField = httpData.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyField = System.Convert.FromBase64String(base64HttpResponseBodyField);
var result = System.Text.Encoding.UTF8.GetString(httpResponseBodyField);
Console.WriteLine(result);
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.UUID;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
String sessionId = UUID.randomUUID().toString();
Map<String, Object> session = ImmutableMap.of("id", sessionId);
Map<String, Object> browserParameters =
ImmutableMap.of("url", "https://toscrape.com/", "browserHtml", true, "session", session);
String browserRequestBody = new Gson().toJson(browserParameters);
HttpPost browserRequest = new HttpPost("https://api.zyte.com/v1/extract");
browserRequest.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
browserRequest.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
browserRequest.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
browserRequest.setEntity(new StringEntity(browserRequestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
browserRequest,
browserResponse -> {
Map<String, Object> httpParameters =
ImmutableMap.of(
"url",
"https://books.toscrape.com/",
"httpResponseBody",
true,
"session",
session);
String httpRequestBody = new Gson().toJson(httpParameters);
HttpPost httpRequest = new HttpPost("https://api.zyte.com/v1/extract");
httpRequest.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
httpRequest.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
httpRequest.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
httpRequest.setEntity(new StringEntity(httpRequestBody));
client.execute(
httpRequest,
httpResponse -> {
HttpEntity httpEntity = httpResponse.getEntity();
String httpApiResponse = EntityUtils.toString(httpEntity, StandardCharsets.UTF_8);
JsonObject httpJsonObject =
JsonParser.parseString(httpApiResponse).getAsJsonObject();
String base64HttpResponseBody =
httpJsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const crypto = require('crypto')
const sessionId = String(crypto.randomUUID())
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com/',
browserHtml: true,
session: { id: sessionId }
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((browserResponse) => {
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://books.toscrape.com/',
httpResponseBody: true,
session: { id: sessionId }
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((httpResponse) => {
const httpResponseBody = Buffer.from(
httpResponse.data.httpResponseBody,
'base64'
)
console.log(httpResponseBody.toString())
})
})
<?php
// https://stackoverflow.com/a/15875555
function uuidv4()
{
$data = random_bytes(16);
$data[6] = chr(ord($data[6]) & 0x0F | 0x40); // set version to 0100
$data[8] = chr(ord($data[8]) & 0x3F | 0x80); // set bits 6-7 to 10
return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($data), 4));
}
$client = new GuzzleHttp\Client();
$session_id = uuidv4();
$browser_response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com/',
'browserHtml' => true,
'session' => ['id' => $session_id],
],
]);
$http_response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://books.toscrape.com/',
'httpResponseBody' => true,
'session' => ['id' => $session_id],
],
]);
$http_data = json_decode($http_response->getBody());
$http_response_body = base64_decode($http_data->httpResponseBody);
echo $http_response_body;
from base64 import b64decode
from uuid import uuid4
import requests
session_id = str(uuid4())
browser_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://toscrape.com/",
"browserHtml": True,
"session": {"id": session_id},
},
)
http_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://books.toscrape.com/",
"httpResponseBody": True,
"session": {"id": session_id},
},
)
http_response_body = b64decode(http_response.json()["httpResponseBody"])
print(http_response_body.decode())
import asyncio
from base64 import b64decode
from uuid import uuid4
from zyte_api import AsyncZyteAPI
async def main():
client = AsyncZyteAPI()
session_id = str(uuid4())
browser_response = await client.get(
{
"url": "https://toscrape.com/",
"browserHtml": True,
"session": {"id": session_id},
}
)
http_response = await client.get(
{
"url": "https://books.toscrape.com/",
"httpResponseBody": True,
"session": {"id": session_id},
}
)
http_response_body = b64decode(http_response["httpResponseBody"]).decode()
print(http_response_body)
asyncio.run(main())
from uuid import uuid4
from scrapy import Request, Spider
class ToScrapeComSpider(Spider):
name = "toscrape_com"
def start_requests(self):
session_id = str(uuid4())
yield Request(
"https://toscrape.com/",
callback=self.parse_browser,
cb_kwargs={"session_id": session_id},
meta={
"zyte_api_automap": {
"browserHtml": True,
"session": {"id": session_id},
},
},
)
def parse_browser(self, response, session_id):
yield response.follow(
"https://books.toscrape.com/",
callback=self.parse_http,
meta={
"zyte_api_automap": {
"session": {"id": session_id},
},
},
)
def parse_http(self, response):
print(response.text)
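What ties the two requests together in the examples above is that both carry the same `session` object with a client-generated `id`. As a rough sketch, the two request bodies could be built as shown below. `build_session_requests` is a hypothetical helper that only builds the payloads and does not send anything.

```python
from typing import Any, Dict, Tuple
from uuid import uuid4


def build_session_requests(
    browser_url: str, http_url: str
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Build a browser request and an HTTP request that share one session."""
    # A random UUID is used as the client-managed session ID.
    session = {"id": str(uuid4())}
    browser_request = {"url": browser_url, "browserHtml": True, "session": session}
    http_request = {"url": http_url, "httpResponseBody": True, "session": session}
    return browser_request, http_request


browser_request, http_request = build_session_requests(
    "https://toscrape.com/", "https://books.toscrape.com/"
)
print(browser_request["session"] == http_request["session"])
```

Send the browser request first and the HTTP request second, as in the examples above.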
Send HTTP requests with server-managed sessions started with browser requests
Set a no-op action in sessionContextParameters to force sessions to start with a browser request, while your own requests remain HTTP requests.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Xml.XPath;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://toscrape.com/"},
{"httpResponseBody", true},
{
"sessionContext",
new List<Dictionary<string, string>>()
{
new Dictionary<string, string>()
{
{"name", "id"},
{"value", "browser"}
}
}
},
{
"sessionContextParameters",
new Dictionary<string, object>()
{
{
"actions",
new List<Dictionary<string, object>>()
{
new Dictionary<string, object>()
{
{"action", "waitForTimeout"},
{"timeout", 0},
}
}
}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyBytes = System.Convert.FromBase64String(base64HttpResponseBody);
var httpResponseBody = System.Text.Encoding.UTF8.GetString(httpResponseBodyBytes);
Console.WriteLine(httpResponseBody);
{"url": "https://toscrape.com/", "httpResponseBody": true, "sessionContext": [{"name": "id", "value": "browser"}], "sessionContextParameters": {"actions": [{"action": "waitForTimeout", "timeout": 0}]}}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode
{
"url": "https://toscrape.com/",
"httpResponseBody": true,
"sessionContext": [
{
"name": "id",
"value": "browser"
}
],
"sessionContextParameters": {
"actions": [
{
"action": "waitForTimeout",
"timeout": 0
}
]
}
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://toscrape.com/",
"httpResponseBody",
true,
"sessionContext",
ImmutableList.of(ImmutableMap.of("name", "id", "value", "browser")),
"sessionContextParameters",
ImmutableMap.of(
"actions",
ImmutableList.of(ImmutableMap.of("action", "waitForTimeout", "timeout", 0))));
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://toscrape.com/',
httpResponseBody: true,
sessionContext: [
{
name: 'id',
value: 'browser'
}
],
sessionContextParameters: {
actions: [
{
action: 'waitForTimeout',
timeout: 0
}
]
}
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
console.log(httpResponseBody.toString())
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://toscrape.com/',
'httpResponseBody' => true,
'sessionContext' => [
[
'name' => 'id',
'value' => 'browser',
],
],
'sessionContextParameters' => [
'actions' => [
[
'action' => 'waitForTimeout',
'timeout' => 0,
],
],
],
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
echo $http_response_body.PHP_EOL;
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://toscrape.com/",
"httpResponseBody": True,
"sessionContext": [{"name": "id", "value": "browser"}],
"sessionContextParameters": {
"actions": [
{
"action": "waitForTimeout",
"timeout": 0,
},
],
},
},
)
http_response_body_bytes = b64decode(api_response.json()["httpResponseBody"])
http_response_body = http_response_body_bytes.decode()
print(http_response_body)
import asyncio
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    http_response = await client.get(
        {
            "url": "https://toscrape.com/",
            "httpResponseBody": True,
            "sessionContext": [{"name": "id", "value": "browser"}],
            "sessionContextParameters": {
                "actions": [
                    {
                        "action": "waitForTimeout",
                        "timeout": 0,
                    },
                ],
            },
        }
    )
    http_response_body = b64decode(http_response["httpResponseBody"]).decode()
    print(http_response_body)
asyncio.run(main())
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"
    def start_requests(self):
        yield Request(
            "https://toscrape.com/",
            meta={
                "zyte_api_automap": {
                    "sessionContext": [
                        {
                            "name": "id",
                            "value": "browser",
                        },
                    ],
                    "sessionContextParameters": {
                        "actions": [
                            {
                                "action": "waitForTimeout",
                                "timeout": 0,
                            },
                        ],
                    },
                },
            },
        )
    def parse(self, response):
        print(response.text)
Send HTTP requests with server-managed sessions started with a browser action that visits a specific URL
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "http://httpbin.org/cookies"},
{"httpResponseBody", true},
{
"sessionContext",
new List<Dictionary<string, string>>()
{
new Dictionary<string, string>()
{
{"name", "id"},
{"value", "cookies"}
}
}
},
{
"sessionContextParameters",
new Dictionary<string, object>()
{
{
"actions",
new List<Dictionary<string, object>>()
{
new Dictionary<string, object>()
{
{"action", "goto"},
{"url", "http://httpbin.org/cookies/set/foo/bar"},
}
}
}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyBytes = System.Convert.FromBase64String(base64HttpResponseBody);
var httpResponseBody = System.Text.Encoding.UTF8.GetString(httpResponseBodyBytes);
Console.WriteLine(httpResponseBody);
{"url": "http://httpbin.org/cookies", "httpResponseBody": true, "sessionContext": [{"name": "id", "value": "cookies"}], "sessionContextParameters": {"actions": [{"action": "goto", "url": "http://httpbin.org/cookies/set/foo/bar"}]}}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode
{
"url": "http://httpbin.org/cookies",
"httpResponseBody": true,
"sessionContext": [
{
"name": "id",
"value": "cookies"
}
],
"sessionContextParameters": {
"actions": [
{
"action": "goto",
"url": "http://httpbin.org/cookies/set/foo/bar"
}
]
}
}
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"http://httpbin.org/cookies",
"httpResponseBody",
true,
"sessionContext",
ImmutableList.of(ImmutableMap.of("name", "id", "value", "cookies")),
"sessionContextParameters",
ImmutableMap.of(
"actions",
ImmutableList.of(
ImmutableMap.of(
"action", "goto", "url", "http://httpbin.org/cookies/set/foo/bar"))));
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
System.out.println(httpResponseBody);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'http://httpbin.org/cookies',
httpResponseBody: true,
sessionContext: [
{
name: 'id',
value: 'cookies'
}
],
sessionContextParameters: {
actions: [
{
action: 'goto',
url: 'http://httpbin.org/cookies/set/foo/bar'
}
]
}
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
console.log(httpResponseBody.toString())
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'http://httpbin.org/cookies',
'httpResponseBody' => true,
'sessionContext' => [
[
'name' => 'id',
'value' => 'cookies',
],
],
'sessionContextParameters' => [
'actions' => [
[
'action' => 'goto',
'url' => 'http://httpbin.org/cookies/set/foo/bar',
],
],
],
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
echo $http_response_body.PHP_EOL;
from base64 import b64decode
import requests
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "http://httpbin.org/cookies",
"httpResponseBody": True,
"sessionContext": [
{
"name": "id",
"value": "cookies",
},
],
"sessionContextParameters": {
"actions": [
{
"action": "goto",
"url": "http://httpbin.org/cookies/set/foo/bar",
},
],
},
},
)
http_response_body_bytes = b64decode(api_response.json()["httpResponseBody"])
http_response_body = http_response_body_bytes.decode()
print(http_response_body)
import asyncio
from base64 import b64decode
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "http://httpbin.org/cookies",
            "httpResponseBody": True,
            "sessionContext": [
                {
                    "name": "id",
                    "value": "cookies",
                },
            ],
            "sessionContextParameters": {
                "actions": [
                    {
                        "action": "goto",
                        "url": "http://httpbin.org/cookies/set/foo/bar",
                    },
                ],
            },
        },
    )
    http_response_body_bytes = b64decode(api_response["httpResponseBody"])
    http_response_body = http_response_body_bytes.decode()
    print(http_response_body)
asyncio.run(main())
Tip
scrapy-zyte-api also provides its own session management API, similar to that of server-managed sessions, but built on top of client-managed sessions.
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"
    def start_requests(self):
        yield Request(
            "http://httpbin.org/cookies",
            meta={
                "zyte_api_automap": {
                    "sessionContext": [
                        {
                            "name": "id",
                            "value": "cookies",
                        },
                    ],
                    "sessionContextParameters": {
                        "actions": [
                            {
                                "action": "goto",
                                "url": "http://httpbin.org/cookies/set/foo/bar",
                            },
                        ],
                    },
                },
            },
        )
    def parse(self, response):
        print(response.text)
Output:
{
"cookies": {
"foo": "bar"
}
}
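Every example above decodes the base64-encoded httpResponseBody field before printing it. As a minimal sketch of that step in isolation, the snippet below decodes a locally built stand-in for an API response and reads the cookie from it. The decode_http_response_body helper and the sample response are assumptions made for illustration; no request is sent to Zyte API.

```python
import json
from base64 import b64decode, b64encode


def decode_http_response_body(api_response):
    """Return the base64-encoded httpResponseBody field as text."""
    # Hypothetical helper name; it only wraps the decoding step used above.
    return b64decode(api_response["httpResponseBody"]).decode()


# Hypothetical API response, built locally to mimic the output shown above.
sample_body = json.dumps({"cookies": {"foo": "bar"}})
sample_api_response = {
    "url": "http://httpbin.org/cookies",
    "httpResponseBody": b64encode(sample_body.encode()).decode(),
}

cookies = json.loads(decode_http_response_body(sample_api_response))["cookies"]
print(cookies)
```

With a real response, the same helper would take the parsed JSON returned by the API in place of sample_api_response.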
Send 2 consecutive requests through the same IP address using a client-managed session
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var sessionId = Guid.NewGuid().ToString();
for (int i = 0; i < 2; i++)
{
var input = new Dictionary<string, object>(){
{"url", "https://httpbin.org/ip"},
{"httpResponseBody", true},
{
"session",
new Dictionary<string, string>()
{
{"id", sessionId}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var base64HttpResponseBody = data.RootElement.GetProperty("httpResponseBody").ToString();
var httpResponseBodyBytes = System.Convert.FromBase64String(base64HttpResponseBody);
var httpResponseBody = System.Text.Encoding.UTF8.GetString(httpResponseBodyBytes);
var responseData = JsonDocument.Parse(httpResponseBody);
var ipAddress = responseData.RootElement.GetProperty("origin").ToString();
Console.WriteLine(ipAddress);
}
{"url": "https://httpbin.org/ip", "httpResponseBody": true, "session": {"id": "e07843b4-fd72-4a02-82b4-3376c6ceba92"}}
{"url": "https://httpbin.org/ip", "httpResponseBody": true, "session": {"id": "e07843b4-fd72-4a02-82b4-3376c6ceba92"}}
zyte-api input.jsonl \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output .origin
{
"url": "https://httpbin.org/ip",
"httpResponseBody": true,
"session": {
"id": "e07843b4-fd72-4a02-82b4-3376c6ceba92"
}
}
for i in {1..2}
do
curl \
--user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--data @input.json \
--compressed \
https://api.zyte.com/v1/extract \
| jq --raw-output .httpResponseBody \
| base64 --decode \
| jq --raw-output .origin
done
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.UUID;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
String sessionId = UUID.randomUUID().toString();
CloseableHttpClient client = HttpClients.createDefault();
for (int i = 0; i < 2; i++) {
Map<String, Object> session = ImmutableMap.of("id", sessionId);
Map<String, Object> parameters =
ImmutableMap.of(
"url", "https://httpbin.org/ip", "httpResponseBody", true, "session", session);
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String base64HttpResponseBody = jsonObject.get("httpResponseBody").getAsString();
byte[] httpResponseBodyBytes = Base64.getDecoder().decode(base64HttpResponseBody);
String httpResponseBody = new String(httpResponseBodyBytes, StandardCharsets.UTF_8);
JsonObject data = JsonParser.parseString(httpResponseBody).getAsJsonObject();
String body = data.get("origin").getAsString();
System.out.println(body);
return null;
});
}
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const crypto = require('crypto')
const sessionId = String(crypto.randomUUID())
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/ip',
httpResponseBody: true,
session: { id: sessionId }
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const body = JSON.parse(httpResponseBody).origin
console.log(body)
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://httpbin.org/ip',
httpResponseBody: true,
session: { id: sessionId }
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const httpResponseBody = Buffer.from(
response.data.httpResponseBody,
'base64'
)
const body = JSON.parse(httpResponseBody).origin
console.log(body)
})
})
<?php
// https://stackoverflow.com/a/15875555
function uuidv4()
{
$data = random_bytes(16);
$data[6] = chr(ord($data[6]) & 0x0F | 0x40); // set version to 0100
$data[8] = chr(ord($data[8]) & 0x3F | 0x80); // set bits 6-7 to 10
return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($data), 4));
}
$client = new GuzzleHttp\Client();
$session_id = uuidv4();
for ($i = 0; $i < 2; ++$i) {
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://httpbin.org/ip',
'httpResponseBody' => true,
'session' => ['id' => $session_id],
],
]);
$data = json_decode($response->getBody());
$http_response_body = base64_decode($data->httpResponseBody);
$body = json_decode($http_response_body)->origin;
echo $body.PHP_EOL;
}
In proxy mode, use the Zyte-Session-ID header.
for i in {1..2}
do
curl \
--proxy api.zyte.com:8011 \
--proxy-user YOUR_API_KEY: \
--header 'Content-Type: application/json' \
--header 'Zyte-Session-ID: e07843b4-fd72-4a02-82b4-3376c6ceba92' \
--compressed \
https://httpbin.org/ip \
| jq --raw-output .origin
done
import json
from base64 import b64decode
from uuid import uuid4
import requests
session_id = str(uuid4())
for _ in range(2):
    api_response = requests.post(
        "https://api.zyte.com/v1/extract",
        auth=("YOUR_API_KEY", ""),
        json={
            "url": "https://httpbin.org/ip",
            "httpResponseBody": True,
            "session": {"id": session_id},
        },
    )
    http_response_body = b64decode(api_response.json()["httpResponseBody"])
    body: str = json.loads(http_response_body)["origin"]
    print(body)
import asyncio
import json
from base64 import b64decode
from uuid import uuid4
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    session_id = str(uuid4())
    for i in range(2):
        api_response = await client.get(
            {
                "url": "https://httpbin.org/ip",
                "httpResponseBody": True,
                "session": {"id": session_id},
            },
        )
        http_response_body = b64decode(api_response["httpResponseBody"]).decode()
        data = json.loads(http_response_body)
        print(data["origin"])
asyncio.run(main())
Tip
scrapy-zyte-api also provides its own session management API, similar to that of server-managed sessions, but built on top of client-managed sessions.
import json
from uuid import uuid4
from scrapy import Request, Spider
class HTTPBinOrgSpider(Spider):
    name = "httpbin_org"
    def start_requests(self):
        session_id = str(uuid4())
        yield Request(
            "https://httpbin.org/ip",
            cb_kwargs={"session_id": session_id},
            meta={"zyte_api_automap": {"session": {"id": session_id}}},
        )
    def parse(self, response, session_id):
        print(json.loads(response.body)["origin"])
        yield Request(
            "https://httpbin.org/ip",
            meta={"zyte_api_automap": {"session": {"id": session_id}}},
            dont_filter=True,
            callback=self.parse2,
        )
    def parse2(self, response):
        print(json.loads(response.body)["origin"])
Output:
203.0.113.122
203.0.113.122
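The CLI client and curl inputs above hard-code a session ID. As a rough sketch, assuming you would rather generate those inputs with a fresh UUID, the snippet below builds request payloads that share one client-managed session ID and prints them as JSON lines. build_session_inputs is a hypothetical helper name, not part of python-zyte-api.

```python
import json
from uuid import uuid4


def build_session_inputs(url, count, session_id=None):
    """Build `count` request payloads that share one client-managed session ID."""
    # Hypothetical helper: generate a fresh UUID unless the caller supplies an ID.
    session_id = session_id or str(uuid4())
    return [
        {"url": url, "httpResponseBody": True, "session": {"id": session_id}}
        for _ in range(count)
    ]


inputs = build_session_inputs("https://httpbin.org/ip", 2)
# One JSON object per line, the same shape as the input.jsonl shown above.
jsonl = "\n".join(json.dumps(item) for item in inputs)
print(jsonl)
```

Reusing the same ID on every payload is what ties the requests to one session; a different ID on each line would start separate sessions.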
Access a shadow DOM
Note
Install and configure code example requirements and the Zyte CA certificate to run the example below.
To get content from the shadow DOM, use the evaluate action to create an invisible DOM element, which you will get in browserHtml, and fill it with the desired content from the shadow DOM.
Tip
If your evaluate action does not work as expected, check the actions response field for errors.
The following example code shows how to access the shadow DOM paragraph from a shadow DOM example in CodePen using the evaluate action with the following source:
const div = document.createElement('div')
div.setAttribute('id', 'shadow-root-content')
// Hide, in case you also want to take a screenshot.
div.style.display = 'none'
const iframe = document.getElementById('result')
div.innerText = iframe
.contentWindow.document
.getElementById('shadow-root')
.shadowRoot.querySelector('p').textContent
document.body.appendChild(div)
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;
using System.Xml.XPath;
using HtmlAgilityPack;
HttpClientHandler handler = new HttpClientHandler()
{
AutomaticDecompression = DecompressionMethods.All
};
HttpClient client = new HttpClient(handler);
var apiKey = "YOUR_API_KEY";
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(apiKey + ":");
var auth = System.Convert.ToBase64String(bytes);
client.DefaultRequestHeaders.Add("Authorization", "Basic " + auth);
client.DefaultRequestHeaders.Add("Accept-Encoding", "br, gzip, deflate");
var input = new Dictionary<string, object>(){
{"url", "https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view="},
{"browserHtml", true},
{
"actions",
new List<Dictionary<string, object>>()
{
new Dictionary<string, object>()
{
{"action", "evaluate"},
{"source", @"
const div = document.createElement('div')
div.setAttribute('id', 'shadow-root-content')
div.style.display = 'none'
const iframe = document.getElementById('result')
div.innerText = iframe
.contentWindow.document
.getElementById('shadow-root')
.shadowRoot.querySelector('p').textContent
document.body.appendChild(div)
"}
}
}
}
};
var inputJson = JsonSerializer.Serialize(input);
var content = new StringContent(inputJson, Encoding.UTF8, "application/json");
HttpResponseMessage response = await client.PostAsync("https://api.zyte.com/v1/extract", content);
var body = await response.Content.ReadAsByteArrayAsync();
var data = JsonDocument.Parse(body);
var browserHtml = data.RootElement.GetProperty("browserHtml").ToString();
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(browserHtml);
var navigator = htmlDocument.CreateNavigator();
var nodeIterator = (XPathNodeIterator)navigator.Evaluate("//*[@id=\"shadow-root-content\"]/text()");
nodeIterator.MoveNext();
var shadowText = nodeIterator.Current.ToString();
Console.WriteLine(shadowText);
import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Collections;
import java.util.Map;
import org.apache.hc.client5.http.classic.methods.HttpPost;
import org.apache.hc.client5.http.impl.classic.CloseableHttpClient;
import org.apache.hc.client5.http.impl.classic.HttpClients;
import org.apache.hc.core5.http.ContentType;
import org.apache.hc.core5.http.HttpEntity;
import org.apache.hc.core5.http.HttpHeaders;
import org.apache.hc.core5.http.ParseException;
import org.apache.hc.core5.http.io.entity.EntityUtils;
import org.apache.hc.core5.http.io.entity.StringEntity;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
class Example {
private static final String API_KEY = "YOUR_API_KEY";
public static void main(final String[] args)
throws InterruptedException, IOException, ParseException {
Map<String, Object> actions =
ImmutableMap.of(
"action",
"evaluate",
"source",
"const div = document.createElement('div')\n"
+ "div.setAttribute('id', 'shadow-root-content')\n"
+ "div.style.display = 'none'\n"
+ "const iframe = document.getElementById('result')\n"
+ "div.innerText = iframe\n"
+ " .contentWindow.document\n"
+ " .getElementById('shadow-root')\n"
+ " .shadowRoot.querySelector('p').textContent\n"
+ "document.body.appendChild(div)");
Map<String, Object> parameters =
ImmutableMap.of(
"url",
"https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view=",
"browserHtml",
true,
"actions",
Collections.singletonList(actions));
String requestBody = new Gson().toJson(parameters);
HttpPost request = new HttpPost("https://api.zyte.com/v1/extract");
request.setHeader(HttpHeaders.CONTENT_TYPE, ContentType.APPLICATION_JSON);
request.setHeader(HttpHeaders.ACCEPT_ENCODING, "gzip, deflate");
request.setHeader(HttpHeaders.AUTHORIZATION, buildAuthHeader());
request.setEntity(new StringEntity(requestBody));
CloseableHttpClient client = HttpClients.createDefault();
client.execute(
request,
response -> {
HttpEntity entity = response.getEntity();
String apiResponse = EntityUtils.toString(entity, StandardCharsets.UTF_8);
JsonObject jsonObject = JsonParser.parseString(apiResponse).getAsJsonObject();
String browserHtml = jsonObject.get("browserHtml").getAsString();
Document document = Jsoup.parse(browserHtml);
String shadowText = document.select("#shadow-root-content").text();
System.out.println(shadowText);
return null;
});
}
private static String buildAuthHeader() {
String auth = API_KEY + ":";
String encodedAuth = Base64.getEncoder().encodeToString(auth.getBytes());
return "Basic " + encodedAuth;
}
}
const axios = require('axios')
const cheerio = require('cheerio')
axios.post(
'https://api.zyte.com/v1/extract',
{
url: 'https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view=',
browserHtml: true,
actions: [
{
action: 'evaluate',
source: `
const div = document.createElement('div')
div.setAttribute('id', 'shadow-root-content')
div.style.display = 'none'
const iframe = document.getElementById('result')
div.innerText = iframe
.contentWindow.document
.getElementById('shadow-root')
.shadowRoot.querySelector('p').textContent
document.body.appendChild(div)
`
}
]
},
{
auth: { username: 'YOUR_API_KEY' }
}
).then((response) => {
const browserHtml = response.data.browserHtml
const $ = cheerio.load(browserHtml)
const shadowText = $('#shadow-root-content').text()
console.log(shadowText)
})
<?php
$client = new GuzzleHttp\Client();
$response = $client->request('POST', 'https://api.zyte.com/v1/extract', [
'auth' => ['YOUR_API_KEY', ''],
'headers' => ['Accept-Encoding' => 'gzip'],
'json' => [
'url' => 'https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view=',
'browserHtml' => true,
'actions' => [
[
'action' => 'evaluate',
'source' => "
const div = document.createElement('div')
div.setAttribute('id', 'shadow-root-content')
div.style.display = 'none'
const iframe = document.getElementById('result')
div.innerText = iframe
.contentWindow.document
.getElementById('shadow-root')
.shadowRoot.querySelector('p').textContent
document.body.appendChild(div)
",
],
],
],
]);
$data = json_decode($response->getBody());
$doc = new DOMDocument();
$doc->loadHTML($data->browserHtml);
$xpath = new DOMXPath($doc);
$shadow_text = $xpath->query("//*[@id='shadow-root-content']")->item(0)->textContent;
echo $shadow_text.PHP_EOL;
import requests
from parsel import Selector
api_response = requests.post(
"https://api.zyte.com/v1/extract",
auth=("YOUR_API_KEY", ""),
json={
"url": "https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view=",
"browserHtml": True,
"actions": [
{
"action": "evaluate",
"source": """
const div = document.createElement('div')
div.setAttribute('id', 'shadow-root-content')
div.style.display = 'none'
const iframe = document.getElementById('result')
div.innerText = iframe
.contentWindow.document
.getElementById('shadow-root')
.shadowRoot.querySelector('p').textContent
document.body.appendChild(div)
""",
},
],
},
)
browser_html = api_response.json()["browserHtml"]
shadow_text = Selector(browser_html).css("#shadow-root-content::text").get()
print(shadow_text)
import asyncio
from parsel import Selector
from zyte_api import AsyncZyteAPI
async def main():
    client = AsyncZyteAPI()
    api_response = await client.get(
        {
            "url": "https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view=",
            "browserHtml": True,
            "actions": [
                {
                    "action": "evaluate",
                    "source": """
                        const div = document.createElement('div')
                        div.setAttribute('id', 'shadow-root-content')
                        div.style.display = 'none'
                        const iframe = document.getElementById('result')
                        div.innerText = iframe
                            .contentWindow.document
                            .getElementById('shadow-root')
                            .shadowRoot.querySelector('p').textContent
                        document.body.appendChild(div)
                    """,
                },
            ],
        },
    )
    browser_html = api_response["browserHtml"]
    shadow_text = Selector(browser_html).css("#shadow-root-content::text").get()
    print(shadow_text)
asyncio.run(main())
from scrapy import Request, Spider
class CodePenSpider(Spider):
    name = "codepen"
    def start_requests(self):
        yield Request(
            "https://cdpn.io/TLadd/fullpage/PoGoQeV?anon=true&view=",
            meta={
                "zyte_api_automap": {
                    "browserHtml": True,
                    "actions": [
                        {
                            "action": "evaluate",
                            "source": """
                                const div = document.createElement('div')
                                div.setAttribute('id', 'shadow-root-content')
                                div.style.display = 'none'
                                const iframe = document.getElementById('result')
                                div.innerText = iframe
                                    .contentWindow.document
                                    .getElementById('shadow-root')
                                    .shadowRoot.querySelector('p').textContent
                                document.body.appendChild(div)
                            """,
                        },
                    ],
                },
            },
        )
    def parse(self, response):
        shadow_text = response.css("#shadow-root-content::text").get()
        print(shadow_text)
Output:
Shadow Paragraph
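The tip above suggests checking the actions response field when an evaluate action misbehaves. A minimal sketch of such a check follows. It assumes that each entry of the actions response field is an object that names its action and carries an "error" key when the action failed; check the Zyte API reference for the exact schema. failed_actions and the sample response are hypothetical, written by hand for illustration.

```python
def failed_actions(api_response):
    """Return the entries of the actions response field that report an error."""
    # Assumption: a failed action carries a non-empty "error" key.
    return [
        action
        for action in api_response.get("actions", [])
        if action.get("error")
    ]


# Hypothetical response, written by hand for illustration only.
sample_api_response = {
    "browserHtml": "<html><body></body></html>",
    "actions": [
        {"action": "evaluate", "error": "example error message"},
    ],
}

for action in failed_actions(sample_api_response):
    print(f"{action['action']} failed: {action['error']}")
```

Running a check like this before parsing browserHtml makes a missing #shadow-root-content element easier to diagnose.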