Using Smart Proxy Manager with Splash

Warning

The zyte-smartproxy-ca.crt certificate must be installed in your OS for the code below to work. Follow these instructions to install it.

Note

All the code in this documentation has been tested with Splash 3.5 and Python 3.9.5.

Installation

  1. Set up the Zyte SmartProxy (formerly Crawlera) Headless Proxy as described in Using Headless Browsers with Zyte Smart Proxy Manager.

  2. Download and install Splash by following the installation guide: https://splash.readthedocs.io/en/stable/install.html

Assuming you installed Splash using Docker, proceed to run Splash with:

docker run -it -p 8050:8050 --rm scrapinghub/splash

You can confirm Splash is running by accessing the Splash web UI at http://localhost:8050
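
The same check can be scripted instead of done in a browser. The following is a minimal sketch, assuming Splash listens on http://localhost:8050 as started above and relying on the `_ping` endpoint of the Splash HTTP API; the helper name `splash_is_up` is hypothetical and not part of Splash.

```python
import http.client
import json
import urllib.request


def splash_is_up(base_url="http://localhost:8050", timeout=5):
    """Return True if the Splash `_ping` endpoint reports status "ok"."""
    try:
        with urllib.request.urlopen(base_url + "/_ping", timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, or a non-JSON body.
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"
```

Calling `splash_is_up()` with no arguments checks the default address; pass a different `base_url` if the container port is mapped elsewhere.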

Using Web UI

Here is a sample script you can use to test the integration of Splash with Smart Proxy Manager, once you have also installed the Headless Proxy.

Paste this code into the main page of the Splash web UI, enter a URL (e.g. http://example.com) and click the “Render me!” button.

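function main(splash)
    -- Route every request Splash makes through the Headless Proxy,
    -- reached from inside the container via the Docker host on port 3128.
    splash:on_request(function (request)
        request:set_proxy{"host.docker.internal", 3128}
    end)

    -- Load the page passed in the "url" argument and return a PNG screenshot.
    splash:go(splash.args.url)
    return splash:png()
end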
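
The script above assumes the Headless Proxy is listening on port 3128 of the Docker host. If rendering fails, one quick thing to rule out is that nothing accepts connections on that port. The following is a minimal sketch, run on the host machine and assuming the proxy is bound to localhost there; the helper name `port_is_open` is hypothetical.

```python
import socket


def port_is_open(host="localhost", port=3128, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, host unreachable, or timed out.
        return False
```

This only confirms that something accepts TCP connections on the port. It does not verify that the process is the Headless Proxy or that its API key is valid.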

Using Python & Requests library

To use Python and the Requests library with Splash, first save the Lua code above in a file named, for example, spm-splash.lua. Then save the Python code below in another file in the same directory, for example sample.py. Make sure the Requests library is installed before continuing.

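import requests

# Address of the Splash instance started with Docker.
splash_server = 'http://localhost:8050'
url = "https://example.com"

# Load the Lua script that routes Splash requests through the Headless Proxy.
with open('spm-splash.lua') as lua:
    lua_source = lua.read()

# Ask Splash to run the script against the target URL.
splash_url = '{}/execute'.format(splash_server)
r = requests.post(
    splash_url,
    json={
        'lua_source': lua_source,
        'url': url,
    },
    timeout=100,
)
# Fail early on an HTTP error instead of saving an error body as a PNG.
r.raise_for_status()

# Save the PNG screenshot returned by the script.
with open("spm-splash.png", "wb") as fp:
    fp.write(r.content)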

Now run the file using:

$ python sample.py

You’ll find a screenshot of the entered URL, named spm-splash.png, in the same directory as your Lua and Python files.
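
When a render fails, Splash returns a JSON error description instead of image data, so it can be worth verifying that the saved file is really a PNG. The following is a minimal sketch that compares the first bytes against the standard 8-byte PNG signature; the helper names `looks_like_png` and `file_is_png` are hypothetical.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def looks_like_png(data):
    """Return True if the bytes start with the 8-byte PNG signature."""
    return data[:8] == PNG_SIGNATURE


def file_is_png(path="spm-splash.png"):
    """Read the first 8 bytes of the file and compare them to the signature."""
    with open(path, "rb") as fp:
        return looks_like_png(fp.read(8))
```

For example, `file_is_png()` can be called right after running sample.py. A False result suggests the file holds an error response rather than a screenshot.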

Using Splash with Scrapy

To use Zyte Smart Proxy Manager with Splash and Scrapy, see Using Smart Proxy Manager with Splash and Scrapy.