Scrapy Cloud scripts

In addition to Scrapy spiders, you can include standalone Python scripts in your Scrapy project and run them on Scrapy Cloud.

Scrapy Cloud scripts must be declared in the scripts argument of the setup() call in your setup.py file:

from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    packages=find_packages(),
    scripts=["bin/hello.py"],  # standalone scripts, relative to setup.py
    entry_points={"scrapy": ["settings = myproject.settings"]},  # project settings module
)
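The source does not show the contents of bin/hello.py. The sketch below is a hypothetical minimal version. Any self-contained Python program should work, as long as the file sits at the path listed in scripts, relative to setup.py. The main() function and its argument handling are illustrative assumptions, not a Scrapy Cloud requirement.

```python
#!/usr/bin/env python
"""bin/hello.py: hypothetical contents for the script declared in setup.py."""
import sys


def main(argv):
    # Greet the first command-line argument, or "world" if none is given.
    name = argv[1] if len(argv) > 1 else "world"
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(main(sys.argv))
```

Running python bin/hello.py locally prints Hello, world!, and running python bin/hello.py Zyte prints Hello, Zyte!.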

When starting a job, you can select a script instead of a spider. Scripts are listed by file name with a py: prefix; for example, the script declared above is listed as py:hello.py.

To access your Scrapy project settings from a script, including those defined in Scrapy Cloud, use the sh_scrapy.utils.get_project_settings function:

from sh_scrapy.utils import get_project_settings

settings = get_project_settings()
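Once you have the settings object, you can read individual values from it. The sketch below assumes that sh_scrapy.utils.get_project_settings returns a scrapy.settings.Settings object, as Scrapy's own helper does. Under that assumption, it uses the typed getters get(), getint() and getbool(). The describe() function and the settings it reads are hypothetical examples. The import is placed under the main guard so that describe() can be reused without sh_scrapy installed.

```python
"""Hypothetical script that reads project settings on Scrapy Cloud."""


def describe(settings):
    # Settings objects expose typed getters; each default below is returned
    # only when the key is missing from the settings.
    bot_name = settings.get("BOT_NAME", "unknown")
    concurrency = settings.getint("CONCURRENT_REQUESTS", 16)
    autothrottle = settings.getbool("AUTOTHROTTLE_ENABLED", False)
    return f"{bot_name}: {concurrency} concurrent requests, autothrottle={autothrottle}"


if __name__ == "__main__":
    # Assumption: sh_scrapy is importable (scrapinghub-entrypoint-scrapy >= 0.12).
    from sh_scrapy.utils import get_project_settings

    print(describe(get_project_settings()))
```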

Note

This function was introduced in scrapinghub-entrypoint-scrapy 0.12. If you cannot import it, make sure your project uses a recent Scrapy Cloud stack, or add scrapinghub-entrypoint-scrapy>=0.12 to your project's requirements.
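A script may also need to run on your machine, where sh_scrapy is often not installed. In that case, one option is to fall back to Scrapy's own scrapy.utils.project.get_project_settings when the import fails. This fallback is an assumption, not part of the Scrapy Cloud documentation. It reads your project's settings module but not any settings defined in Scrapy Cloud. The load_settings() name is hypothetical.

```python
"""Hypothetical helper for scripts that also need to run outside Scrapy Cloud."""


def load_settings():
    try:
        # Scrapy Cloud-aware helper (scrapinghub-entrypoint-scrapy >= 0.12).
        from sh_scrapy.utils import get_project_settings
    except ImportError:
        # Assumption: locally, Scrapy's own helper is an acceptable fallback.
        # It reads the project settings module, but not settings defined
        # in Scrapy Cloud.
        from scrapy.utils.project import get_project_settings
    return get_project_settings()


if __name__ == "__main__":
    settings = load_settings()
    print(settings.get("BOT_NAME"))
```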