Scrapy Cloud scripts
In addition to Scrapy spiders, you can include standalone Python scripts in your Scrapy project and run them on Scrapy Cloud.
Scrapy Cloud scripts need to be declared under scripts in your setup.py file:
from setuptools import setup, find_packages

setup(
    name="myproject",
    version="1.0",
    packages=find_packages(),
    scripts=["bin/hello.py"],
    entry_points={"scrapy": ["settings = myproject.settings"]},
)
When starting a job, you can select a script instead of a spider.
Scripts are listed by file name, prefixed with py: (for example, py:hello.py for the script in the example above).
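For illustration, a minimal bin/hello.py matching the setup.py example above could look like the sketch below. The file contents are hypothetical: the function names build_greeting and main and the greeting text are assumptions made for this example, not something Scrapy Cloud requires.

```python
#!/usr/bin/env python
"""Hypothetical contents of bin/hello.py, the script declared in setup.py."""


def build_greeting(name="Scrapy Cloud"):
    # Keep the logic in a plain function so it can be tested without
    # running the whole script.
    return f"Hello, {name}!"


def main():
    # Print a single greeting line when the script is run.
    print(build_greeting())


if __name__ == "__main__":
    main()
```

Keeping the work inside functions and calling main() only under the __name__ guard means the file can be imported and tested locally without executing the script body.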
To access your Scrapy project settings from a script, including those defined in Scrapy Cloud, use the sh_scrapy.utils.get_project_settings function:
from sh_scrapy.utils import get_project_settings
settings = get_project_settings()
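As a rough sketch of how a script might use the returned settings, the example below reads a few values by name. It assumes the returned object exposes a get() method like Scrapy's Settings class. The helper pick_settings is hypothetical, and BOT_NAME and USER_AGENT are standard Scrapy settings chosen only for illustration.

```python
import sys


def pick_settings(settings, names):
    """Return {name: value} for the requested names (None when unset).

    Hypothetical helper: works with any object that has a dict-like get().
    """
    return {name: settings.get(name) for name in names}


def main():
    try:
        from sh_scrapy.utils import get_project_settings
    except ImportError:
        # scrapinghub-entrypoint-scrapy is missing or older than 0.12.
        print("sh_scrapy.utils.get_project_settings is not available",
              file=sys.stderr)
        return
    settings = get_project_settings()
    # Setting names below are examples; replace them with the ones you need.
    for name, value in pick_settings(settings, ["BOT_NAME", "USER_AGENT"]).items():
        print(f"{name} = {value!r}")


if __name__ == "__main__":
    main()
```

Importing inside main() lets the script report a clear message when the function is unavailable, instead of failing with a traceback at import time.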
Note
This function was introduced in scrapinghub-entrypoint-scrapy 0.12.
If you cannot import it, make sure you are using a recent Scrapy stack, or add scrapinghub-entrypoint-scrapy>=0.12 to your requirements.