Start a Scrapy project#

To build your web scraping project, you will use Scrapy, a popular open source web scraping framework written in Python and maintained by Zyte.

Set up your project#

  1. Install Python, version 3.7 or later.


    You can run python --version in a terminal window to confirm that your Python version is recent enough.

  2. Open a terminal window.

  3. Create a web-scraping-tutorial folder and make it your working folder:

    mkdir web-scraping-tutorial
    cd web-scraping-tutorial
  4. Create and activate a Python virtual environment.

    • On Windows:

      python3 -m venv tutorial-env
      tutorial-env\Scripts\activate
    • On macOS and Linux:

      python3 -m venv tutorial-env
      . tutorial-env/bin/activate
  5. Install the latest version of Scrapy:

    pip install --upgrade scrapy
  6. Make web-scraping-tutorial a Scrapy project folder:

    scrapy startproject tutorial .

    Your web-scraping-tutorial folder should now contain the following folders and files:

    ├── scrapy.cfg
    └── tutorial/
        ├── __init__.py
        ├── items.py
        ├── middlewares.py
        ├── pipelines.py
        ├── settings.py
        └── spiders/
            └── __init__.py

Create your first spider#

Now that you are all set up, you will write code to extract data from all books in the Mystery category of books.toscrape.com.

Create a file at tutorial/spiders/books_toscrape_com.py with the following code:

from scrapy import Spider

class BooksToScrapeComSpider(Spider):
    name = "books_toscrape_com"
    start_urls = [
        "http://books.toscrape.com/catalogue/category/books/mystery_3/index.html"
    ]

    def parse(self, response):
        next_page_links = response.css(".next a")
        yield from response.follow_all(next_page_links)
        book_links = response.css("article a")
        yield from response.follow_all(book_links, callback=self.parse_book)

    def parse_book(self, response):
        yield {
            "name": response.css("h1::text").get(),
            "price": response.css(".price_color::text").re_first("£(.*)"),
            "url": response.url,

In the code above:

  • You define a Scrapy spider class, BooksToScrapeComSpider, with the name books_toscrape_com.

  • Your spider starts by sending a request for the Mystery category URL (start_urls), and parses the response with the default callback method: parse.

  • The parse callback method:

    • Finds the link to the next page and, if found, yields a request for it, whose response will also be parsed by the parse callback method.

      As a result, the parse callback method eventually parses all pages of the Mystery category.

    • Finds links to book detail pages, and yields requests for them, whose responses will be parsed by the parse_book callback method.

      As a result, the parse_book callback method eventually parses all book detail pages from the Mystery category.

  • The parse_book callback method extracts a record of book information with the book name, price, and URL.

Now run your code:

scrapy crawl books_toscrape_com -O books.csv

Once execution finishes, the generated books.csv file will contain records for all books from the Mystery category of books.toscrape.com in CSV format. You can open books.csv with any spreadsheet app.
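You can also post-process the export with Python's standard csv module. This is only a sketch: the sample row below is hypothetical, and in practice you would open the real books.csv produced by your crawl:

```python
# Read exported records with csv.DictReader. The sample data below is
# hypothetical; in practice, open the real books.csv file instead.
import csv
from io import StringIO

sample = (
    "name,price,url\n"
    "Sharp Objects,47.82,http://books.toscrape.com/catalogue/sharp-objects_997/index.html\n"
)
for row in csv.DictReader(StringIO(sample)):
    # Each row is a dict keyed by the CSV header: name, price, url.
    print(f'{row["name"]}: £{row["price"]}')
```

DictReader maps each row to the header fields your spider yielded, so the keys match the dictionary keys in parse_book.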