How do I properly run a scraper with Scrapy?
Good afternoon.
The task is to scrape sites similar to avito.ru. There are more than 20 sites, and each site has its own spider.
The data is parsed and written to the database through a single Pipeline shared by all spiders.
Question: how do I run the scraper so that it works continuously?
I launch 20 spiders and want each one, independently of the others, to be restarted after it finishes.
I tried CrawlerProcess and CrawlerRunner with the reactor, but without success. The spider ran once, and on the second launch an error occurred saying that the process (or reactor) cannot be restarted.
So far I have worked around the problem with the following bash script:
#!/bin/bash
cd '/path/to/spider/folder'
while true
do
    scrapy crawl my_spider_1
    scrapy crawl my_spider_2
    scrapy crawl my_spider_3
    sleep 15
done
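If you prefer to keep the "run each spider as a fresh process" approach (which sidesteps the reactor-restart problem entirely), the same loop can be written in Python with only the standard library. A sketch, assuming the spider names from the bash script above; the 15-second pause mirrors the script:

```python
import subprocess
import time

# Spider names taken from the bash script; adjust to your project.
SPIDERS = ["my_spider_1", "my_spider_2", "my_spider_3"]


def run_all(spiders, cmd=("scrapy", "crawl")):
    """Run each spider once, sequentially, each in its own process.

    Returns the exit code of every run, so a wrapper can log or
    alert on failures instead of silently looping.
    """
    return [subprocess.run([*cmd, name]).returncode for name in spiders]


def supervise(spiders, pause=15):
    """Restart the whole batch forever, like the bash `while` loop."""
    while True:
        run_all(spiders)
        time.sleep(pause)


# supervise(SPIDERS)  # blocks forever; uncomment to run
```

Because every `scrapy crawl` is a separate process, each crawl gets a fresh reactor, and one spider crashing does not take the others down. For production, running this under systemd or cron instead of a bare loop also gives you restarts after machine reboots.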