Python
Vasily G., 2015-09-19 01:23:09

How to properly run a scraper on Scrapy?

Good afternoon.
The task is to scrape sites similar to avito.ru. There are more than 20 sites, and each site has its own spider.
The scraped data is written to the database via a Pipeline (one for all spiders).
The question: how do I run the scraper so that it works all the time?
I want to launch the 20 spiders so that each one, independently of the others, is restarted after it finishes.
I tried CrawlerProcess and CrawlerRunner with the reactor, but without success: the spider ran once, and on the second launch an error occurred saying that the process (or reactor) cannot be restarted.
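For reference, the Scrapy docs show how to run several spiders inside a single process with CrawlerRunner, starting the Twisted reactor exactly once so it never needs restarting. Below is a minimal sketch adapting that pattern so each spider restarts in its own endless loop; get_project_settings assumes the code runs inside the Scrapy project, and the 15-second pause mirrors the bash script below:

from twisted.internet import reactor, defer, task
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

configure_logging()
runner = CrawlerRunner(get_project_settings())

@defer.inlineCallbacks
def loop_spider(name):
    # Restart this spider forever, independently of the others.
    while True:
        yield runner.crawl(name)
        # Short pause between runs, like the sleep in the bash script.
        yield task.deferLater(reactor, 15, lambda: None)

for name in ('my_spider_1', 'my_spider_2', 'my_spider_3'):
    loop_spider(name)

reactor.run()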
So far I have solved the problem with the following bash script:

#!/bin/bash

cd '/path/to/spider/folder'

while true
do
    scrapy crawl my_spider_1
    scrapy crawl my_spider_2
    scrapy crawl my_spider_3
    sleep 15
done

I added a cron job that runs the script when the server reboots.
Everything works, but I suspect there is a better solution.
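For reference, the crontab entry for that would look something like this (the script name run_spiders.sh is hypothetical):

@reboot /path/to/spider/folder/run_spiders.sh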
UPD:
Can anyone suggest another scraping tool that would solve this problem?

1 answer
JRazor, 2015-09-20

Unfortunately, I have not yet needed to solve such a problem myself, but for running periodic tasks I can recommend Celery as an alternative to cron.
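A minimal sketch of that idea, assuming Celery with a Redis broker (the broker URL, the tasks.py module name, and the 15-minute interval are all assumptions). The task shells out to scrapy crawl, so every run gets a fresh process and the "reactor not restartable" problem never comes up:

# tasks.py
import subprocess
from celery import Celery

app = Celery('crawlers', broker='redis://localhost:6379/0')

# Schedule every spider to run every 15 minutes via Celery beat.
app.conf.beat_schedule = {
    'crawl-%s' % name: {
        'task': 'tasks.run_spider',
        'schedule': 15 * 60,  # seconds
        'args': (name,),
    }
    for name in ('my_spider_1', 'my_spider_2', 'my_spider_3')
}

@app.task
def run_spider(name):
    # Run the crawl in a child process; cwd points at the Scrapy project root.
    subprocess.check_call(['scrapy', 'crawl', name],
                          cwd='/path/to/spider/folder')

You would start it with a worker plus the embedded beat scheduler, e.g. celery -A tasks worker --beat.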
