MySQL
JRazor, 2014-02-15 14:57:39

Why does Scrapy only add 40 rows to the database?

Hello. I've run into the following problem: Scrapy adds only 40 rows to the database and then stops. I can't tell whether the cause is in the settings or in the script itself (there are no counters in it). Please help me find the reason for the stop.
The spider parses the start page, finds the directory URLs on it, passes each one to the directory parser, and finds the URLs of the target pages there. On each target page it extracts the data, which is written to a dictionary and then to the database. The script inserts only 40 rows and then stops on its own. I tested it on different computers and with different databases: always 40 rows.
Scrapy spider settings:

options = {
    'CONCURRENT_ITEMS':  250,
    'USER_AGENT':  'Googlebot/2.1 (+http://www.google.com/bot.html)',
    'CONCURRENT_REQUESTS':  1,
    'DOWNLOAD_DELAY':  0.5,
}

Code itself: pastebin.com/tN1AKUmx
Logs:
2014-02-14 20:14:12+0600 [auto] INFO: Crawled 53 pages (at 53 pages/min), scraped 0 items (at 0 items/min)
2014-02-14 20:15:12+0600 [auto] INFO: Crawled 71 pages (at 18 pages/min), scraped 0 items (at 0 items/min)
2014-02-14 20:15:21+0600 [auto] INFO: Closing spider (finished)
2014-02-14 20:15:21+0600 [auto] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 57078,
     'downloader/request_count': 75,
     'downloader/request_method_count/GET': 75,
     'downloader/response_bytes': 2058372,
     'downloader/response_count': 75,
     'downloader/response_status_count/200': 75,
     'finish_reason': 'finished',
     'finish_time': datetime.datetime(2014, 2, 14, 14, 15, 21, 192000),
     'request_depth_max': 2,
     'response_received_count': 75,
     'scheduler/dequeued': 75,
     'scheduler/dequeued/memory': 75,
     'scheduler/enqueued': 75,
     'scheduler/enqueued/memory': 75,
     'start_time': datetime.datetime(2014, 2, 14, 14, 13, 12, 8000)}
2014-02-14 20:15:21+0600 [auto] INFO: Spider closed (finished)

1 answer(s)
a_well, 2014-02-17

Try pointing the spider at a different section of the site and compare the result. Also check what the site writes to cookies: I once ran into a cookie overflow because a Bitrix site stored the navigation history in cookies and never cleared it, and everything broke.
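
One way to check the cookie theory, as a rough sketch: extend the `options` dict from the question with Scrapy's `COOKIES_DEBUG` setting to see what the site sends, then do a second run with `COOKIES_ENABLED` turned off and compare the row counts. Both are standard Scrapy settings; the name `options_no_cookies` is made up here for illustration, and whether cookies are actually the cause for this site is only an assumption to test.

```python
# Sketch only: the first four entries are the settings from the question.
# COOKIES_DEBUG and COOKIES_ENABLED are standard Scrapy settings; that
# cookies are the culprit here is an assumption, not a confirmed diagnosis.
options = {
    'CONCURRENT_ITEMS': 250,
    'USER_AGENT': 'Googlebot/2.1 (+http://www.google.com/bot.html)',
    'CONCURRENT_REQUESTS': 1,
    'DOWNLOAD_DELAY': 0.5,
    'COOKIES_DEBUG': True,  # log the cookies sent and received per request
}

# Hypothetical second configuration for a comparison run: same settings,
# but with the cookie middleware switched off entirely.
options_no_cookies = dict(options, COOKIES_ENABLED=False)
```

If the run with cookies disabled gets past 40 rows, the cookies are the likely cause; if it stops at the same point, the problem is probably elsewhere.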
