How do I download large pages with Grab:Spider?
Good day, friends.
I download a lot of pages from the Internet and store their contents in a database. Smaller pages load without problems, but larger pages produce log entries like this:
DEBUG:grab.stat:RPS: 0.26 [error:operation-timeouted= 7 ]
The numbers (0.26 and 7 here) change from page to page.
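For what it's worth, a single problem page can be tested outside the spider with plain Grab and a long timeout, roughly like this (a sketch; the URL is a placeholder, and timeout/connect_timeout are Grab's standard options):

# Sketch: fetch one large page with plain Grab and a generous timeout
# to see whether the download itself is what's timing out.
from grab import Grab

g = Grab(timeout=300, connect_timeout=30)  # seconds
resp = g.go('http://example.com/big-page')  # placeholder URL
print(len(resp.body))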
I have rummaged through all of Grab's documentation, every issue on GitHub, and every post in the Google Groups. I even poked around in the source code. I learned a lot of new things, but not the meaning of the mysterious "timeouted".
Dear friends, please explain what this log error means (my guess: the connection is dropped because the timeout interval was exceeded) and how to make the spider download pages of any size.
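In case it clarifies what I'm asking: if my guess is right, is the fix simply to raise the timeouts for every request the spider makes, something like the sketch below? (Class name and URL are placeholders; create_grab_instance is Grab:Spider's hook for customizing the Grab object used for each request.)

# Sketch: raise the timeouts for every request a Grab spider makes.
# PageSpider and the URL are placeholders.
from grab.spider import Spider, Task


class PageSpider(Spider):
    def create_grab_instance(self, **kwargs):
        # Called for every new Grab object the spider creates;
        # give large pages more time to finish downloading.
        g = super(PageSpider, self).create_grab_instance(**kwargs)
        g.setup(timeout=120, connect_timeout=30)  # seconds
        return g

    def task_generator(self):
        yield Task('page', url='http://example.com/big-page')  # placeholder

    def task_page(self, grab, task):
        # The real code would write grab.response.body to the database here.
        print(task.url, len(grab.response.body))


bot = PageSpider(thread_number=2)
bot.run()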
UPD:
Forgot to mention: I'm going through a proxy server. Without it, I just get a 429 (Too Many Requests) error about 75% of the time.
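Here is roughly how the proxy is wired in (a sketch; 'proxies.txt' is a placeholder, while load_proxylist and prepare are standard Grab:Spider methods):

# Sketch: plugging a proxy list into the spider.
# 'proxies.txt' is a placeholder file with one host:port per line.
from grab.spider import Spider


class PageSpider(Spider):
    def prepare(self):
        # prepare() runs once before the spider starts;
        # 'text_file' tells Grab the format of the list.
        self.load_proxylist('proxies.txt', 'text_file')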