Python
Alexander Vtyurin, 2015-11-23 12:11:11

How to download large pages with Grab:Spider?

Good day, friends.
I download a lot of pages from the Internet and add their contents to a database. Smaller pages load without problems, but for larger pages the log shows the following:
DEBUG:grab.stat:RPS: 0.26 [error:operation-timeouted=7]
The numbers (the 0.26 and the error count 7) vary from page to page.
I have rummaged through all of Grab's documentation, all the issues on GitHub, and all the posts in the Google Groups. I even poked around in the source. I learned a lot of new things, but not the meaning of the mysterious "timeouted".
Dear friends, please explain what this error in the log means (I assume it is "the connection was dropped because the timeout interval was exceeded") and how to make the spider download pages of any size.
UPD:
Forgot to mention: I'm using a proxy server.
Without it, I just get an HTTP 429 error 75% of the time.


1 answer
Dimonchik, 2015-11-23
@dimonchik2013

Grab uses pycurl under the hood; you can try raising the timeouts through its options.
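
Something like this, for example. This is only a rough sketch, not tested against your setup: timeout and connect_timeout are Grab config options that map to pycurl's TIMEOUT and CONNECTTIMEOUT (both in seconds), and the URL, the 120-second limit, and the commented-out proxy settings are placeholders for your own values.

from grab import Grab
from grab.spider import Spider, Task


class PageSpider(Spider):
    def task_generator(self):
        g = Grab()
        # timeout limits the whole transfer (pycurl TIMEOUT);
        # connect_timeout covers only the connection phase
        # (pycurl CONNECTTIMEOUT). Both values here are guesses --
        # raise them until large pages stop timing out.
        g.setup(
            url='http://example.com/big-page',  # placeholder URL
            timeout=120,
            connect_timeout=15,
            # proxy='127.0.0.1:8080',   # placeholder proxy address
            # proxy_type='http',
        )
        yield Task('page', grab=g)

    def task_page(self, grab, task):
        # At this point the full body has been downloaded.
        print(len(grab.response.body))


if __name__ == '__main__':
    bot = PageSpider(thread_number=2)
    bot.run()

If you go through a slow proxy, the total transfer limit is usually the one that fires, so raise timeout first.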
