How do I download large pages with Grab:Spider?
Good day, friends.
I download a lot of pages from the Internet and store their contents in a database. Smaller pages load without problems, but larger pages produce log entries like this:
DEBUG:grab.stat:RPS: 0.26 [error:operation-timeouted= 7 ]
The numbers (0.26 and 7 here) change from page to page.
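For what it's worth, a single problem page can be tested outside the spider with plain Grab and a long timeout, roughly like this (a sketch; the URL is a placeholder, and timeout/connect_timeout are Grab's standard options):

# Sketch: fetch one large page with plain Grab and a generous timeout
# to see whether the download itself is what's timing out.
from grab import Grab

g = Grab(timeout=300, connect_timeout=30)  # seconds
resp = g.go('http://example.com/big-page')  # placeholder URL
print(len(resp.body))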
I have rummaged through all of Grab's documentation, every issue on GitHub, and every post in the Google Groups. I even poked around in the source code. I learned a lot of new things, but not the meaning of the mysterious "timeouted".
Dear friends, please explain what this log error means (my guess: the connection is dropped because the timeout interval was exceeded) and how to make the spider download pages of any size.
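In case it clarifies what I'm asking: if my guess is right, is the fix simply to raise the timeouts for every request the spider makes, something like the sketch below? (Class name and URL are placeholders; create_grab_instance is Grab:Spider's hook for customizing the Grab object used for each request.)

# Sketch: raise the timeouts for every request a Grab spider makes.
# PageSpider and the URL are placeholders.
from grab.spider import Spider, Task


class PageSpider(Spider):
    def create_grab_instance(self, **kwargs):
        # Called for every new Grab object the spider creates;
        # give large pages more time to finish downloading.
        g = super(PageSpider, self).create_grab_instance(**kwargs)
        g.setup(timeout=120, connect_timeout=30)  # seconds
        return g

    def task_generator(self):
        yield Task('page', url='http://example.com/big-page')  # placeholder

    def task_page(self, grab, task):
        # The real code would write grab.response.body to the database here.
        print(task.url, len(grab.response.body))


bot = PageSpider(thread_number=2)
bot.run()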
UPD:
Forgot to mention: I'm going through a proxy server. Without it, I just get a 429 (Too Many Requests) error about 75% of the time.
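Here is roughly how the proxy is wired in (a sketch; 'proxies.txt' is a placeholder, while load_proxylist and prepare are standard Grab:Spider methods):

# Sketch: plugging a proxy list into the spider.
# 'proxies.txt' is a placeholder file with one host:port per line.
from grab.spider import Spider


class PageSpider(Spider):
    def prepare(self):
        # prepare() runs once before the spider starts;
        # 'text_file' tells Grab the format of the list.
        self.load_proxylist('proxies.txt', 'text_file')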