How to delay parsing in node.js?
Good day!
The gist of the question: I have, say, 12,000 pages of one site. With ordinary parsing using request and cheerio, the site crashes with an error after a few pages. How can I add a delay and parse the content of the 12,000 pages sequentially, taking them from a file?
Thanks in advance!
What do you mean by "the site crashes"? Do you mean it stops returning a proper response?
To answer the question: at one point, to get around all kinds of anti-parsing protection, I split the parsing procedure into several levels: parsing one specific element, a bunch (parsing a group of elements), and a session (a bunch made up of bunches).
In the code, a bunch differed from a session only in the number of elements processed. The parsing algorithm turned out roughly like this:
Process element i
If there is an error:
    wait SINGLE_REQUEST_TIMEOUT
    try again
i++
If i modulo ITEMS_IN_SESSION is zero:
    wait SESSION_TIMEOUT
Else if i modulo ITEMS_IN_BUNCH is zero:
    wait BUNCH_TIMEOUT
Else:
    wait SINGLE_REQUEST_TIMEOUT
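In Node.js the scheme above could look roughly like the sketch below. It is only an illustration under some assumptions: the timeout values and item counts are made up, `parseAll`, `pauseAfter` and `sleep` are hypothetical names, `processItem` stands for whatever async function does the actual request and cheerio parsing, and the `maxRetries` limit is an addition so that a permanently failing page cannot loop forever. The session check comes before the bunch check, because if the session size is a multiple of the bunch size the session branch would otherwise never be reached.

```javascript
// Hypothetical values in milliseconds and items; tune them for the target site.
const SINGLE_REQUEST_TIMEOUT = 1000;
const BUNCH_TIMEOUT = 10000;
const SESSION_TIMEOUT = 60000;
const ITEMS_IN_BUNCH = 10;
const ITEMS_IN_SESSION = 100;

// Promise-based pause built on setTimeout.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Choose how long to pause once `done` items have been processed.
function pauseAfter(done) {
  if (done % ITEMS_IN_SESSION === 0) return SESSION_TIMEOUT;
  if (done % ITEMS_IN_BUNCH === 0) return BUNCH_TIMEOUT;
  return SINGLE_REQUEST_TIMEOUT;
}

// Process the URLs strictly one after another.
// processItem: caller-supplied async function (url) => parsed result.
// wait: injectable pause function, defaults to sleep.
// maxRetries: assumed retry limit per item; null is stored if it is exceeded.
async function parseAll(urls, processItem, wait = sleep, maxRetries = 3) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    let attempt = 0;
    for (;;) {
      try {
        results.push(await processItem(urls[i]));
        break;
      } catch (err) {
        attempt++;
        if (attempt > maxRetries) {
          results.push(null); // give up on this item and move on
          break;
        }
        await wait(SINGLE_REQUEST_TIMEOUT); // pause, then try the same item again
      }
    }
    // No pause is needed after the very last item.
    if (i + 1 < urls.length) await wait(pauseAfter(i + 1));
  }
  return results;
}
```

The list of URLs can be read from the file beforehand, for example with `fs.readFileSync(path, 'utf8').split('\n')`, and passed in as `urls`. Because each `await` finishes before the next iteration starts, only one request is in flight at any time.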