Parsing
Mikkkch, 2020-11-21 19:16:20

Should asynchrony be used in parsing?

Hello. Does it make sense to use asynchronous programming when the task is to accept an unlimited number of links and monitor each one until the desired element appears?
The question may seem naive to some, but please be lenient.


3 answers
Roman Kitaev, 2020-11-21
@Mikkkch

Parsing is near the top of the list of tasks where asynchrony pays off. The more IO operations there are, the greater the advantage of the asynchronous approach over a thread pool. If CPU-bound work is needed along the way (parsing XML/HTML, etc.), it is moved to a thread pool (in Python, a process pool) and dispatched there through asynchronous bindings (in Python, for example, run_in_executor), so the main thread is not blocked.
A quick-and-dirty example that crawls the link graph of Wikipedia, with a process pool, lxml, and other goodies: link
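
For readers without access to the linked example, here is a minimal sketch of the same idea, not the answerer's code. It assumes a hypothetical `fetch` coroutine that simulates the network with `asyncio.sleep` (a real crawler would call an async HTTP client there), and it uses the standard library's `html.parser` in place of lxml so that it runs without extra packages. The names `LinkCollector`, `extract_links`, `process`, `crawl`, and `crawl_with_processes` are illustrative.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags (stdlib stand-in for lxml)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for name, v in attrs if name == "href" and v)


def extract_links(html):
    """CPU-bound step; defined at module level so a process pool can pickle it."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


async def fetch(url):
    """Hypothetical IO step: simulated latency plus canned HTML, no real network."""
    await asyncio.sleep(0.01)
    return f'<html><a href="{url}/a">a</a><a href="{url}/b">b</a></html>'


async def process(url, executor):
    html = await fetch(url)
    loop = asyncio.get_running_loop()
    # Parsing runs in the executor, so the event loop stays free for other fetches.
    return await loop.run_in_executor(executor, extract_links, html)


async def crawl(urls, executor=None):
    """executor=None falls back to the event loop's default thread pool."""
    results = await asyncio.gather(*(process(url, executor) for url in urls))
    return dict(zip(urls, results))


def crawl_with_processes(urls):
    """Same crawl with parsing in worker processes.

    Call this from under an `if __name__ == "__main__":` guard.
    """
    with ProcessPoolExecutor() as pool:
        return asyncio.run(crawl(urls, pool))
```

All fetches wait on IO concurrently in one thread, and only the parsing step leaves the event loop. Whether a process pool actually beats the default thread pool depends on how heavy the parsing is compared with the cost of pickling the HTML across processes.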

xmoonlight, 2020-11-21
@xmoonlight

Yes, but there is a nuance.
If the links belong to a single domain, you need to limit both the maximum number of parallel connections to that domain and the request rate.
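
One way to apply both limits in asyncio, as a rough sketch: a semaphore per domain caps parallel requests, and a per-domain lock spaces out the moments at which requests start. The `DomainLimiter` class, its parameters, and the `fetch` callable passed to `run` are assumptions made for illustration, not part of any library.

```python
import asyncio
import time
from urllib.parse import urlsplit


class DomainLimiter:
    """Caps concurrent requests per domain and spaces out their start times."""

    def __init__(self, max_parallel=2, min_interval=0.05):
        self.max_parallel = max_parallel
        self.min_interval = min_interval  # seconds between request starts
        self._semaphores = {}
        self._locks = {}
        self._next_start = {}

    async def run(self, url, fetch):
        """Awaits fetch(url) once the limits for the URL's domain allow it."""
        domain = urlsplit(url).netloc
        if domain not in self._semaphores:
            self._semaphores[domain] = asyncio.Semaphore(self.max_parallel)
            self._locks[domain] = asyncio.Lock()
            self._next_start[domain] = 0.0
        async with self._semaphores[domain]:
            async with self._locks[domain]:
                # Sleep until this domain's next allowed start time.
                delay = self._next_start[domain] - time.monotonic()
                if delay > 0:
                    await asyncio.sleep(delay)
                self._next_start[domain] = time.monotonic() + self.min_interval
            return await fetch(url)
```

A crawler would then call `await limiter.run(url, fetch)` instead of `await fetch(url)`. Requests to different domains do not block each other, because every domain gets its own semaphore and lock.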

Vladimir Korotenko, 2020-11-21
@firedragon

I would use a regular thread pool. It does the same job but gives more control over memory and resources.
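
For comparison, the thread-pool variant might look like the sketch below. It assumes a hypothetical blocking `check(url)` callable that returns True once the desired element is present. The names `wait_for_element` and `monitor`, the polling interval, and the attempt limit are illustrative. The resource control here comes from `max_workers`, which bounds how many threads exist at once.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def wait_for_element(url, check, interval=0.01, max_attempts=50):
    """Blocking poll: calls check(url) until it returns True or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        if check(url):
            return url, attempt
        time.sleep(interval)
    return url, None  # element never appeared within max_attempts


def monitor(urls, check, max_workers=4):
    """Polls all URLs using at most max_workers threads at a time."""
    found = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(wait_for_element, url, check) for url in urls]
        for future in as_completed(futures):
            url, attempts = future.result()
            found[url] = attempts
    return found
```

Each monitored link occupies a worker thread for as long as it is being polled, so with many links the rest wait in the pool's queue. That is the trade-off against the asynchronous approach, where a waiting link costs only a suspended coroutine.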
