Parsing
DamskiyUgodnik, 2019-07-20 14:14:29

How do you write a fast and lightweight crawler?

Hello!
I need to parse a large amount of data. I wrote a parser in Python (multiprocessing, requests) and it works correctly, but I ran into a problem: each worker eats a lot of CPU and RAM (I didn't expect such a load).
The logic is as follows:

  1. Take a URL (e.g. read line by line from a file)
  2. Download the page
  3. Write the response to a file
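
The three steps above read as a plain sequential loop. A minimal sketch (the `page_<n>.html` output names and the 10-second timeout are my assumptions, not from the question):

```python
import requests  # third-party: pip install requests

def crawl_file(path):
    """Read URLs line by line, download each, write the response to a file."""
    with open(path) as f:
        for i, line in enumerate(f):
            url = line.strip()
            if not url:
                continue  # skip blank lines
            resp = requests.get(url, timeout=10)
            # one output file per input line (naming scheme is an assumption)
            with open(f"page_{i}.html", "wb") as out:
                out.write(resp.content)
```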

The logic seems primitive, and in theory nothing should be under heavy load, but already at 20-30 workers the server barely responds (home PC, mid-range CPU, 16 GB RAM, Ubuntu Server; nothing but the parser is running, and according to iotop I'm not hitting the disk).
What I'm actually interested in:
How are industrial crawlers usually written so that they are fast, concurrent, and reasonable with resources? (I suspect my problem is the Python + multiprocessing + requests combination.) So far it looks like something like C++ is needed, or do I just not know how to cook Python? :)
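
The multiprocessing choice is a plausible cause of the load: each worker process carries a full interpreter and its own memory, which is expensive for work that is mostly waiting on the network. A hedged sketch of the same pipeline using a thread pool instead; the injectable `fetch_fn` parameter is purely illustrative, so the loop can be exercised without network access:

```python
from concurrent.futures import ThreadPoolExecutor
import requests  # third-party: pip install requests

def fetch(url):
    # I/O-bound work: threads release the GIL while waiting on the network
    return requests.get(url, timeout=10).text

def crawl(urls, fetch_fn=fetch, workers=20):
    """Download all URLs with one shared pool of lightweight threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each URL with its body
        for url, body in zip(urls, pool.map(fetch_fn, urls)):
            results[url] = body
    return results
```

Threads share one interpreter, so memory stays roughly flat as the worker count grows, unlike one process per worker.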

2 answers
Dimonchik, 2019-07-20
@dimonchik2013

Scrapy
