Vladislav Pilyugin, 2018-06-21 16:50:36
Parsing

How and where can one learn to write multi-threaded parsers for any task?

Parsing is a very interesting topic.
But there is no decent information about it on the Internet.
Where should I look for information?

5 answer(s)
Saboteur, 2018-06-21
@saboteur_kiev

But there is no decent information about it on the Internet.

Are you joking? You just don't know how to parse the Internet. There are heaps of information about parsing online.
Read up on what CSV, XML, HTML, and the DOM tree are.
You can read about ready-made libraries for parsing XML/HTML.
You can read about regular expressions.
And then write simple parsers in any language.
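
To make the advice above concrete, here is a minimal sketch of three "simple parsers" using only the Python standard library: one for CSV, one that walks HTML tags, and one based on a regular expression. The function names, the sample price format ("10 USD"), and the choice of Python are assumptions made for illustration; they are not from the answer itself.

```python
import csv
import io
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags as the HTML is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def parse_csv(text):
    """Return CSV rows as dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_links(html):
    """Return every link target found in the HTML, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical price format used only for this example: "<number> USD".
PRICE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*USD")


def find_prices(text):
    """Return all prices in the text as floats."""
    return [float(m) for m in PRICE_RE.findall(text)]
```

For real pages, a dedicated HTML/XML library is usually more robust than hand-written handlers, but the three functions show the basic shapes of the task.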

Dmitry Entelis, 2018-06-22
@DmitriyEntelis

Structurally, any multi-threaded parser is simple:
You have 2 queues: download tasks and tasks for the actual parsing.
Accordingly, there are two types of workers.
The first type of worker takes tasks from the download queue, downloads them with something like curl, and puts the result in the queue for processing.
The second type of worker parses the content in some way and puts it into the final storage.
Parsing is best done with regular expressions, since this works the fastest.
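
The two-queue layout described above could be sketched roughly like this in Python. Everything named here (`run_pipeline`, `download_worker`, `parse_worker`, the `fetch` callable, the title regex, and a dict as the "final storage") is an assumption for illustration, not the author's code. The download step is passed in as a function so the sketch does not depend on any particular HTTP client.

```python
import queue
import re
import threading

# Assumed parsing task for the example: pull the page title out with a regex.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)
_STOP = object()  # sentinel telling a worker to exit


def download_worker(download_q, parse_q, fetch):
    """First worker type: take a URL, download it, queue the body for parsing."""
    while True:
        url = download_q.get()
        if url is _STOP:
            break
        try:
            parse_q.put((url, fetch(url)))
        except Exception:
            # A real crawler would log or retry; this sketch skips the URL.
            pass


def parse_worker(parse_q, storage, lock):
    """Second worker type: parse the content and put it into the final storage."""
    while True:
        item = parse_q.get()
        if item is _STOP:
            break
        url, body = item
        match = TITLE_RE.search(body)
        with lock:
            storage[url] = match.group(1).strip() if match else None


def run_pipeline(urls, fetch, n_downloaders=4, n_parsers=2):
    download_q, parse_q = queue.Queue(), queue.Queue()
    storage, lock = {}, threading.Lock()
    downloaders = [
        threading.Thread(target=download_worker, args=(download_q, parse_q, fetch))
        for _ in range(n_downloaders)
    ]
    parsers = [
        threading.Thread(target=parse_worker, args=(parse_q, storage, lock))
        for _ in range(n_parsers)
    ]
    for t in downloaders + parsers:
        t.start()
    for url in urls:
        download_q.put(url)
    # One sentinel per worker; parsers are stopped only after all downloads finish.
    for _ in downloaders:
        download_q.put(_STOP)
    for t in downloaders:
        t.join()
    for _ in parsers:
        parse_q.put(_STOP)
    for t in parsers:
        t.join()
    return storage
```

In practice `fetch` might be something like `lambda url: urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "replace")`; in tests it can simply be a dictionary lookup. Threads suit the download side because it mostly waits on the network; heavy parsing in Python may need processes instead.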

latteo, 2018-06-22
@latteo

At least once a year, a video course on parsing is released, either in a plain programming language or using Datacol, Content Downloader, and other tools. Some of them end up on torrents and file-sharing services. Others are public from the start and available on YouTube.

spaceatmoon, 2018-06-22
@spaceatmoon

I can say that mapping from empty to empty is easy, but building an application that solves real-life problems is another matter.
So you learn only through experience. I have written 2 parsers, and both are good for something. The first can parse goods in any volume straight into a site. The second, my latest, can parse content for the framework of a future site. Soon I will write a mega-combine that can parse any content even in the hands of a housewife. For most sites, even the setup would be the same. This is relevant for social networks and all sorts of photo galleries. But for SPAs, JS, and all that nonsense, some solution is still needed.
It is desirable that the calculator works. I think it is better not to write parsers in JS; you will just shoot yourself in the foot. It is a very cool and very hacky language. I love it and hate it.
C#, Python, PHP, or Go will be fine, I think.
I write in PHP and it suits me; if I am not satisfied with the speed, I just start another 24 MB thread and that's it.
It all depends on how well you can build the architecture. Despite my experience, I still write with the feeling that I am writing heresy and that it may therefore have to be rewritten. Remember, refactoring is evil. You can rewrite Fibonacci a thousand times and it will still be garbage.

bro-dev0, 2018-06-24
@bro-dev0

I do not advise focusing on this; it is an area with a very low ceiling. You will not earn much or develop your skills much, there are loads of competitors, and the tasks split into very simple, monotonous ones and those for which you would have to write your own Skynet.
