How and where can one learn to write multi-threaded parsers for any task?
Parsing is a very interesting topic.
But there is no decent information about it on the Internet.
Where should I look for information?
Structurally, any multi-threaded parser is simple:
You have two queues: tasks for downloading and tasks for the actual parsing.
Accordingly, there are two types of workers.
The first type takes tasks from the download queue, downloads the pages with something like curl, and puts them into the processing queue.
The second type parses the content in some way and puts the results into the final storage.
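The two-queue design can be sketched in Python with the standard `threading` and `queue` modules. This is a minimal illustration, not the answerer's code: to keep it runnable offline, `fetch` is a hypothetical stand-in that reads from a hard-coded `FAKE_PAGES` dict instead of making HTTP requests, and `run`, `download_worker`, `parse_worker` and the example.com URLs are illustrative names only.

```python
import queue
import re
import threading

# Hypothetical stand-in for the network so the sketch runs anywhere.
# A real parser would issue an HTTP request here (curl, urllib.request, etc.).
FAKE_PAGES = {
    "https://example.com/a": "<title>Page A</title>",
    "https://example.com/b": "<title>Page B</title>",
    "https://example.com/c": "<title>Page C</title>",
}

TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def fetch(url):
    """Hypothetical downloader: returns raw HTML for a URL ("" if unknown)."""
    return FAKE_PAGES.get(url, "")


def download_worker(download_q, parse_q):
    # First worker type: take a URL, download it, hand the content on.
    while True:
        url = download_q.get()
        if url is None:  # sentinel: no more downloads
            download_q.task_done()
            break
        parse_q.put((url, fetch(url)))
        download_q.task_done()


def parse_worker(parse_q, storage, lock):
    # Second worker type: extract data and write it to the final storage.
    while True:
        item = parse_q.get()
        if item is None:  # sentinel: no more parsing work
            parse_q.task_done()
            break
        url, html = item
        match = TITLE_RE.search(html)
        with lock:
            storage[url] = match.group(1) if match else None
        parse_q.task_done()


def run(urls, n_downloaders=4, n_parsers=2):
    download_q, parse_q = queue.Queue(), queue.Queue()
    storage, lock = {}, threading.Lock()
    downloaders = [threading.Thread(target=download_worker, args=(download_q, parse_q))
                   for _ in range(n_downloaders)]
    parsers = [threading.Thread(target=parse_worker, args=(parse_q, storage, lock))
               for _ in range(n_parsers)]
    for t in downloaders + parsers:
        t.start()
    for url in urls:
        download_q.put(url)
    for _ in downloaders:  # one sentinel per downloader
        download_q.put(None)
    for t in downloaders:
        t.join()
    # Every downloaded page is already queued, so parser sentinels go last.
    for _ in parsers:
        parse_q.put(None)
    for t in parsers:
        t.join()
    return storage


if __name__ == "__main__":
    print(run(list(FAKE_PAGES)))
```

The worker counts are separate parameters because the two stages scale differently. In CPython, threads mainly speed up the I/O-bound download stage; CPU-heavy parsing may need processes instead.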
Parsing is best done with regular expressions: that is the fastest approach.
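Here is a small illustration of regex-based extraction using Python's `re` module. The product-listing markup and the `parse_items` helper are hypothetical examples, not taken from any real site. Named groups keep the extracted fields readable.

```python
import re

# Hypothetical product-listing markup; real pages will differ.
HTML = """
<div class="item"><span class="name">Kettle</span><span class="price">19.99</span></div>
<div class="item"><span class="name">Toaster</span><span class="price">24.50</span></div>
"""

# Named groups make each extracted field self-documenting.
ITEM_RE = re.compile(
    r'<span class="name">(?P<name>.*?)</span>\s*'
    r'<span class="price">(?P<price>[\d.]+)</span>'
)


def parse_items(html):
    """Return a list of (name, price) tuples found in the markup."""
    return [(m["name"], float(m["price"])) for m in ITEM_RE.finditer(html)]


print(parse_items(HTML))  # [('Kettle', 19.99), ('Toaster', 24.5)]
```

A regex like this is tied to the exact markup, so it has to be updated whenever the site's HTML changes.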
At least once a year a video course on parsing comes out, either in a plain programming language or using Datacol, Content Downloader and other tools. Some of them end up on torrents and file-sharing services; some are public from the start and available on YouTube.
I can say that writing something that just shuffles data around for no purpose is easy, but building an application that solves real-life problems is another matter.
So the only way is through experience. I've written two parsers, and both of them do something useful. The first can parse products in any volume directly into a website. My second, most recent one can parse content to fill the skeleton of a future site. Soon I'll write a mega-combine capable of parsing any content, simple enough for a housewife to operate. Most sites can be handled just by configuring it. This is relevant for social networks and all kinds of photo galleries. But for SPAs, JS and all that nonsense, some other solution is needed.
It helps if your computer is up to the job. I think it's better not to write parsers in JS; you'll just shoot yourself in the foot. It's a very cool but crutch-ridden language: I love it and hate it.
C#, Python, PHP, Go: I think any of them will be fine.
I write in PHP and I'm fine with it. If the speed doesn't satisfy me, I just start another thread (about 24 MB each) and that's it.
It all depends on how well you can design the architecture. Despite my experience, I still write with the feeling that I'm writing heresy and could rewrite it all. Remember, refactoring is evil. You can rewrite Fibonacci a thousand times and it will still be garbage.