Python
vosyukov, 2017-12-29 11:35:06

Is this a sound architecture for a scraping system?

A Python script crawls the site and collects links, then sends them to RabbitMQ. Other Python scripts consume the queue, parse the required data, and store it in MongoDB.
The question is: can anything in this system be improved, or would other tools be a better fit?
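For illustration only, here is a minimal sketch of a pipeline of this shape, not the actual scripts. The queue name `links`, the JSON message format, the `parse_page` helper, and the upsert keyed on `url` are all assumptions. `channel` is expected to behave like a pika channel and `collection` like a pymongo collection; both are passed in so the logic stays independent of the connection setup.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin

QUEUE = "links"  # hypothetical queue name


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url):
    """Return all links found in the page, resolved against base_url."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def publish_links(channel, links):
    """Crawler side: send each link to the queue as a small JSON message."""
    channel.queue_declare(queue=QUEUE, durable=True)
    for url in links:
        channel.basic_publish(
            exchange="", routing_key=QUEUE, body=json.dumps({"url": url})
        )


def make_worker(collection, parse_page):
    """Worker side: build a pika-style callback that parses and stores a page.

    parse_page is a hypothetical function: url -> dict of extracted fields.
    """

    def on_message(channel, method, properties, body):
        url = json.loads(body)["url"]
        doc = parse_page(url)
        # Upsert by URL so a redelivered message does not create duplicates.
        collection.update_one({"url": url}, {"$set": doc}, upsert=True)
        # Acknowledge only after the document has been stored.
        channel.basic_ack(delivery_tag=method.delivery_tag)

    return on_message
```

With pika, the callback returned by `make_worker` would be registered via `channel.basic_consume(queue=QUEUE, on_message_callback=...)` followed by `channel.start_consuming()`. Acknowledging after the write means a worker crash leaves the link in the queue for another worker to pick up.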


2 answers
Roman Mirilaczvili, 2017-12-29
@vosyukov

It's a reasonable solution. There is no perfect one.

InoMono, 2018-02-06
@InoMono

Keep in mind that many sites now only render their content when JavaScript is enabled.
So not everything can be scraped with plain Python alone.
To read all the information from such sites, you need something like SlimerJS, PhantomJS, or Selenium.
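
As a rough sketch of the Selenium route, assuming the `selenium` package and a Chrome driver are installed (the helper names `make_driver` and `fetch_rendered` are made up for this example):

```python
def make_driver():
    """Start a headless Chrome session (assumes selenium and Chrome are installed)."""
    from selenium import webdriver  # imported lazily so the module loads without selenium

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def fetch_rendered(url, driver_factory=make_driver):
    """Load the page in a real browser and return the HTML after scripts have run."""
    driver = driver_factory()
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

The returned HTML can then go through the same parsing code as a page fetched without a browser. Pages that load data after the initial render may additionally need an explicit wait before reading `page_source`.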
