Parsing
0xigen, 2015-07-24 10:37:49

What architecture to use for a news aggregator?

After some thought, I have the following picture in mind: one bot runs in an endless loop, polls sites (RSS, sitemap, or parsing the news page), collects a list of news links, and adds them to a database (Redis/MongoDB). A second bot, also in a loop, follows the links and parses the articles, then sends them to the site's API for further processing and insertion into the main database.
A few questions remain: how can the bots/threads be synchronized to avoid duplicate news items, how can the scanning interval for a news source be set depending on the time of day, and which architecture suits these purposes best?
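To make the duplication question concrete, here is a minimal sketch of the two-stage design described above, assuming a single shared store that both bots talk to. `LinkStore`, `link_key`, `collect`, and `drain` are hypothetical names, and the in-memory set and queue only stand in for Redis/MongoDB. With Redis, the same "add only if new" check could probably be done with `SADD`, which reports whether the member was newly added.

```python
import hashlib
import queue
import threading
from urllib.parse import urldefrag


def link_key(url):
    """Normalize a URL (drop fragment and trailing slash) and hash it."""
    normalized = urldefrag(url.strip()).url.rstrip("/")
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class LinkStore:
    """In-memory stand-in for the shared store (an assumption, not Redis itself)."""

    def __init__(self):
        self._seen = set()            # keys of links that were already accepted
        self._lock = threading.Lock()  # makes check-and-add atomic across threads
        self.pending = queue.Queue()   # links waiting for the second bot

    def add_if_new(self, url):
        """Queue the link and return True only the first time it is seen."""
        key = link_key(url)
        with self._lock:
            if key in self._seen:
                return False
            self._seen.add(key)
        self.pending.put(url)
        return True


def collect(store, links):
    """First bot: offer discovered links, return how many were new."""
    return sum(store.add_if_new(url) for url in links)


def drain(store):
    """Second bot: take every link that is currently pending, in order."""
    taken = []
    while True:
        try:
            taken.append(store.pending.get_nowait())
        except queue.Empty:
            return taken
```

Because the check and the insert happen under one lock, any number of collector threads can call `add_if_new` without queuing the same article twice. The normalization rule here is deliberately simple and would likely need tuning per site.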


1 answer(s)
⚡ Kotobotov ⚡, 2015-07-24
@angrySCV

For flow management and task scheduling, it is convenient to use something like Akka.
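Whatever tool ends up running the tasks, the time-of-day interval itself can be a small lookup. The sketch below is plain Python, not Akka's API, and only illustrates the idea; the `SCHEDULE` values and the names `poll_interval` and `next_run` are made up for the example.

```python
from datetime import datetime, timedelta

# Hypothetical schedule: (start_hour, end_hour, polling interval in minutes).
# Hours are half-open ranges [start, end) and together cover the whole day.
SCHEDULE = [
    (0, 7, 60),    # night: sources rarely publish, poll hourly
    (7, 22, 5),    # daytime: poll every five minutes
    (22, 24, 30),  # late evening: poll every half hour
]


def poll_interval(now):
    """Return the polling interval that applies at the given time."""
    for start, end, minutes in SCHEDULE:
        if start <= now.hour < end:
            return timedelta(minutes=minutes)
    raise ValueError("schedule does not cover hour %d" % now.hour)


def next_run(last_run):
    """Compute when a source should be polled again after last_run."""
    return last_run + poll_interval(last_run)
```

A scheduler would then call `next_run` after each poll and re-enqueue the source for that time. A per-source table in the same shape would allow busy sites to be polled more often than quiet ones.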
