What technologies should I use for continuous monitoring of pages?
I am writing a small service for my own needs. The main functionality consists of two stages:
1. Getting the site's position via Yandex.XML;
2. Continuously monitoring the site's pages (more than a million of them) to check that each contains all the required elements.
After researching the issue, I settled on Gearman. A few questions:
1. Is it the right choice, or are there better alternatives?
2. Can I retrieve data from Gearman on demand, such as the site's position?
1) https://www.rabbitmq.com/
2) As for the position data itself, hardly anyone besides Yandex can report it, and working with Yandex is where all the complexity lies.
The 1st service periodically scans the site-position results. As each individual result (or a batch of them, say 10) is received, a task is published to the "SitePositions" queue.
The 2nd service runs workers, in as many instances as needed, that consume scanning tasks from "SitePositions". When a worker picks up a task, the start of the site scan is recorded in the database, and when the scan finishes, the task is marked as completed.
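The two-service flow above can be sketched in-process. This is a minimal Python analogue, assuming the "SitePositions" queue is stood in for by a stdlib `queue.Queue` and the database by a plain list; in production the queue would live in RabbitMQ and the scan log in a real database. All names here (`producer`, `worker`, `scan_log`) are illustrative, not from the original answer.

```python
import queue
import threading

# In-process stand-in for the "SitePositions" queue; in production
# this would be a RabbitMQ queue consumed over AMQP.
site_positions = queue.Queue()

scan_log = []            # stand-in for the database rows tracking scan state
log_lock = threading.Lock()

def producer(batch):
    """1st service: publish one scan task per received site-position result."""
    for site in batch:
        site_positions.put(site)

def worker():
    """2nd service: consume tasks, recording scan start and completion."""
    while True:
        site = site_positions.get()
        if site is None:            # sentinel: shut this worker down
            site_positions.task_done()
            break
        with log_lock:
            scan_log.append((site, "started"))
        # ... the actual page scan would happen here ...
        with log_lock:
            scan_log.append((site, "done"))
        site_positions.task_done()

# Small demo: 2 worker instances, 5 sites.
workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()
producer([f"site-{i}" for i in range(5)])
for _ in workers:
    site_positions.put(None)        # one shutdown sentinel per worker
for w in workers:
    w.join()
```

Scaling the 2nd service then just means starting more `worker` instances; the queue handles the fan-out.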
I was invited to answer as an expert on this tag.
Yes, the right approach has already been suggested: queues, with different workers for different types of content.
I would go with RabbitMQ, since it's the mainstream choice.
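"Different workers for different types of content" usually means one queue per content type, with a router dispatching each task to the matching queue. A minimal sketch, again using stdlib queues as stand-ins for broker queues; the content types and queue names here are illustrative assumptions, not from the original answer.

```python
import queue

# One queue per content type; each would be consumed by its own worker pool.
# In production these would be separate RabbitMQ queues.
queues = {
    "html": queue.Queue(),
    "image": queue.Queue(),
}

def route(task):
    """Dispatch a scan task to the queue for its content type."""
    q = queues.get(task["type"])
    if q is None:
        raise ValueError(f"no worker pool for content type {task['type']!r}")
    q.put(task)

route({"type": "html", "url": "https://example.com/page1"})
route({"type": "image", "url": "https://example.com/logo.png"})
```

Keeping the types in separate queues lets you size each worker pool independently, e.g. many cheap HTML checkers and a few heavier image processors.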