Java
rumasterov, 2016-08-31 10:34:35

What architecture should I choose to run parsers on a schedule and in parallel?

I'm planning to parse information from a number of different sites. Each site has its own parser that contains the parsing logic. Different parsers need to run at different times.
How can I launch these parsers on a schedule and in parallel?
So far I have the following ideas:
Using Spring TaskScheduler, create one scheduler for the group of parsers that must run every 10 minutes, another for the group that must run every minute, and so on.
Inside each scheduler, create tasks and add them to a Redis or RabbitMQ queue; a worker picks up each task and runs the appropriate parser based on the arguments passed.
But then a question arises: what happens if the scheduler creates a new task for parsing Site #1 while the previous one is still in progress? I don't want tasks to pile up in the queue, i.e. if parsing of Site #1 is still running, a new task for Site #1 should not be added. How can this be solved? So far the only thing that comes to mind is to keep a table of tasks and their statuses in the database and, before adding a task, check whether a similar one is already in progress. But I suspect there are smarter solutions.
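To make the "check before adding" idea concrete, here is a minimal in-memory sketch of it, assuming the scheduler and the workers live in one JVM. The class name `InFlightGuard` and the key `"site-1"` are hypothetical; with separate worker processes the same check would have to live in shared storage (a database table, Redis, ZooKeeper) instead of a local set.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers which site parsers are queued or running.
final class InFlightGuard {
    // Thread-safe set of site keys that currently have a task in flight.
    private final Set<String> inFlight = ConcurrentHashMap.newKeySet();

    /** Returns true if the caller may enqueue a task for this site, false if one is already in flight. */
    boolean tryAcquire(String siteKey) {
        // Set.add is atomic here and returns false when the key is already present.
        return inFlight.add(siteKey);
    }

    /** Must be called by the worker when the parser finishes, whether it succeeded or failed. */
    void release(String siteKey) {
        inFlight.remove(siteKey);
    }

    public static void main(String[] args) {
        InFlightGuard guard = new InFlightGuard();
        System.out.println(guard.tryAcquire("site-1")); // first task is accepted
        System.out.println(guard.tryAcquire("site-1")); // duplicate is rejected while the first is in flight
        guard.release("site-1");
        System.out.println(guard.tryAcquire("site-1")); // accepted again after release
    }
}
```

The scheduler would call `tryAcquire` before publishing to the queue and skip the task on `false`; the worker would call `release` in a `finally` block so that a failed parser does not block its site forever.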
Has anyone faced similar problems? I don't need a detailed walkthrough; a hint about where to dig would be enough.
Thanks in advance.

2 answers
Alexey Cheremisin, 2016-08-31
@leahch

Have a look at https://zookeeper.apache.org; it seems to be just right for your coordination task.

sirs, 2016-08-31
@sirs

Do you really need a scheduler and a queue? Why not just create a set of parsers, each of which is run by cron?
As alternatives, see:
1) Quartz, here and here is a good example
2) ScheduledExecutorService
3) TimerTask
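As a rough illustration of option 2, here is a minimal sketch that assumes each parser can be wrapped in a `Runnable` and that everything runs in a single JVM. The class name `ParserScheduler` and the printed messages are hypothetical. `scheduleWithFixedDelay` starts the next run of a task only after the previous run has finished, so runs of the same parser never overlap, while different parsers run in parallel on the pool's threads.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical wrapper around a scheduled thread pool, one periodic task per parser.
final class ParserScheduler {
    private final ScheduledExecutorService pool;

    ParserScheduler(int threads) {
        this.pool = Executors.newScheduledThreadPool(threads);
    }

    /** Runs the parser immediately, then again 'delay' after each run completes. */
    void schedule(Runnable parser, long delay, TimeUnit unit) {
        pool.scheduleWithFixedDelay(() -> {
            try {
                parser.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would suppress all of its later runs.
                System.err.println("Parser failed: " + e);
            }
        }, 0, delay, unit);
    }

    /** Stops scheduling and interrupts running parsers. */
    void shutdown() {
        pool.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        ParserScheduler scheduler = new ParserScheduler(4);
        scheduler.schedule(() -> System.out.println("parsing site #1"), 1, TimeUnit.MINUTES);
        scheduler.schedule(() -> System.out.println("parsing site #2"), 10, TimeUnit.MINUTES);
        Thread.sleep(100); // let the first runs happen before the demo exits
        scheduler.shutdown();
    }
}
```

Note that the interval here is measured from the end of one run to the start of the next; if runs must start at fixed wall-clock times, cron or Quartz is probably a better fit.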
