Is there a better option than ORDER BY random()?
There is an application that selects file paths from the database and processes them.
I would like to run several copies of it at once, but I cannot think of a good way to keep two copies from fetching the same files.
There are always several thousand files in the database (as soon as fewer than a thousand remain, new ones are loaded).
Right now everything works on this principle:
... ORDER BY random() LIMIT 100;
Are there better options? I would rather not set locking flags like isProcessing. With a pool of several thousand, ORDER BY random() LIMIT 100 gives each copy a more or less unique set of files.
Any better ideas?
Read the database in chunks and send them to a queue (RabbitMQ, for example), and on the other side of the queue attach handlers in the required number of copies. It is only important to understand that exactly-once delivery does not exist; you have to handle duplicates yourself.
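A minimal sketch of this setup, assuming Python with the pika client; the queue name file_tasks, the process_file stub, and the connection details are illustrative, not from the original answer:

```python
import pika

def process_file(path):
    # placeholder for the actual processing logic
    print("processing", path)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="file_tasks", durable=True)

# Producer side: publish each file path from a chunk as a separate message.
def publish_paths(paths):
    for path in paths:
        channel.basic_publish(
            exchange="",
            routing_key="file_tasks",
            body=path,
            properties=pika.BasicProperties(delivery_mode=2),  # persist messages
        )

# Consumer side: run this part in as many worker copies as needed.
def handle(ch, method, properties, body):
    process_file(body.decode())
    ch.basic_ack(delivery_tag=method.delivery_tag)  # ack only after successful processing

channel.basic_qos(prefetch_count=1)  # hand each worker one message at a time
channel.basic_consume(queue="file_tasks", on_message_callback=handle)
channel.start_consuming()
```

The prefetch_count=1 setting keeps one slow worker from hoarding messages while the other copies sit idle, and acking only after processing means an unfinished message is redelivered if a worker dies.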
If you have not grown into queues yet, then the question is: how long does the processing take? If not long, you can simply lock the records for that time via SELECT ... FOR UPDATE SKIP LOCKED.
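A minimal sketch of this approach, assuming PostgreSQL with a files table (id, path columns) and the psycopg2 driver; the table layout and the DELETE cleanup step are assumptions, not part of the original answer:

```python
import psycopg2

def process_file(path):
    # placeholder for the actual processing logic
    print("processing", path)

conn = psycopg2.connect("dbname=mydb")  # connection string is illustrative

def process_batch():
    # The transaction holds row locks for the whole batch; other copies
    # running the same query simply skip the locked rows instead of blocking.
    with conn:  # commits (and releases the locks) when the block exits
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, path FROM files
                ORDER BY id
                LIMIT 100
                FOR UPDATE SKIP LOCKED
                """
            )
            for file_id, path in cur.fetchall():
                process_file(path)
                # remove (or mark) the row so it is not selected again
                cur.execute("DELETE FROM files WHERE id = %s", (file_id,))
```

Because the rows stay locked only for the lifetime of the transaction, this works best when processing a batch is quick, which is exactly the caveat above.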
Iterate over the records by row number (ROW_NUMBER()) in batches.
For example, take a batch of 10 records, rows 1 through 10, store the starting number of the next batch somewhere, and so on.
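A minimal sketch of batch iteration along these lines, again assuming a files table and psycopg2; how the starting number of the next batch is shared between copies is left out here, just as in the answer:

```python
import psycopg2

def process_file(path):
    # placeholder for the actual processing logic
    print("processing", path)

conn = psycopg2.connect("dbname=mydb")  # connection string is illustrative
BATCH_SIZE = 10

def fetch_batch(start):
    # Number the rows with ROW_NUMBER() and return rows start .. start + BATCH_SIZE - 1.
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT path FROM (
                SELECT path, ROW_NUMBER() OVER (ORDER BY id) AS rn
                FROM files
            ) numbered
            WHERE rn >= %s AND rn < %s
            """,
            (start, start + BATCH_SIZE),
        )
        return [row[0] for row in cur.fetchall()]

# Each copy needs to know where its next batch starts; storing and advancing
# that counter (in a file, a counter table, etc.) is the coordination point.
start = 1
while True:
    batch = fetch_batch(start)
    if not batch:
        break
    for path in batch:
        process_file(path)
    start += BATCH_SIZE
```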