PostgreSQL
beduin01, 2020-10-18 16:19:49

Is there a better option than using random()?

There is an application that selects file paths from the database and processes the files.
I would like to run several copies of it at once, but I can't think of a good way to prevent two copies from fetching the same files.
There are always several thousand files in the database (as soon as fewer than a thousand remain, new ones are loaded).

Right now everything works along these lines:

... ORDER BY random() LIMIT 100;

Are there better options? I don't want to lock rows with flags like isProcessing. On a sample of several thousand rows, ORDER BY random() LIMIT 100 already gives each copy a more or less unique set of files.

Any better ideas?

3 answers
Ivan Shumov, 2020-10-18
@inoise

Read the database into memory in chunks, send the rows to a queue (RabbitMQ, for example), and on the other side of the queue attach handlers in however many copies you need. It is only important to understand that exactly-once delivery does not exist; you have to handle duplicates yourself.

Dr. Bacon, 2020-10-18
@bacon

If you have not yet grown into queues, then the question is: how long does processing take? If it is not long, you can simply lock the records for that time via SELECT ... FOR UPDATE SKIP LOCKED.
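A minimal sketch of that pattern, assuming a hypothetical table files(id, path) (the table and column names are not from the question):

-- Each worker copy runs this inside its own transaction and keeps the
-- transaction open for the duration of the processing.
BEGIN;

-- Take up to 100 files that no other copy is currently working on.
-- FOR UPDATE locks the selected rows; SKIP LOCKED makes concurrent copies
-- skip each other's locked rows instead of waiting on them.
SELECT id, path
FROM files
ORDER BY id
LIMIT 100
FOR UPDATE SKIP LOCKED;

-- ... the application processes the returned files here,
--     then deletes them or marks them as done ...

COMMIT;

The locks live only as long as the transaction, which is why processing time matters: for long-running processing you would either have to hold a transaction open the whole time or fall back to a status column updated in the same claiming statement.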

xmoonlight, 2020-10-18
@xmoonlight

Iterate over the records by row number (ROW_NUMBER()) in batches.
Take, say, a batch of 10 records (rows 1 to 10), write down the starting number of the next batch somewhere, and so on; see the sketch below.
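A minimal sketch of that approach, again assuming a hypothetical files(id, path) table; the value 11 stands in for the remembered start of the next batch:

-- Number the records and take the next batch of 10 by row number.
WITH numbered AS (
    SELECT id,
           path,
           ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM files
)
SELECT id, path
FROM numbered
WHERE rn BETWEEN 11 AND 20;  -- previous batch was rows 1..10

The start number has to be handed out atomically (a sequence, an advisory lock, or a single dispatcher process), otherwise two copies can still receive the same batch. Also note that row numbers shift as files are added and removed, so paginating by id (keyset pagination) is usually more robust.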
