PostgreSQL
hoxnox, 2012-07-27 07:08:49

How to organize row-by-row reading from a very large, immutable table?

There is a very large immutable table (it will not fit into memory). I need to organize row-by-row reading of this table from different hosts from C/C++ code (although the language is not important).
For example, on the server we have:

id
===
1
4
6
87

Host 10.0.0.1 makes one request and receives 1. Host 10.0.0.3 makes two requests and receives 4 and 6. And so on.

PS: At the moment there is a daemon running on the server side that issues queries like "SELECT id FROM table ORDER BY (id) OFFSET K LIMIT N" to fill an internal buffer, from which it hands out answers to the requesting hosts. Beyond a certain K, this whole contraption starts to slow down.

Surely such a simple task has a simple solution using standard Postgres tools...

UPD: One idea was to have the daemon execute the command "COPY table (id) TO STDOUT" and pull the data from it into the buffer. But I am afraid that Postgres might abort the command at the "most interesting place" for some reason.

7 answer(s)
ToSHiC, 2012-07-27
@hoxnox

You can open a cursor in your daemon for the query SELECT id FROM table ORDER BY (id), read from it in batches, and distribute the rows to clients. The API has all the necessary functions.
OFFSET is slow because each time the server has to run the query, sort, skip a bunch of rows, and so on. Given that you already have a daemon, cursors are ideal for the task described.
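
A minimal sketch of the batched-read idea, in Python. To stay self-contained it uses the standard-library sqlite3 module as a stand-in for PostgreSQL, so this is not the client API referred to above; the table name `items`, the batch size, and the function `read_in_batches` are assumptions made for illustration. With PostgreSQL the same pattern would be a server-side cursor (DECLARE ... CURSOR followed by repeated FETCH).

```python
import sqlite3

def read_in_batches(conn, batch_size):
    """Yield lists of ids; the ordered query is executed only once."""
    cur = conn.cursor()
    cur.execute("SELECT id FROM items ORDER BY id")
    while True:
        rows = cur.fetchmany(batch_size)  # next batch from the open cursor
        if not rows:
            break
        yield [row[0] for row in rows]

# Demo data: the four ids from the question (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (4,), (6,), (87,)])

batches = list(read_in_batches(conn, 2))
print(batches)
```

The point is that the position in the result set lives in the cursor, so no rows are re-scanned or skipped between batches.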

Alexander Korotkov, 2012-07-27
@smagen

You can also do this:

SELECT id FROM table WHERE id > last_id_read_previously ORDER BY (id) LIMIT N

Then, with an index scan, you won't have to skip a bunch of records each time; the scan can start right at the needed position.
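
A rough sketch of how a daemon might drive this query, again in Python with sqlite3 standing in for PostgreSQL so that it runs on its own. The table name `items`, the helper `fetch_after`, and the starting value 0 (which assumes all ids are positive) are illustrative assumptions, not part of the original setup.

```python
import sqlite3

def fetch_after(conn, last_id, limit):
    """Return up to `limit` ids strictly greater than `last_id`, in order."""
    rows = conn.execute(
        "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, limit),
    ).fetchall()
    return [row[0] for row in rows]

# Demo data: the four ids from the question (hypothetical table name).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(1,), (4,), (6,), (87,)])

first = fetch_after(conn, 0, 1)            # one host asks for one row
second = fetch_after(conn, first[-1], 2)   # another host asks for two rows
third = fetch_after(conn, second[-1], 2)   # only one row is left
print(first, second, third)
```

Each call only needs the last id handed out, so the cost of a request does not grow with how far into the table the reader has advanced, provided `id` is indexed.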

strib, 2012-07-27
@strib

Wouldn't a cursor help?

EndUser, 2012-07-27
@EndUser

By analogy with Oracle, I recommend experimenting with INSTEAD OF triggers on a view: www.depesz.com/2010/10/16/waiting-for-9-1-triggers-on-views/

hoxnox, 2012-07-27
@hoxnox

asm0dey, that is worth trying... Only without an additional column, because filling it with values is not a quick operation. Just delete the first n rows from the front. True, the clients will somehow have to be told to be patient for a while as the table is being updated. They are impatient, and when no answer arrives they fail with an error.

hoxnox, 2012-07-27
@hoxnox

EndUser, so the idea is to periodically generate a view over part of the table, as a buffer for the daemon? Is that how it works? And when should the views be created: in advance or on the fly? If in advance, there can be a lot of them; they would have to be created automatically, named, and the daemon would have to be told how, when, and from which view to read.
If on the fly, then generating the view can take as long as the query with an offset.
And will views be fast at all? That is far from certain...
In short, the idea needs to be investigated. Too many questions...

bugman, 2012-08-10
@bugman

Allow me to look at the task from a different angle: maybe don't reinvent the wheel with your own queue, but use a ready-made one?
In the general case, some structure is needed to store the current state, either internal (for example, a marker simply moved along the original table) or external (an additional table). Each has its pros and cons. But as soon as aspects such as delivery and processing control, resending, and monitoring of the current state come into play, look again at the first paragraph.
