How can you speed up a low-selectivity fetch from a table with hundreds of millions of records?
I have a table with hundreds of millions of records, about 200 GB. It has many columns, including text ones. Some columns have low selectivity, for example the city column. The task is to select everything where the city equals a certain value. For St. Petersburg, for instance, there are about 10 million records, and all of them need to be written to a file, i.e. a query like COPY (SELECT multiple fields...) TO 'file.txt'. Right now the export takes half an hour, and no indexes help. Curiously, if instead of SELECT multiple fields I run SELECT id ... WHERE city = ..., it completes in a few seconds. And if I pull the St. Petersburg records into a separate materialized view, the same SELECT of multiple fields takes not half an hour but half a minute.
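A minimal sketch of the kind of export described, with made-up table and column names (the real table has many more columns):

-- Hypothetical names: big_table and its columns are placeholders.
COPY (
    SELECT id, name, address, description   -- several fields, including text
    FROM big_table
    WHERE city = 'Saint Petersburg'         -- ~10 million matching rows
) TO '/tmp/file.txt';                       -- server-side file path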
Normalize the database to 3NF.
For this particular case: all cities should go into a separate table, a lookup list of cities with IDs.
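A rough sketch of that normalization, assuming hypothetical names (cities, city_id, big_table):

-- Small lookup table of cities.
CREATE TABLE cities (
    id   serial PRIMARY KEY,
    name text NOT NULL UNIQUE
);

-- Replace the wide text column with a compact integer foreign key.
ALTER TABLE big_table ADD COLUMN city_id integer REFERENCES cities(id);

-- After backfilling city_id, filter on the integer column:
-- SELECT ... FROM big_table
-- WHERE city_id = (SELECT id FROM cities WHERE name = 'Saint Petersburg');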
What indexes do you have? What is the table structure?
What does EXPLAIN (ANALYZE, BUFFERS) show?
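For example (table and column names assumed, query shape taken from the question):

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, name, address, description
FROM big_table
WHERE city = 'Saint Petersburg';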
1. Not needed.
2. See point 1.
3. Only if you are hitting the CPU rather than the disk. If the bottleneck is the disk, it will only make things worse.
4. First find out how the existing table behaves, then decide. For example, a BRIN index on the city id: on low-selectivity fields it comes out remarkably compact (a sketch follows after this list).
5. 200 GB is quite a normal database size. It is not even astronomically expensive to fit it entirely in shared_buffers.
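A sketch of points 4 and 5, with assumed object names. Note that a BRIN index only pays off when rows with the same city id sit in physically adjacent pages (e.g. after CLUSTER or an ordered reload), and that shared_buffers changes require a server restart:

-- Tiny block-range index on the (assumed) integer city column.
CREATE INDEX big_table_city_brin ON big_table USING brin (city_id);

-- Give the whole ~200 GB database room in the buffer cache,
-- assuming the server has enough RAM; takes effect after a restart.
ALTER SYSTEM SET shared_buffers = '200GB';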