V
V
Victor2015-08-26 12:15:14
PostgreSQL
Victor, 2015-08-26 12:15:14

How to speed up query in postgresql?

There is a table with five fields of type int. There are several million records in it (in the future there will be tens of millions or hundreds)
One of the columns (word_id) is indexed by a b-tree index, it is searched for the form:

select * from keypoints where word_id in ( около 500-1000 значений)

Such a request on my leisurely laptop takes 200-300s, iotop shows that the disk is being read. As a result, tens of thousands of records can be returned. The repeated request is much faster.
In the postgresql config, I only changed authorization settings.
Question: such a request is incorrect in principle and cannot be executed quickly, or can something be optimized?
Update:
Explain output for one where:
explain select * from keypoints where word_id=27601;

Bitmap Heap Scan on keypoints (cost=28.37..4118.23 rows=1080 width=24)
Recheck Cond: (word_id = 27601)
-> Bitmap Index Scan on keypoints_search_idx (cost=0.00..28.10 rows=1080 width=0)
Index Cond : (word_id = 27601)
(4 rows)
Time: 696.716 ms

Answer the question

In order to leave comments, you need to log in

3 answer(s)
M
Melkij, 2015-08-26
@melkij

In order:
So you're head over heels in the disk.
There is only one option - to install a fast disk.
The option is worse, because the disk will still remain slow, and if not on reading, then at the start and on writing, you will run into it very well - use more memory. So that as much data as possible is in memory. But during operation, this data still needs to be retrieved from slow disks, so the database will first have to be warmed up. There is even a regular pg_prewarm utility
By default, there is something unfunny instead of shared_buffers. 32 or something megabytes? And this is a key characteristic of the base.
Increase at least up to 20% of the total memory on the machine.
Thirdly, the current Postgresql is pretty bad at spreading IN. And there is a not entirely obvious solution to this - rewrite in join:
This hash join is faster than IN. On 10,000 elements, 10ms was obtained versus 380ms for IN.

M
Max, 2015-08-26
@MaxDukov

I think it's time to open up the EXPLAIN command and see the execution plan to see if indexes are being used at all. And look at both the main request and the nested one.
Judging by

takes 200-300s, iotop shows that the disk is being read.
- no, there is a FULL SCAN.
editing the config will help if you want to play around with cache sizes. While the problem seems to be in the indexes.

D
Dmitry Kovalsky, 2015-08-26
@dmitryKovalskiy

And how the list "about 500-1000 values" is defined? I would try to get rid of the nested query. Through Join or some WHERE.

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question