PostgreSQL
fedor_nefedov, 2015-01-14 16:52:32

How do I work out what is to blame for slow full-text search in PostgreSQL?

The server configuration: Intel Xeon 2.30 GHz, 8 GB RAM, no SSD, CentOS 7, PostgreSQL 9.4. The database has 10M records with a text field of up to 1 KB, plus a tsvector field built from that text field. There is a GiST index on the tsvector field and a B-tree index on the ID field. The database takes about 20 GB. postgresql.conf was generated with pgtune.
The following situation develops:
1. A SELECT count(*) query over the entire table takes about 5 minutes. The same query run a second time takes about 20 seconds.
2. A query like SELECT * FROM table WHERE tsvector_field @@ to_tsquery('text'); takes about 7 minutes, and, naturally, repeating it completes in milliseconds. EXPLAIN shows that all the required indexes are used.
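To see where the time actually goes on the first versus the second run, one common approach is EXPLAIN with the ANALYZE and BUFFERS options. A sketch, where the table and column names are placeholders standing in for the ones in the question:

```sql
-- Placeholder names; substitute your actual table and tsvector column.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM table_name
WHERE tsvector_field @@ to_tsquery('text');
-- In the plan output, "Buffers: shared hit=..." counts pages served
-- from shared_buffers, while "read=..." counts pages fetched from disk.
-- A cold (first) run will show mostly reads; a warm run mostly hits.
```

Comparing the hit/read counters between the two runs shows directly whether the slowness is disk I/O rather than the query plan itself.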
The questions are:
1. How can I make sure that after a server restart all these queries run as quickly as after the first execution? (Something like preloading the data into memory, i.e. warming the cache.)
2. Why does the query on indexed fields take so long?
3. Are the server settings to blame?
4. How much RAM do you need? And is the hard drive to blame?
5. At what number of records in the database should you start scaling?
6. What is the optimal database size for one server if the database is scaled?


3 answer(s)
fedor_nefedov, 2015-01-15
@fedor_nefedov

So, after a quick test the following came up: the bottleneck is the hard drive, then the RAM, and then the settings in postgresql.conf. After upgrading both hardware components I finished tuning the config; a query returning 400,000 rows completed in 10 seconds, and the same query with LIMIT 100 in 0.40 ms. General conclusion: resources are to blame.

Boris Benkovsky, 2015-01-14
@benbor

The obvious solution is the well-known "crutch": after start, execute these queries, and the cache will warm up.
But it's better to find the bottleneck; htop, iotop, and scout_realtime will help you. Run that very slow query and see which part of the hardware sags.
My guess is it will be the hard drive.
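As an alternative to replaying the queries by hand after each start, PostgreSQL 9.4 ships the pg_prewarm extension, which loads a relation into the cache directly. A sketch, with placeholder relation names:

```sql
-- pg_prewarm is bundled with PostgreSQL 9.4+.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Load the table and its GiST index into the buffer cache.
-- Relation names are placeholders; use your own table and index names.
SELECT pg_prewarm('table_name');
SELECT pg_prewarm('table_name_tsvector_gist_idx');
```

Each call returns the number of blocks it read; running these once after a restart should bring the first query close to warm-cache speed.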

Pavel Nazarov, 2015-01-15
@smbd

In general, Postgres likes to cache indexes in RAM, so the more of it you give it, the better.
And look at the difference between the two runs with plain EXPLAIN (ANALYZE, BUFFERS): most likely the first query did real I/O while the second one used cached buffers.
General conclusion: not knowing the fundamentals is to blame, to be honest. I don't see anything surprising in these timings; roughly speaking, ours behave the same.
1. You can warm up the cache by hand. In 9.4, they seem to have added a feature for preserving the caches across a restart.
// offtopic: our production postgres seems to have been restarted once in a year and a half; why do you need restarts? //
2. Because the data is being read from the hard disk, and the indexes are large. See above.
3. The settings are only to blame for the repeated queries.
4. See above: the more, the better. For the first query, yes, the hardware is to blame.
5-6. Wrong question. It all depends on the nature of the load and on the slowest queries, not on how many records you have and where.
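On "the more RAM the better": these are the memory knobs in postgresql.conf that pgtune typically adjusts. The values below are common rules of thumb for an 8 GB machine, not settings taken from the question; workloads vary.

```ini
# Sketch for an 8 GB server; tune to your workload.
shared_buffers = 2GB          # ~25% of RAM for PostgreSQL's own cache
effective_cache_size = 6GB    # planner's estimate of OS + PG cache combined
work_mem = 32MB               # per sort/hash operation, per backend
maintenance_work_mem = 512MB  # index builds, VACUUM
```

Note that effective_cache_size does not allocate memory; it only tells the planner how much caching to expect, which influences whether index scans look cheap.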
