PostgreSQL
VMCLOUD, 2014-08-03 10:16:10

PostgreSQL is not processing queries. Why?

Colleagues, the situation is as follows.
We run PostgreSQL 9.2 on a server with 2 × Xeon 2620 CPUs, 128 GB RAM, and 8 × 600 GB SAS drives in RAID 10. The network channel is 1 Gbps.
The database is about 200 GB including indexes. The database config is at pastebin.com/RN6CKXPP. There are pgbouncer instances on the frontends, and the frontends handle up to 1k requests per minute.
Once a day the database either hangs hard (stops executing queries) or becomes very slow on trivial queries (for example, a simple SELECT that used to take 1 second now takes 30-90). This is the first time I have run into this, so I do not know where to dig.
Please help! I can provide more details if needed.


2 answer(s)
vyegorov, 2014-08-08
@vyegorov

Not enough information, but I'll try.
We have:

  • 128GB in total;
  • 50 GB allocated for the cache, i.e. this memory is not available for anything else.

That leaves 78 GB of memory to work with.
The database can execute up to 500 (max_connections) queries in parallel. Each of them can use at least 768 MB (work_mem) for sort/group/hash operations; “at least” because if a query needs several such operations, each of them will consume its own 768 MB.
If we assume the queries are simple and use no more than one such operation each, then the available memory (78 GB) can accommodate 104 such queries. That is, `max_connections` should be set to (78*1024/768) = 104, otherwise the system will start swapping under this load.
If the configured number of sessions really is needed, then `work_mem` should be reduced to (78*1024/500) = 160 MB instead, and even that assumes there are no complex queries.
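The values these calculations rest on can be checked directly on the server; a generic query for that (purely illustrative, not taken from the linked config) is:

```sql
-- Show the memory- and connection-related settings the estimate above uses.
SELECT name, setting, unit, source
  FROM pg_settings
 WHERE name IN ('max_connections', 'work_mem',
                'shared_buffers', 'effective_cache_size');
```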
It seems to me that at the moments when the database is on its knees, the resources allocated to it exceed what is physically available and the kernel starts swapping. That is, assuming no other processes are involved: who knows, maybe a manual VACUUM gets launched at exactly those moments...
Monitor. First of all, the OS: either with ready-made systems (Zabbix and Nagios are the most common in my practice) or with `vmstat` or `dstat` (I use them in 95% of cases). This will let you “see” what is happening in the system: memory usage, the I/O subsystem, swapping.
Squeeze more information out of the database itself. At a minimum, I always enable the following options:
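The exact list is a matter of taste; as an illustration, a typical troubleshooting set (these particular parameters are my usual picks, all available in 9.2, not necessarily the exact ones meant here) would look like this in postgresql.conf:

```
# Verbose logging for troubleshooting (illustrative values)
log_min_duration_statement = 0     # log every statement together with its duration
log_checkpoints = on               # checkpoint timing and volume
log_connections = on
log_disconnections = on
log_lock_waits = on                # statements waiting longer than deadlock_timeout
log_temp_files = 0                 # log every temp file (sorts/hashes spilling to disk)
log_autovacuum_min_duration = 0    # log all autovacuum activity
```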
Yes, there will be much more information in the logs, but it is also much more convenient to work with. If you know which specific database (out of many) is having problems, the verbose logging can be enabled for it alone via `ALTER DATABASE ... SET ...`; the same applies to specific users. However, I prefer full logging at the level of the entire cluster.
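For the per-database or per-user variant, a minimal sketch (the database and role names here are placeholders):

```sql
-- Enable verbose statement and lock logging for one database only;
-- "appdb" and "appuser" are placeholder names.
ALTER DATABASE appdb SET log_min_duration_statement = 0;
ALTER DATABASE appdb SET log_lock_waits = on;
-- ...or for a specific user:
ALTER ROLE appuser SET log_min_duration_statement = 0;
```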
I also recommend looking at (and possibly saving snapshots of) the statistics views, at least `pg_stat_activity` and `pg_locks`.
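A minimal sketch of how such snapshots could be taken (table names are arbitrary; re-run the INSERTs from cron or by hand while the problem is occurring):

```sql
-- Empty snapshot tables with the same columns plus a timestamp.
CREATE TABLE activity_snapshot AS
  SELECT now() AS captured_at, * FROM pg_stat_activity WHERE false;
CREATE TABLE locks_snapshot AS
  SELECT now() AS captured_at, * FROM pg_locks WHERE false;

-- Take a snapshot.
INSERT INTO activity_snapshot SELECT now(), * FROM pg_stat_activity;
INSERT INTO locks_snapshot    SELECT now(), * FROM pg_locks;

-- Quick look at sessions currently waiting on locks.
SELECT l.pid, l.locktype, l.mode, l.granted, a.query
  FROM pg_locks l
  JOIN pg_stat_activity a ON a.pid = l.pid
 WHERE NOT l.granted;
```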
All of the above is my opinion; I strongly recommend changing settings judiciously, and preferably validating the new configuration with load tests.
In conclusion: I think you need to adjust the database parameters and gather more information about what is happening. After that it will be possible to discuss the situation further.
