How to find the cause of the crash / slowdown of the server with PostgreSQL?

A

Alexey Nikolaev2019-07-23 21:46:07

PostgreSQL

Alexey Nikolaev, 2019-07-23 21:46:07

Good evening.
Situation: a very powerful server (8gb, new xeon) with a fairly loaded application, 10+ postgres threads, which load the system by 60-80% in total. At the same time, everything works smartly, they just work constantly. So everything lasts 2-3 months, but from time to time a client complains that something is hanging, I look - and it's true. /top has a couple of postgres processes with abnormally high CPU (both 40) and abnormally long execution times. I restart the server, everything is fine again for a couple of months.
I don’t sin on heavy queries, because, firstly, the query build is 100% given to ORM (Eloquent, Lumen), and the ability to write any count (*) with crooked hands is reduced to zero; and secondly, if the reason is in crooked queries, then it should hang periodically (during the execution of the actual query), and it should not get better after a restart. It feels like some rare bug or situation breaks the whole system every three months. If it happened all the time, then, perhaps, I would have already found it and corrected it, but here - xs, xs.
Deadlocks? This should not be, because. in 2k19, almost all database servers detect and kill deadlocks themselves, and postgres is no exception. And the standard script execution time (assuming that the operation initiated by the user is still the reason) should interrupt the execution, and there are wild values \u200b\u200bin tens of hours of execution.
How to find the reason? Where to look? I set it up like normal, everything is fine with memory too (/ free gives out that about 30% is free). While I'm thinking about restarting postgres by cron once a month, this would solve the problem, but this is a symptomatic treatment, and in general it is not good.

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

M

Melkij, 2019-07-23
@melkij

very powerful server (8gb

Yeaaaa? And I thought a dead virtualochka. Pretty average pieces of iron at 256GB then what?
Ha. Hahahaha. Good joke.
The most fierce nonsense instead of queries is done by ORM.
Twisted the logging settings, at the time of the problem, at least look at this: https://github.com/dataegret/pg-utils/blob/master/...
Install some kind of monitoring that can monitor the database.
And look at what exactly the base is doing and what the hardware is doing at the time of the problem. Especially I/O. process title would at least be saved / written. There and then the command tag is duplicated.