PostgreSQL
Alexey, 2020-04-18 11:10:07

How to find the cause of "Exception: idle transaction timeout" in a Flask+psycopg2+postgresql+pgbouncer application?

There was a problem. Roughly once a month the web application went down with a 502 error. The cause: PostgreSQL stopped processing queries because the connection limit had been exceeded. The process list showed a pile of open connections waiting on a lock held by some query, and that query was a perfectly ordinary one that normally completes instantly. I could not work out what had led to the blocking, but there were several "idle in transaction" processes, and after killing one of them everything recovered.

PostgreSQL 9.5 does not yet have a setting for timing out hanging transactions (idle_in_transaction_session_timeout only appeared in 9.6), so it was decided to install PgBouncer.
The result: PostgreSQL, with PgBouncer in front of it running in pool_mode=transaction.
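For reference, the relevant part of the setup looks roughly like this (a simplified pgbouncer.ini sketch; the database name, ports and auth settings here are placeholders, not the real ones):

[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
; kill transactions that stay idle longer than 10 minutes
idle_transaction_timeout = 600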

For a couple of weeks everything worked great. But over the last two days a problem has suddenly appeared for no apparent reason. The web application returns 500 to the user; the user is surprised, tries again, and everything works fine. A few minutes later the same thing happens in another part of the application. The user is getting nervous.

In the application log (Python, Flask, psycopg2):

Exception on /any-url-of-the-application [POST]
...
...
Exception: idle transaction timeout
server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.


I looked at which queries it crashes on: any of them, queries that in a normal situation complete instantly.

In pgbouncer.ini:
idle_transaction_timeout = 600

The server load is 10-20 connections per second. There is nothing suspicious in the system logs or in the PostgreSQL and PgBouncer logs. The process list shows no postgres processes with hanging transactions. I cannot imagine what kind of system-level burst, briefly exhausting some resource, could lead to such a specific error.

That is what is strange. If such an error occurred in maintenance scripts that run long regular jobs from cron, it would be clear where to dig. But here an ordinary user is working in the browser, and suddenly, after the next click of a button, error 500! idle_transaction_timeout = 600 is 10 minutes, yet the user gets the 500 instantly. Given that the application opens its own connection in the Flask application context on every page load, how can it suddenly fail on an idle transaction timeout?
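The per-request connection handling is roughly like this (a simplified sketch, not the actual application code; the DSN and the get_db/close_db names are illustrative):

import psycopg2
from flask import Flask, g

app = Flask(__name__)

# The application connects through pgbouncer, not directly to postgres.
DSN = "host=127.0.0.1 port=6432 dbname=appdb user=app"

def get_db():
    # One connection per request context, reused within the request.
    if "db" not in g:
        g.db = psycopg2.connect(DSN)
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # Commit or roll back and close at the end of every request,
    # so no connection should ever stay "idle in transaction" for long.
    db = g.pop("db", None)
    if db is not None:
        if exc is None:
            db.commit()
        else:
            db.rollback()
        db.close()

With this pattern every HTTP request gets its own short-lived connection from the bouncer's pool, so a 10-minute idle_transaction_timeout should never be reachable within a single request.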

What could it be?

2 answers
Melkij, 2020-04-18
@melkij

and after killing one of them, everything was restored.

let's hope that wasn't kill -9
Start by checking that your application really does reopen a connection dropped by the database or the bouncer correctly, and does not keep it in the pool forever, replaying the saved error text (roughly in the spirit of the sketch below).
By the way, you didn't name the pgbouncer version. Perhaps you installed a bouncer as old as the database itself; I remember a possibly relevant fix in 1.10.
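A minimal sketch of the kind of check meant here, assuming the application keeps a long-lived connection between requests instead of opening a fresh one each time (the DSN and helper name are illustrative):

import psycopg2

DSN = "host=127.0.0.1 port=6432 dbname=appdb user=app"

_conn = None  # long-lived connection kept across requests

def get_conn():
    global _conn
    if _conn is None or _conn.closed:
        _conn = psycopg2.connect(DSN)
        return _conn
    try:
        # Cheap liveness probe: a socket dropped by the bouncer/server
        # raises an error here instead of on the user's real query.
        with _conn.cursor() as cur:
            cur.execute("SELECT 1")
        _conn.rollback()
    except (psycopg2.OperationalError, psycopg2.InterfaceError):
        # The connection was closed (e.g. by idle_transaction_timeout);
        # reopen it instead of replaying the saved error forever.
        try:
            _conn.close()
        except Exception:
            pass
        _conn = psycopg2.connect(DSN)
    return _conn

If the same error keeps appearing across requests from different users, that would be a strong hint that exactly such a stale connection is being handed out again and again.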

Dimonchik, 2020-04-18
@dimonchik2013

Have you studied everything here?
Or is it not applicable to your version of Postgres?
It is obvious that the issue is in the database / pool settings.
