PostgreSQL
xxx44yyy, 2021-01-02 13:17:11

Where should I look to track down an error with a replica?

There is one master with several replicas. Everything seemed to work well, but then users on the site began to get an intermittent error, something like "there is no connection to the database." I decided to see what was happening.

I opened the PostgreSQL log on the master, and there:

2022-01-01 23:56:02.501 MSK [4416] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:07.505 MSK [4417] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:12.505 MSK [4419] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:17.510 MSK [4457] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:22.507 MSK [4459] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:17.635 MSK [4954] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:22.642 MSK [4957] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:27.639 MSK [4958] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:32.643 MSK [4997] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:37.652 MSK [4998] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:42.652 MSK [4999] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:47.661 MSK [5037] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:52.667 MSK [5039] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:59:57.658 MSK [5040] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006


There are thousands of such messages. On the japan_replica replica, I opened the database log, and it is empty for this date (there are messages from previous days).

I don't understand where to look for the error. It feels as if something is missing after ERROR:.

1 answer(s)
Melkij, 2021-01-02
@xxx44yyy

The message is complete as it stands; nothing is missing.
You have a replication slot called japan_replica. Some host configured as a replica with primary_slot_name = japan_replica tries to resume replication every wal_retrieve_retry_interval (5 seconds by default) by connecting to the server given in its primary_conninfo. That server replies: "Comrade, you've got something mixed up; this replication slot is already in use."
One replication slot = only one reader.
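The retry pattern can be read straight off the log in the question. Below is a minimal Python sketch that parses the first five quoted lines and prints the gap between consecutive attempts, along with the PID that holds the slot. The regular expression and variable names are my own, and the log format is assumed to match the lines shown.

```python
import re
from datetime import datetime

# First five lines of the master log, copied from the question as-is.
LOG = """\
2022-01-01 23:56:02.501 MSK [4416] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:07.505 MSK [4417] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:12.505 MSK [4419] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:17.510 MSK [4457] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
2022-01-01 23:56:22.507 MSK [4459] [email protected][unknown] ERROR:  replication slot "japan_replica" is active for PID 3006
"""

# Assumed layout: timestamp, time zone, [backend pid], ..., slot name, holder pid.
PATTERN = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \w+ \[(\d+)\] '
    r'.*?replication slot "([^"]+)" is active for PID (\d+)'
)

attempts = []
for line in LOG.splitlines():
    m = PATTERN.match(line)
    if m:
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        attempts.append((ts, int(m.group(2)), m.group(3), int(m.group(4))))

# Seconds between consecutive failed attempts.
gaps = [round((b[0] - a[0]).total_seconds(), 3)
        for a, b in zip(attempts, attempts[1:])]
backend_pids = {a[1] for a in attempts}  # one new backend per attempt
holder_pids = {a[3] for a in attempts}   # the process that already owns the slot

print("gaps between attempts, s:", gaps)
print("backends that were refused:", sorted(backend_pids))
print("slot is held by PID:", sorted(holder_pids))
```

Every attempt comes from a new backend PID, roughly five seconds after the previous one, while the holder PID stays the same. That is what a second client retrying against an occupied slot looks like.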
Look for a mistake in the configuration. Perhaps a second replica was set up with the wrong slot.
The pg_stat_replication and pg_replication_slots views, plus adding %h (the remote host) to log_line_prefix, will help clarify what is going on.
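As a rough sketch of how the two views can be combined, the Python snippet below holds a query that joins them on the walsender PID, plus a small helper that formats the result. The column names are those of PostgreSQL 10 and later. The helper name, the sample row, and the client address 192.0.2.10 are hypothetical and only there for illustration; real rows would come from running the query on the master with whatever driver you use.

```python
# Join the slot list with the active walsender connections (PostgreSQL 10+ columns).
SLOT_HOLDERS_SQL = """
SELECT s.slot_name, s.active, s.active_pid,
       r.client_addr, r.application_name, r.state
FROM pg_replication_slots AS s
LEFT JOIN pg_stat_replication AS r ON r.pid = s.active_pid
ORDER BY s.slot_name
"""


def describe_holders(rows):
    """Turn result rows (tuples in SELECT order) into readable lines."""
    lines = []
    for slot_name, active, active_pid, client_addr, app_name, state in rows:
        if active:
            lines.append(
                f"{slot_name}: held by PID {active_pid} "
                f"from {client_addr} ({app_name}, {state})"
            )
        else:
            lines.append(f"{slot_name}: not in use")
    return lines


# Hypothetical row for illustration only; the address is a documentation IP.
example_rows = [
    ("japan_replica", True, 3006, "192.0.2.10", "walreceiver", "streaming"),
]
for line in describe_holders(example_rows):
    print(line)
```

The client_addr of the process holding the slot tells you which host is the legitimate reader; with %h in log_line_prefix, the host in the ERROR lines is the one that is misconfigured.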
This has nothing to do with the "no connection to the database" error.
