How to configure PostgreSQL PITR?


Solitudine, 2022-02-11 08:59:19

PostgreSQL


Hello. I'd appreciate input from anyone who knows this or has done something similar.
There is a database with a replica connected through a physical replication slot.
Snapshots of the replica's entire disk are taken periodically (automatically, on Google's schedule).
I'm trying to get Point-in-Time Recovery working as follows: I deploy an additional virtual machine with a snapshot of the replica attached as its disk, separately download WAL files from the master into their own directory, and set up recovery_conf pointing to that directory and to the time I need to restore to.
But this does not work as is: the database fails with either FATAL or PANIC. Errors such as

PANIC: could not locate a valid checkpoint record

Or

FATAL:  recovery ended before configured recovery target was reached

Can you tell me whether this approach is workable in principle, and if so, what I am doing wrong? Or is the whole process set up incorrectly, which is why PITR does not work?
Here is the log from startup with the recovery attempt:

2022-02-10 13:31:29.106 UTC [1515] LOG:  starting PostgreSQL 13.5 (Ubuntu 13.5-2.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
2022-02-10 13:31:29.107 UTC [1515] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-02-10 13:31:29.109 UTC [1515] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-02-10 13:31:29.115 UTC [1516] LOG:  database system was shut down at 2022-02-10 13:31:25 UTC
cp: cannot stat '/tmp/wal_backup/00000002.history': No such file or directory
2022-02-10 13:31:29.118 UTC [1516] LOG:  starting point-in-time recovery to 2022-02-10 07:00:00+00
2022-02-10 13:31:29.134 UTC [1516] LOG:  restored log file "0000000100000011000000B2" from archive
2022-02-10 13:31:29.349 UTC [1516] LOG:  invalid primary checkpoint record
2022-02-10 13:31:29.349 UTC [1516] PANIC:  could not locate a valid checkpoint record
2022-02-10 13:31:29.564 UTC [1515] LOG:  startup process (PID 1516) was terminated by signal 6: Aborted
2022-02-10 13:31:29.564 UTC [1515] LOG:  aborting startup due to startup process failure
2022-02-10 13:31:29.565 UTC [1515] LOG:  database system is shut down
pg_ctl: could not start server
Examine the log output.



Melkij, 2022-02-11

For PITR you need:
- a base backup as the starting point, and taking it must have finished before the target recovery time

database system was shut down at 2022-02-10 13:31:25 UTC
starting point-in-time recovery to 2022-02-10 07:00:00+00

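This, of course, is impossible even with a valid WAL archive: the cluster was last shut down at 13:31:25, hours after the 07:00 target, and recovery cannot go back from there.
PostgreSQL has REDO recovery, not UNDO. PITR only replays WAL forward from the current position, and it cannot stop before reaching the consistency point (the end of the base backup).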
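The forward-only rule can be expressed as a simple pre-flight check. The sketch below parses the two timestamps from the log lines quoted above and reports whether the recovery target is reachable, treating the "shut down at" time as the point the data directory has already reached (as the reasoning above does). The function names are hypothetical, and only the two timestamp formats seen in this log are handled.

```python
from datetime import datetime, timezone


def parse_pg_time(text: str) -> datetime:
    """Parse the two timestamp formats seen in the log above.

    Handles '2022-02-10 13:31:25 UTC' and '2022-02-10 07:00:00+00' only.
    """
    text = text.strip()
    fmt = "%Y-%m-%d %H:%M:%S"
    if text.endswith(" UTC"):
        return datetime.strptime(text[:-4], fmt).replace(tzinfo=timezone.utc)
    if text.endswith("+00"):
        return datetime.strptime(text[:-3], fmt).replace(tzinfo=timezone.utc)
    raise ValueError(f"unsupported timestamp format: {text!r}")


def target_reachable(data_dir_state: datetime, target: datetime) -> bool:
    # WAL replay is REDO-only: it moves forward from the state already on disk,
    # so a target earlier than that state can never be reached.
    return target >= data_dir_state


if __name__ == "__main__":
    shut_down = parse_pg_time("2022-02-10 13:31:25 UTC")
    target = parse_pg_time("2022-02-10 07:00:00+00")
    print(target_reachable(shut_down, target))  # False
```

For a proper base backup the left-hand value would be the time the backup finished, not a shutdown time; the comparison is the same.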
To avoid the pitfalls of file-system consistency, it is better to take the base backup with PostgreSQL's own tools (for example, pg_basebackup) rather than from a block-device snapshot. That said, starting from a snapshot as you describe is essentially no different from ordinary crash recovery, such as starting after a power outage (provided fsync worked correctly at every level and was not ignored).
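For completeness, a minimal sketch of how a restored data directory can be prepared for PITR on PostgreSQL 12 and later (the thread uses 13.5), where recovery.conf was replaced by a recovery.signal file plus ordinary settings. It assumes the directory came from a base backup that finished before the target time, and that the server is stopped; the function names are hypothetical, and the /tmp/wal_backup archive path is taken from the log above. Appending to postgresql.auto.conf mirrors what pg_basebackup -R does for standby settings.

```python
import tempfile
from pathlib import Path


def build_recovery_settings(wal_archive: str, target_time: str) -> str:
    """Return PostgreSQL 12+ recovery settings as config-file text."""
    return "\n".join([
        # Fetch each archived WAL segment (%f) into the path PostgreSQL asks for (%p).
        f"restore_command = 'cp {wal_archive}/%f \"%p\"'",
        f"recovery_target_time = '{target_time}'",
        # Default is 'pause'; 'promote' ends recovery and opens the server for writes.
        "recovery_target_action = 'promote'",
    ]) + "\n"


def prepare_pitr(data_dir: Path, wal_archive: str, target_time: str) -> str:
    """Sketch: configure a stopped, restored data directory for targeted recovery."""
    settings = build_recovery_settings(wal_archive, target_time)
    with open(data_dir / "postgresql.auto.conf", "a") as f:
        f.write(settings)
    # If standby.signal is also present (as on a copy of a replica), standby mode
    # takes precedence, so this sketch removes it (assumption: no standby wanted).
    (data_dir / "standby.signal").unlink(missing_ok=True)
    # recovery.signal tells the server to perform targeted recovery on next start.
    (data_dir / "recovery.signal").touch()
    return settings


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(prepare_pitr(Path(d), "/tmp/wal_backup", "2022-02-10 07:00:00+00"))
```

Note that these settings alone would not have helped in the thread above: the target still has to lie after the point the restored data directory represents.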