Are checkpoints occurring too frequently?

W

Webber2021-03-22 11:53:22

PostgreSQL

Webber, 2021-03-22 11:53:22

Postgres logs show the following messages:

LOG: checkpoints are occurring too frequently (9 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".
LOG: checkpoints are occurring too frequently (13 seconds apart)
HINT: Consider increasing the configuration parameter "max_wal_size".

Periodically there are strong iowait delays.

Server: 6cpu, 16memory, 320ssd

Config:

shared_preload_libraries = 'pg_stat_statements'		# (change requires restart)
logging_collector = on		# Enable capturing of stderr and csvlog
log_directory = 'pg_log'		# directory where log files are written,
log_filename = 'postgresql-%Y-%m-%d.log'	# log file name pattern
log_truncate_on_rotation = on		# If on, an existing log file with the
log_rotation_age = 14d			# Automatic rotation of logfiles will
log_rotation_size = 0		    # Automatic rotation of logfiles will
log_min_duration_statement = 10000	# -1 is disabled, 0 logs all statements
checkpoint_timeout = 30min
max_wal_size = 8GB
effective_cache_size = 6GB

Reply

Answer the question

In order to leave comments, you need to log in

1 answer(s)

M

Melkij, 2021-03-22
@AKLZephyr

The warning is displayed by the checkpoint_warning setting. Indicates that you are creating so many wal that checkpoints are triggered more often than the value of this warning. This is exactly the advice of the base to take a closer look at the configuration and workload.
Figure out exactly what you're generating wal with and maybe rethink the process. See what usually works for you at the time when these warnings are triggered. Some kind of bulk data imports, something else that writes a lot of data.
You may need to increase your max_wal_size even further. But keep in mind that the more wal between checkpoints, the longer the base will start after an emergency shutdown, because. should roll all the wal from the last checkpoint.
This is the tradeoff between the start time of the base during a crash and the amount of IO during normal operation. The checkpoint is quite a lot of IO.
PS: ask the base "show max_wal_size;" whether the setting has been correctly applied.