PostgreSQL
beduin01, 2021-11-08 11:36:16

Indexing has been running for 2 weeks, what am I doing wrong?

I am restoring a PostgreSQL dump into PG14.
The dump size is 60GB.
There are 25 tables in the dump.
Each table has 1 to 7 indexes.
In one place a composite index is used.
Several indexes are hash; the rest are btree.

The largest table has about 100 million records.

The problem is that index creation has been running for 14 days and, judging by pg_stat_progress_create_index, it will take another 15 days.
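A query along these lines reads the progress from that view (standard PG14 columns; the counters refer to the current phase only, so the percentage is a rough estimate):

-- one row per backend currently running CREATE INDEX / REINDEX
SELECT pid, phase,
       round(100.0 * blocks_done / nullif(blocks_total, 0), 1) AS pct_blocks,
       round(100.0 * tuples_done / nullif(tuples_total, 0), 1) AS pct_tuples
FROM pg_stat_progress_create_index;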

It seems that I had already tweaked all the config settings that can affect indexing speed (before starting the restore).
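For context, a sketch of the settings that usually matter most for CREATE INDEX speed on a 16 GB machine; the values below are illustrative assumptions, not the actual contents of the config linked further down:

maintenance_work_mem = 2GB              # sort memory per index build; sorts spill to disk when exceeded
max_parallel_maintenance_workers = 4    # parallel btree builds (PG11+); hash index builds are not parallel
max_wal_size = 8GB                      # fewer forced checkpoints during the build
checkpoint_timeout = 30min
wal_compression = on                    # less WAL written to the slow HDD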

The server is a 6-core Xeon with 16 GB RAM.

Storage is an HDD. The data itself loaded in 5 hours.
Indexes are created synchronously.
For some reason, the processor is only 25% loaded.
Windows OS.

Config: pastie.org/p/585DJwfj6rstQwukSQjhtT

1 answer
rPman, 2021-11-08
@beduin01

A CPU load of only 25% means the bottleneck is almost certainly the disk.
A shot in the dark: what file system do the tablespaces sit on? Is it by any chance a copy-on-write one (btrfs/zfs/xfs)? Databases perform terribly on them, since frequent writes to a file cause heavy fragmentation. In that case, before any heavy processing, at least defragment the database files and disable the CoW feature on the tablespaces.
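To check which filesystem the data actually sits on, the relevant directories can be found with standard catalog queries, for example:

SHOW data_directory;
-- non-default tablespaces report their path; pg_default/pg_global return an empty string
SELECT spcname, pg_tablespace_location(oid) AS location FROM pg_tablespace;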
Some good (optional) tuning steps:
* … tablespaces, but the software version must match up to the last digit
* place the entire database on an SSD (even a cheap consumer one)
* add an SSD cache in front of the HDD, for example with bcache (with write caching enabled); this may be of little use for linear processing of the database, but in general it is a cheap way to gain an order of magnitude in performance (in one setup I used the VirtualBox snapshot-to-file feature, which KVM also has, so that subsequent writes went not to the original image but to another disk, which was an SSD)
* place the tablespace for indexes (and maybe each table separately) on another physical device (HDD, SSD, or even RAM); the size requirements here are usually modest, and the key point is to keep reads and writes from competing on a single device (a sketch follows after this list)
* put the file system journal (for example, ext4's) on an SSD (a couple of gigabytes is enough), or, going hardcore, turn it off entirely (very dangerous: you can end up with corrupted data after a power failure, but as a temporary measure during a long operation, with all backups in place, it is justified). This is the smallest of these optimizations, but it is noticeable with frequent small writes.
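A minimal sketch of the separate-device idea from the list above; the tablespace name, path, table and index are hypothetical placeholders:

-- hypothetical second disk mounted as D: on the Windows server
CREATE TABLESPACE fast_idx LOCATION 'D:/pg_idx';
-- build a new index there...
CREATE INDEX idx_big_table_col ON big_table (col) TABLESPACE fast_idx;
-- ...or move an existing one (this rewrites the index on the new device)
ALTER INDEX idx_big_table_col SET TABLESPACE fast_idx;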
