PostgreSQL
Shurik, 2020-08-05 00:06:49

Why is the data size too large after adding data to the database?

Hello. I deployed an environment with PostgreSQL in Docker and imported a dump of about 8 GB; afterwards the database was about 13 GB, if memory serves. Then I needed to bring the data up to date from XML files. I delete the old records from the tables, parse the XML files with PHP, and insert the records into the corresponding tables. The new data has at most a quarter more records than before, yet the database grew from 13 GB to 52 GB. Why is the resulting database so much larger than the one with the old data? What can affect the size?

UPD: I am updating the data. For example, the largest table originally held 50 million records and took 12 GB. After the update it holds 75 million records and takes 26 GB, although by that logic it should be about 18 GB.

When I queried the table sizes in the console, the output listed, besides the tables themselves, 'nameTable_nameField_idx' and 'nameTable_nameField_pkey', which are also sizable at 6 GB each. I don’t know what the first one is; the second, judging by the name, is the primary key. Does it really take up that much space? For the 26 GB table described above, the primary key comes to roughly a fifth of the size of the table itself.

2 answers
Melkij, 2020-08-05
@svisch

I delete the old records from the tables, parse the XML files with PHP, and insert the records into the corresponding tables.

So it is quite natural that both the table itself and all of its indexes roughly double in size. That is expected.
Why? Because of MVCC. DELETE does not physically remove the data, because older transactions may still need to read it; it only sets xmax, the ID of the transaction from which the rows stop being visible. The space occupied by the deleted rows can be reused for new data only after a manual VACUUM or an autovacuum pass.
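To make the mechanism concrete, here is a deliberately simplified toy model in Python. It is not how PostgreSQL stores data; `ToyHeap`, `RowVersion`, and `reload` are hypothetical names invented for this sketch, rows are assumed to be equally sized slots, and vacuum is assumed to run when no older transaction needs the deleted rows.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RowVersion:
    value: int
    xmin: int                   # ID of the transaction that created the row
    xmax: Optional[int] = None  # ID of the transaction that deleted it, if any


class ToyHeap:
    """Toy table storage: a list of equally sized row slots (hypothetical model)."""

    def __init__(self) -> None:
        self.slots: List[Optional[RowVersion]] = []  # None marks a reusable slot
        self.next_xid = 1

    def _begin(self) -> int:
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def insert_many(self, values: Iterable[int]) -> None:
        xid = self._begin()
        free = [i for i, slot in enumerate(self.slots) if slot is None]
        for value in values:
            row = RowVersion(value, xmin=xid)
            if free:
                self.slots[free.pop()] = row  # reuse space released by vacuum
            else:
                self.slots.append(row)        # no free space: storage grows

    def delete_all(self) -> None:
        xid = self._begin()
        for slot in self.slots:
            if slot is not None and slot.xmax is None:
                slot.xmax = xid  # only marked as deleted; the slot stays occupied

    def vacuum(self) -> None:
        # Assumes no older transaction still needs the deleted rows.
        self.slots = [
            None if slot is not None and slot.xmax is not None else slot
            for slot in self.slots
        ]

    def truncate(self) -> None:
        self.slots = []  # drops all storage at once

    def size(self) -> int:
        return len(self.slots)

    def live_rows(self) -> int:
        return sum(1 for s in self.slots if s is not None and s.xmax is None)


def reload(heap: ToyHeap, new_values: Iterable[int], strategy: str) -> int:
    """Replace the table contents and return the resulting storage size in slots."""
    if strategy == "delete":
        heap.delete_all()
    elif strategy == "delete+vacuum":
        heap.delete_all()
        heap.vacuum()
    elif strategy == "truncate":
        heap.truncate()
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    heap.insert_many(new_values)
    return heap.size()


if __name__ == "__main__":
    for strategy in ("delete", "delete+vacuum", "truncate"):
        heap = ToyHeap()
        heap.insert_many(range(100))
        print(strategy, reload(heap, range(150), strategy))
```

In this model, replacing 100 old rows with 150 new ones leaves 250 occupied slots after a plain delete, but only 150 when the deleted space is vacuumed first or the table is truncated.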
Indexes take up space too, of course. And for the same reason, when you delete the whole table and then insert, the indexes also end up taking about twice as much space.
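To see how much of a table's footprint is the table itself and how much is its indexes, a query along these lines may help. This is a sketch: it assumes a DB-API cursor from a driver that uses `%s` placeholders (psycopg2 does), and the `table_sizes` helper and the default `public` schema are illustrative choices. The size functions themselves (`pg_relation_size`, `pg_indexes_size`, `pg_total_relation_size`, `pg_size_pretty`) are standard PostgreSQL functions.

```python
from typing import Any, List, Tuple

# Per-table breakdown: table data alone, all indexes, and the grand total.
SIZE_BREAKDOWN_SQL = """
SELECT c.relname                                     AS table_name,
       pg_size_pretty(pg_relation_size(c.oid))       AS heap_size,
       pg_size_pretty(pg_indexes_size(c.oid))        AS indexes_size,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class AS c
JOIN pg_namespace AS n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'          -- ordinary tables only
  AND n.nspname = %s
ORDER BY pg_total_relation_size(c.oid) DESC
"""


def table_sizes(cursor: Any, schema: str = "public") -> List[Tuple]:
    """Run the breakdown query on an open DB-API cursor (hypothetical helper)."""
    cursor.execute(SIZE_BREAKDOWN_SQL, (schema,))
    return cursor.fetchall()
```

The total also counts TOAST storage, so it can be larger than the sum of the other two columns.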
When you want to delete everything from a table, use TRUNCATE rather than DELETE.
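A full reload built on that advice could be sketched as follows. The asker's importer is in PHP; Python is used here only for illustration. The `reload_table` and `quote_ident` helpers are hypothetical, and the cursor is assumed to be any DB-API cursor whose `insert_sql` placeholders match the driver in use.

```python
from typing import Any, Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reload_table(cursor: Any, table: str, insert_sql: str,
                 rows: Sequence[Sequence[Any]]) -> None:
    """Empty the table with TRUNCATE, then insert the fresh rows."""
    # TRUNCATE releases the table's and its indexes' storage immediately,
    # instead of leaving deleted rows behind for vacuum to clean up.
    cursor.execute(f"TRUNCATE TABLE {quote_ident(table)}")
    cursor.executemany(insert_sql, rows)
```

This only fits the case where the entire table is being replaced. TRUNCATE takes an exclusive lock on the table, and the PostgreSQL documentation notes that it is not MVCC-safe, so concurrent readers should be taken into account.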

Ivan Shumov, 2020-08-05
@inoise

Indexes certainly take up space :)
