PostgreSQL
Mikhail Yurievich, 2016-10-08 14:08:32

PostgreSQL - how to archive old records in a large table?

There is a table:

  • 100 million records
  • every day an increase of 1-2 million records
  • hot data - last month

At the moment the database runs on SSD disks. We are looking for the best way to archive old records, and the following plan has emerged:
  • bring up a second server with large HDD disks
  • move old records there by cron
  • change the application code - route queries to the two databases based on the requested date

The last point is the most troubling: is there a transparent way to send a query to both databases and automatically merge the results?
Maybe there is a better approach to this problem altogether?
And what do we do when there is even more data and several servers are needed for the archive?


1 answer(s)
Melkij, 2016-10-08
@Forbidden

Split the table: keep hot data on SSD and cold data on HDD. To do that, first use partitioning to break the table in two. https://habrahabr.ru/post/273933/ (as usual, pay attention to the comments, and to pg_partman)
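For the PostgreSQL versions current at the time (pre-10, no declarative partitioning), the split in the linked article is done with table inheritance plus CHECK constraints. A minimal sketch, assuming a table `events` with a `created_at` timestamp column (all names and the boundary date are illustrative):

```sql
-- Parent table stays empty; the two children hold the actual rows.
-- CHECK constraints let the planner skip the irrelevant partition
-- when constraint_exclusion = partition is enabled.
CREATE TABLE events_hot (
    CHECK (created_at >= DATE '2016-09-01')
) INHERITS (events);

CREATE TABLE events_cold (
    CHECK (created_at < DATE '2016-09-01')
) INHERITS (events);
```

New inserts still need to be routed to the right child, typically with a BEFORE INSERT trigger on the parent; pg_partman automates exactly this bookkeeping.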
Then, before migrating the data (or right away, when creating the partitions), move the archive partitions to a separate tablespace on the HDD: www.postgresql.org/docs/current/static/sql-createt... stackoverflow.com/a/11228536
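Moving a partition to an HDD-backed tablespace is two statements; a sketch with an illustrative mount path and the hypothetical `events_cold` partition:

```sql
-- Tablespace on the HDD mount point (path is illustrative;
-- the directory must exist and be owned by the postgres user).
CREATE TABLESPACE archive LOCATION '/mnt/hdd/pg_archive';

-- Move the cold partition (takes a lock and physically rewrites the files).
ALTER TABLE events_cold SET TABLESPACE archive;

-- Indexes do not move automatically; move each one as well.
ALTER INDEX events_cold_created_at_idx SET TABLESPACE archive;
```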
Then migrate the data into the partitions.
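The migration itself can be done in batches so each step holds locks briefly and keeps WAL bursts small. A sketch under the same illustrative names (`events` parent, `events_cold` partition, `created_at` boundary), run repeatedly, e.g. from cron, until no rows remain:

```sql
-- Move up to 10 000 cold rows per run from the parent into the partition.
-- ONLY restricts the scan/delete to the parent table itself.
WITH batch AS (
    SELECT ctid FROM ONLY events
    WHERE created_at < DATE '2016-09-01'
    LIMIT 10000
), moved AS (
    DELETE FROM ONLY events
    WHERE ctid IN (SELECT ctid FROM batch)
    RETURNING *
)
INSERT INTO events_cold SELECT * FROM moved;
```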
Actually, that may well be enough: 1-2 million rows * 365 days is not that much. Although the nature of the data is not specified.
Transparent access to tables on another machine, from the application's point of view - FDW, the foreign data wrapper. The more recent the PostgreSQL, the better: this area is being developed very actively, particularly around pushing query work down to the remote server. Whether it already cooperates well with partitioning - honestly, I don't know.
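Wiring up postgres_fdw to the archive server takes a handful of statements; a sketch where host, database, user, and column names are all illustrative:

```sql
CREATE EXTENSION postgres_fdw;

-- The second server with the HDD disks.
CREATE SERVER archive_srv FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'archive-host', dbname 'events_archive');

CREATE USER MAPPING FOR app_user SERVER archive_srv
    OPTIONS (user 'app_user', password 'secret');

-- Local alias for the table that physically lives on the archive server.
CREATE FOREIGN TABLE events_cold_remote (
    id         bigint,
    created_at timestamptz,
    payload    jsonb
) SERVER archive_srv OPTIONS (table_name 'events_cold');
```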
Transparently sending a query to two databases and gluing the results together is trivial: a view with UNION ALL over the local table and the FDW table. But that is an uninteresting option - why drag the cold part of the database into queries for hot data?
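Such a view is a one-liner, assuming hot rows live in a local table `events_hot` and cold rows are reachable through a foreign table `events_cold_remote` (both names illustrative):

```sql
-- The application queries events_all and never needs to know
-- which server a given row came from.
CREATE VIEW events_all AS
    SELECT * FROM events_hot
    UNION ALL
    SELECT * FROM events_cold_remote;
```

Note the caveat above: every query against the view may touch the remote server unless the planner can prove the remote branch is unnecessary, which is exactly why this option is uninteresting for hot-data queries.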
Additionally, you can look at postgresql-xl and Greenplum. A year and a half ago the first was not quite production-ready; how it is now, I don't know. The second is used even in the banking sector, but as I recall it is catastrophically unsuited to OLTP - OLAP workloads only.
