SQL
Boris the Animal, 2016-02-16 13:38:41

Archiving database records after a certain period of time: what is the best way to do it?

Suppose there is a database into which the server merges data from a large number of similar sources. For example, financial information about checkout purchases from nightclubs; the domain is not important, it is just an example.
There is a Payment table in the database. A very large number of records from 1000 clubs can flow into this table per day. We have a requirement to move data from this table to an archive server after 2 years. How is it best to organize such tables so that this does not slow things down? After all, the data will have to be checked periodically to see whether there is anything to replicate to the archive today. In principle, checking once a day is enough.
What I think:
Option 1:
A table with dates is created, storing dates with a precision of one day; it contains just a date and an identifier. In the Payment table we add a CreationDateId column that references a date in that dates table. This way we save a little on each record (and across the whole table quite a lot), since we do not need the exact creation time of a record; one-day precision is enough. As a result we have a single Payment table containing a huge number of records.
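Option 1 could be sketched roughly like this (PostgreSQL syntax; all table and column names here are assumptions for illustration, not taken from the question):

```sql
-- Hypothetical sketch of Option 1: a shared dates table plus a single Payment table.
CREATE TABLE CreationDates (
    Id  INT PRIMARY KEY,
    Day DATE NOT NULL UNIQUE
);

CREATE TABLE Payment (
    Id             BIGINT PRIMARY KEY,
    ClubId         INT NOT NULL,
    Amount         NUMERIC(12, 2) NOT NULL,
    CreationDateId INT NOT NULL REFERENCES CreationDates (Id)
);

-- The daily archival check: find everything older than 2 years.
SELECT p.Id
FROM Payment p
JOIN CreationDates d ON d.Id = p.CreationDateId
WHERE d.Day < CURRENT_DATE - INTERVAL '2 years';
```

Without an index path from Payment to the date, that final query has to scan the whole Payment table, which is exactly the slowdown described below.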
Option 1 - problems:
Searching for the records that need to be moved to the archive, and deleting the already-replicated data, will be slow. So this option seems unworkable? Or can something still be done within this variant to make it work well?
Option 2:
We still create the dates table and a CreationDateId in every record of the Payment table and the others, as described in option 1.
Next, we create two tables that track per-day table names:
Payment_Tables_NR (NR - Not Replicated) - tables that have not yet been replicated.
Payment_Tables_Del - tables whose data has already been replicated.
Into Payment_Tables_NR we put the name of the table that corresponds to today: Payment_16_02_2016. All data received on February 16, 2016 is written to Payment_16_02_2016, and the same is done every day, registering each new table name in Payment_Tables_NR. When we need to find out which records require replication, we check the dates encoded in the table names listed in Payment_Tables_NR. Once a table's data has been replicated, we remove its name from Payment_Tables_NR and put it into Payment_Tables_Del. Then, at some point in time, a trigger in the database fires, drops all tables listed in Payment_Tables_Del, and removes their names from Payment_Tables_Del.
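The bookkeeping described in Option 2 might look like this (PostgreSQL syntax; the column layout and names beyond those in the question are assumptions):

```sql
-- Hypothetical sketch of Option 2: per-day tables plus two bookkeeping tables.
CREATE TABLE Payment_Tables_NR  (TableName VARCHAR(128) PRIMARY KEY);  -- not yet replicated
CREATE TABLE Payment_Tables_Del (TableName VARCHAR(128) PRIMARY KEY);  -- replicated, ready to drop

-- On a new day, create that day's table and register it:
CREATE TABLE Payment_16_02_2016 (
    Id     BIGINT PRIMARY KEY,
    ClubId INT NOT NULL,
    Amount NUMERIC(12, 2) NOT NULL
);
INSERT INTO Payment_Tables_NR (TableName) VALUES ('Payment_16_02_2016');

-- After the table's data has been replicated, move its name to the drop list:
DELETE FROM Payment_Tables_NR  WHERE TableName = 'Payment_16_02_2016';
INSERT INTO Payment_Tables_Del (TableName) VALUES ('Payment_16_02_2016');

-- A periodic cleanup step then drops the table and forgets it:
DROP TABLE Payment_16_02_2016;
DELETE FROM Payment_Tables_Del WHERE TableName = 'Payment_16_02_2016';
```

Dropping a whole per-day table is much cheaper than deleting millions of rows, which is the main appeal of this variant; the cost is that every query spanning several days must be assembled across many tables.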
Or is there an even better solution?
Or maybe there is source code somewhere that implements this scheme really well?


2 answers
res2001, 2016-02-16
@res2001

Add a CreationDate field to your records table and fill it in when inserting records. With an index on this field, searching the database for old records will be very fast, without putting much strain on the server.
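A minimal sketch of this suggestion (PostgreSQL syntax; the column and index names are assumptions):

```sql
-- Add a creation-date column and index it.
ALTER TABLE Payment ADD COLUMN CreationDate DATE NOT NULL DEFAULT CURRENT_DATE;
CREATE INDEX IX_Payment_CreationDate ON Payment (CreationDate);

-- The daily archival check becomes a simple index range scan:
SELECT *
FROM Payment
WHERE CreationDate < CURRENT_DATE - INTERVAL '2 years';
```

With day precision a plain DATE column is enough, so the separate dates table from Option 1 is not needed at all.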

Dmitry, 2016-02-16
@dmtrrr

Table partitioning can help you.
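For example, range partitioning by date gives roughly the effect of Option 2 without hand-rolled bookkeeping tables (PostgreSQL declarative-partitioning syntax shown here as an assumption, since the question does not name a DBMS):

```sql
-- Payment partitioned by month on the creation date.
CREATE TABLE Payment (
    Id           BIGINT NOT NULL,
    ClubId       INT NOT NULL,
    Amount       NUMERIC(12, 2) NOT NULL,
    CreationDate DATE NOT NULL
) PARTITION BY RANGE (CreationDate);

CREATE TABLE Payment_2016_02 PARTITION OF Payment
    FOR VALUES FROM ('2016-02-01') TO ('2016-03-01');

-- Archiving a whole month is then a cheap metadata operation:
ALTER TABLE Payment DETACH PARTITION Payment_2016_02;
```

Queries still target the single logical Payment table, while old partitions can be detached, copied to the archive server, and dropped without touching the live data.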
