iamgook, 2019-08-16 22:19:28
Database design

How do I properly version records in a database and track their changes?

Good evening.
There is a table in the database that is loaded on a schedule from a large dump file.
Each dump is atomic: it contains all the records that should exist at that moment, and all records come from that one dump only.
I need to keep a version history for each record and track changes: new records appearing, and existing ones being changed or deleted.
Right now it is implemented like this:

  1. Two tables, main and history, with the same structure;
  2. In main, the primary key is the record id;
  3. In history, the primary key is a hash of all fields of the record;
  4. The input data is processed and cleaned;
  5. Each record is inserted into main with ON DUPLICATE KEY UPDATE;
  6. After each insert, mysql_affected_rows is checked: 1 means the record is new, 2 means the record already existed but has changed;
  7. A hash of the data is computed, and the same row is inserted into history together with that hash.
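
To make the steps above concrete, here is a minimal sketch, not the actual implementation. It makes several assumptions: Python's built-in sqlite3 module stands in for MySQL so the example runs on its own, SQLite's ON CONFLICT ... DO UPDATE stands in for ON DUPLICATE KEY UPDATE, and the name and price columns and the row_hash and load_record helpers are hypothetical. Because the 1/2 affected-rows convention is specific to MySQL, the sketch decides the status by comparing against the row currently in main.

```python
import hashlib
import sqlite3

COLUMNS = ("id", "name", "price")  # hypothetical columns, for illustration only


def row_hash(record):
    """Hash all fields in a fixed column order (step 7)."""
    payload = "\x1f".join(str(record[c]) for c in COLUMNS)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def load_record(conn, record):
    """Upsert into main, append the version to history, report the status."""
    current = conn.execute(
        "SELECT id, name, price FROM main WHERE id = ?", (record["id"],)
    ).fetchone()
    # SQLite counterpart of INSERT ... ON DUPLICATE KEY UPDATE (step 5).
    conn.execute(
        "INSERT INTO main (id, name, price) VALUES (:id, :name, :price) "
        "ON CONFLICT(id) DO UPDATE SET name = excluded.name, price = excluded.price",
        record,
    )
    # The hash is the primary key of history, so an identical version is skipped.
    conn.execute(
        "INSERT OR IGNORE INTO history (hash, id, name, price) VALUES (?, ?, ?, ?)",
        (row_hash(record), record["id"], record["name"], record["price"]),
    )
    if current is None:
        return "new"
    new_values = tuple(record[c] for c in COLUMNS)
    return "changed" if current != new_values else "unchanged"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute(
    "CREATE TABLE history (hash TEXT PRIMARY KEY, id INTEGER, name TEXT, price REAL)"
)

statuses = [
    load_record(conn, {"id": 1, "name": "widget", "price": 9.5}),
    load_record(conn, {"id": 1, "name": "widget", "price": 9.5}),
    load_record(conn, {"id": 1, "name": "widget", "price": 10.0}),
]
history_rows = conn.execute("SELECT COUNT(*) FROM history").fetchone()[0]
```

Here statuses ends up as ["new", "unchanged", "changed"] and history holds two rows. One consequence of using the hash alone as the history key: if a record returns to an earlier state (A, then B, then A again), the second A is not stored again, so the history cannot show that the record went back.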

As a result, main contains the current records, and history contains every version of each record, including the current one.
Is this a correct algorithm?
And how do I track the deletion of a record, when it was in the previous dump but is missing from the current one?
So far I see only one approach: add a datetime field to each row in main that holds the date the dump was modified / loaded. After inserting all records, make a second pass that selects all rows where this field was not updated. Those rows are considered deleted from the current dump, and the script reacts accordingly.
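
The second-pass idea could look roughly like the sketch below. It is an illustration under assumptions, not a tested production design: SQLite through Python's sqlite3 module stands in for MySQL, and the loaded_at column, the name column, the load_dump helper, and the timestamps are all made up for the example.

```python
import sqlite3


def load_dump(conn, records, loaded_at):
    """Upsert every record of one dump, then return the ids missing from it."""
    for record in records:
        conn.execute(
            "INSERT INTO main (id, name, loaded_at) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
            "loaded_at = excluded.loaded_at",
            (record["id"], record["name"], loaded_at),
        )
    # Second pass: rows whose marker is older than this load were not in the dump.
    missing = conn.execute(
        "SELECT id FROM main WHERE loaded_at < ? ORDER BY id", (loaded_at,)
    ).fetchall()
    return [row[0] for row in missing]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE main (id INTEGER PRIMARY KEY, name TEXT, loaded_at TEXT)")

first = load_dump(
    conn, [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], "2019-08-15 03:00:00"
)
second = load_dump(
    conn, [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}], "2019-08-16 03:00:00"
)
```

After the first load nothing is missing, and after the second load the result is [1], the id that disappeared. One thing to keep in mind with this approach: if the marker column is written on every load, every existing row is modified each time, so in MySQL the affected-rows value would be 2 for all of them and would no longer separate changed records from unchanged ones.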
