Scheme for storing changing data with history

vgrayster, 2010-10-03 20:21:18

PHP

There are about 300 thousand objects (for example, cars). For each car, parameters are measured once a week (mileage, tire pressure, fuel level); there will be around 20 parameters. All of this needs to be stored in the database.

In general, users are only interested in the latest parameter values. But sometimes it is necessary to answer questions like “How did the tire pressure change over time?” or “What parameters changed last week?”

Intuition says we should probably look towards MongoDB, but the task specification clearly says that we will use MySQL :)

So far I have come up with two options:

1) The first table (named data)

id | object_name | param1 | param1_is_changed | param1_change_date | param2…

Second table (named data_history)

id | object_name | param1 | param1_is_changed | param1_change_date | param2… | version | change_date

Each time any parameter changes, the previous version of the row is written to data_history, and the is_changed flag is set for the parameter that changed.

2) The first table (named data)

id | object_name

Second table (keeps only last values)

id | object_id | param_name | param_value | date

Third table (stores the history of values from the second table)

Right now we are tracking about 50 thousand objects, and about 200 parameter changes occur per week. All parameters are numeric, so the storage redundancy of the first option is a concern only for database performance, not for disk space. The second option seems good, but it is not very easy to implement with an ORM.

What is your opinion? How should the database be designed? How do I find a compromise between an efficient database and the convenience of writing an application on top of it?

7 answer(s)

Grigory Peretyaka, 2010-10-03
@Peretyaka

This topic has already been raised. Your first model is similar to TYPE 4.
Why is the param1_is_changed field needed? Do you need to determine which individual field changed; don't they change as a group?
I don't yet see the logic in the second method.
I think it can be designed so that there is no difference in query performance.

Anton Korzunov, 2010-10-04
@kashey

I used option 2 myself.
Strange as it may seem, writing the right SELECT is very often not that simple.
I spent a lot of time on GROUP BY and getting ORDER BY right in order to select the latest data from a pile of heterogeneous records.
I ended up storing the history separately and the latest snapshot of the data separately.
No problems at all, and operations on the main database became simpler and faster.
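
To illustrate why selecting the latest values from a single history table takes some care, here is a minimal sketch of the usual "newest row per group" query. It is not the query from this answer: the table and column names are hypothetical, and SQLite (via Python's sqlite3) stands in for MySQL so that the example runs on its own. The subquery finds the newest date per object and parameter, and the join fetches the value recorded on that date.

```python
import sqlite3

# Assumption: SQLite is used instead of MySQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE history (
    object_id   INTEGER NOT NULL,
    param_name  TEXT    NOT NULL,
    param_value REAL    NOT NULL,
    measured_on TEXT    NOT NULL   -- ISO date, so string order = date order
);
INSERT INTO history VALUES
    (1, 'tire_pressure', 2.1,   '2010-09-19'),
    (1, 'tire_pressure', 2.0,   '2010-09-26'),
    (1, 'mileage',       50000, '2010-09-19'),
    (2, 'tire_pressure', 2.3,   '2010-09-26');
""")

# Step 1 (subquery): newest date for every (object, parameter) pair.
# Step 2 (join): fetch the value that was recorded on that date.
LATEST_SQL = """
SELECT h.object_id, h.param_name, h.param_value, h.measured_on
FROM history AS h
JOIN (
    SELECT object_id, param_name, MAX(measured_on) AS last_date
    FROM history
    GROUP BY object_id, param_name
) AS newest
  ON  newest.object_id  = h.object_id
  AND newest.param_name = h.param_name
  AND newest.last_date  = h.measured_on
ORDER BY h.object_id, h.param_name
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

Keeping the latest snapshot in its own table, as described above, replaces this join with a plain lookup.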

MikhailEdoshin, 2010-10-04
@MikhailEdoshin

As far as I understand, the natural model would look like this:
Table 1. Vehicle (ID, Last Reading ID).
Table 2. Reading (ID, Vehicle ID, Date, and the measured values: Fuel, Oil, Tire Pressure, etc.).
If that does not suit you for some reason, then move on to other models. So far, for me at least, the advantage of storing heterogeneous values in one field is not obvious. Yes, they are all numbers now, but if a non-numeric value is suddenly added, the model will have to change significantly.
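
A minimal sketch of this two-table model, assuming SQLite in place of MySQL and hypothetical column names (measured_on, fuel, oil, tire_pressure). The helper functions are illustrative only, and add_reading assumes readings arrive in date order, so the newest inserted reading is also the latest one.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vehicle (
    id              INTEGER PRIMARY KEY,
    last_reading_id INTEGER            -- points at the newest reading
);
CREATE TABLE reading (
    id            INTEGER PRIMARY KEY,
    vehicle_id    INTEGER NOT NULL REFERENCES vehicle(id),
    measured_on   TEXT    NOT NULL,    -- ISO date
    fuel          REAL,
    oil           REAL,
    tire_pressure REAL
);
""")

def add_reading(vehicle_id, measured_on, fuel, oil, tire_pressure):
    """Insert one weekly reading and repoint the vehicle at it."""
    with conn:  # one transaction: both statements commit or neither does
        cur = conn.execute(
            "INSERT INTO reading"
            " (vehicle_id, measured_on, fuel, oil, tire_pressure)"
            " VALUES (?, ?, ?, ?, ?)",
            (vehicle_id, measured_on, fuel, oil, tire_pressure),
        )
        conn.execute(
            "UPDATE vehicle SET last_reading_id = ? WHERE id = ?",
            (cur.lastrowid, vehicle_id),
        )

def latest_reading(vehicle_id):
    """Latest values: a single join through last_reading_id."""
    return conn.execute(
        "SELECT r.measured_on, r.fuel, r.oil, r.tire_pressure"
        " FROM vehicle AS v JOIN reading AS r ON r.id = v.last_reading_id"
        " WHERE v.id = ?",
        (vehicle_id,),
    ).fetchone()

def tire_pressure_history(vehicle_id):
    """How the tire pressure changed over time for one vehicle."""
    return conn.execute(
        "SELECT measured_on, tire_pressure FROM reading"
        " WHERE vehicle_id = ? ORDER BY measured_on",
        (vehicle_id,),
    ).fetchall()

conn.execute("INSERT INTO vehicle (id) VALUES (1)")
add_reading(1, "2010-09-19", 40.0, 4.5, 2.1)
add_reading(1, "2010-09-26", 35.0, 4.4, 2.0)
print(latest_reading(1))
print(tire_pressure_history(1))
```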

pietrovich, 2010-10-04
@pietrovich

> The second method seems to be good, but it's not very easy to implement using ORM.
Well, MySQL has had triggers for a long time, if memory serves. Collect the history with triggers on insert/update/delete; the history display can then be built however you like on top of a separate table (or a view that stitches the current and archived data together).
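
A rough sketch of the trigger idea, with assumptions stated up front: it runs on SQLite through Python's sqlite3 so that it is self-contained, and MySQL's CREATE TRIGGER syntax differs in details. The tables reuse the data and data_history names from the question with only two parameter columns; the trigger name, the data_all view, and the is_current column are hypothetical. Only the update trigger is shown; a delete trigger would follow the same pattern.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; trigger syntax differs slightly.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data (
    id          INTEGER PRIMARY KEY,
    object_name TEXT NOT NULL,
    param1      REAL,
    param2      REAL,
    change_date TEXT NOT NULL
);
CREATE TABLE data_history (
    id          INTEGER NOT NULL,     -- id of the row in data
    object_name TEXT NOT NULL,
    param1      REAL,
    param2      REAL,
    change_date TEXT NOT NULL
);

-- Before a row is overwritten, copy its old version to the history table.
CREATE TRIGGER data_keep_old_version
BEFORE UPDATE ON data
FOR EACH ROW
BEGIN
    INSERT INTO data_history (id, object_name, param1, param2, change_date)
    VALUES (OLD.id, OLD.object_name, OLD.param1, OLD.param2, OLD.change_date);
END;

-- Current and archived rows stitched together for history displays.
CREATE VIEW data_all AS
    SELECT id, object_name, param1, param2, change_date, 1 AS is_current
    FROM data
    UNION ALL
    SELECT id, object_name, param1, param2, change_date, 0 AS is_current
    FROM data_history;
""")

# The application (or ORM) only ever touches the data table.
conn.execute("INSERT INTO data VALUES (1, 'car-1', 2.1, 40.0, '2010-09-19')")
conn.execute(
    "UPDATE data SET param1 = 2.0, change_date = '2010-09-26' WHERE id = 1"
)

timeline = conn.execute(
    "SELECT change_date, param1, is_current FROM data_all"
    " WHERE id = 1 ORDER BY change_date"
).fetchall()
print(timeline)
```

The point of this design is that the history is collected inside the database, so the ORM mapping can stay limited to the table with current values.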

@mgyk, 2010-10-05

1) A table with the measurements: Date, Object_ID, Parameter_ID, Parameter_Value.
2) A Current table similar to the first one, only without dates, updated by a trigger or recalculated by cron.
3) Partition the measurements table by month, engine=ARCHIVE.
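
Points 1 and 2 could be sketched roughly as follows, using the trigger variant rather than cron. The assumptions: SQLite stands in for MySQL, so the monthly partitioning and engine=ARCHIVE from point 3 are not shown; the names measurement, current_value, and changed_since are hypothetical; and measurements are assumed to be inserted in date order. In MySQL the upsert would be written with REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE instead of INSERT OR REPLACE.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measurement (
    measured_on  TEXT    NOT NULL,    -- ISO date
    object_id    INTEGER NOT NULL,
    parameter_id INTEGER NOT NULL,
    param_value  REAL    NOT NULL
);
CREATE TABLE current_value (
    object_id    INTEGER NOT NULL,
    parameter_id INTEGER NOT NULL,
    param_value  REAL    NOT NULL,
    PRIMARY KEY (object_id, parameter_id)
);

-- Every new measurement overwrites the matching row of current values.
CREATE TRIGGER measurement_to_current
AFTER INSERT ON measurement
FOR EACH ROW
BEGIN
    INSERT OR REPLACE INTO current_value (object_id, parameter_id, param_value)
    VALUES (NEW.object_id, NEW.parameter_id, NEW.param_value);
END;
""")

conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        ("2010-09-19", 1, 1, 2.1),   # parameter 1: tire pressure
        ("2010-09-19", 1, 2, 40.0),  # parameter 2: fuel
        ("2010-09-26", 1, 1, 2.0),
        ("2010-09-26", 1, 2, 40.0),
    ],
)

def current_values():
    """Latest value per object and parameter, read without touching history."""
    return conn.execute(
        "SELECT object_id, parameter_id, param_value FROM current_value"
        " ORDER BY object_id, parameter_id"
    ).fetchall()

def changed_since(first_day):
    """Pairs whose value differs from the previous measurement."""
    return conn.execute("""
        SELECT DISTINCT m.object_id, m.parameter_id
        FROM measurement AS m
        JOIN measurement AS prev
          ON  prev.object_id    = m.object_id
          AND prev.parameter_id = m.parameter_id
          AND prev.measured_on  = (
                SELECT MAX(p.measured_on) FROM measurement AS p
                WHERE p.object_id    = m.object_id
                  AND p.parameter_id = m.parameter_id
                  AND p.measured_on  < m.measured_on)
        WHERE m.measured_on >= ?
          AND m.param_value <> prev.param_value
        ORDER BY m.object_id, m.parameter_id
    """, (first_day,)).fetchall()

print(current_values())
print(changed_since("2010-09-20"))
```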

pentarh, 2010-10-06
@pentarh

3D table a la Google BigTable