PostgreSQL
Caretaker, 2019-06-20 18:40:10

How to optimize data storage?

Greetings.
Having failed to get sane behavior out of Mongo, we decided to migrate to PostgreSQL.
The essence of the project is collecting information from various sources: partly by parsing, partly from the users themselves, partly through manual work. The initial volume is expected to be around 50 million records, and each data group has 5 to 10 fields.
Hence the question of how best to build the storage structure; so far I see two basic options (plus a variant of the second, below).
1. Store everything in one large table of the form:

ID / unique_key / field_group_1 / field_group_2 / field_group_3

Accordingly, all selections filter on specific field values, all data stays in its "original form" and is retrieved in a single pass over one table, but the table is wide (more than 30 fields) and largely NULL. A minimal sketch of this layout follows.
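A minimal sketch of option 1, with hypothetical column names (the real table would have 30+ columns):

    -- Option 1: one wide table; most group columns are NULL for most rows.
    CREATE TABLE records (
        id          bigserial PRIMARY KEY,
        unique_key  text NOT NULL UNIQUE,  -- the end-to-end string key from the P.S.
        g1_field_a  text,                  -- field group 1 ...
        g1_field_b  text,
        g2_field_a  text,                  -- field group 2 ...
        g3_field_a  text                   -- field group 3 ...
    );

    -- Partial indexes keep index size down when most values are NULL:
    CREATE INDEX records_g2_a_idx ON records (g2_field_a)
        WHERE g2_field_a IS NOT NULL;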
2. Split into 4 tables: the main one serves, in effect, only as the complete list plus extra fields for the most frequent selections; the rest of the data goes into group tables:
- ID / unique_key / selection_data
- ID / unique_key / field_group_1
- ID / unique_key / field_group_2
- ID / unique_key / field_group_3

The trade-off: to build the final picture of a record you still have to fetch all three groups and derive the result from their values, but each table stores only the data that actually exists instead of rows full of empty fields, and no table has more than 15 columns. Only the "main table" with the selection fields is large. (A DDL and query sketch follows the example below.)
For example, given:
- unique_key = aaaa
- group_1_field_A = xxx
- group_2_field_A = yyy
- group_3: no record at all

the result to return for that key is
- unique_key = aaaa, group_2_field_A = yyy
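A sketch of option 2 together with a query for the example above; all table and column names are assumptions:

    -- Option 2: main table plus one table per field group.
    CREATE TABLE main (
        id         bigserial PRIMARY KEY,
        unique_key text NOT NULL UNIQUE,
        sel_field  text                    -- data for the most frequent selections
    );
    CREATE TABLE group1 (unique_key text PRIMARY KEY REFERENCES main (unique_key), field_a text);
    CREATE TABLE group2 (unique_key text PRIMARY KEY REFERENCES main (unique_key), field_a text);
    CREATE TABLE group3 (unique_key text PRIMARY KEY REFERENCES main (unique_key), field_a text);

    -- Rebuilding the full picture for the example record:
    SELECT m.unique_key,
           g1.field_a AS group_1_field_a,
           g2.field_a AS group_2_field_a,
           g3.field_a AS group_3_field_a
    FROM main m
    LEFT JOIN group1 g1 USING (unique_key)
    LEFT JOIN group2 g2 USING (unique_key)
    LEFT JOIN group3 g3 USING (unique_key)
    WHERE m.unique_key = 'aaaa';
    -- group3 has no row, so its column comes back NULL; the row still
    -- carries unique_key = 'aaaa' and group_2_field_a = 'yyy'.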
3. A mutated version of No. 2, in which the main table maintains "final computed fields" that are updated whenever the data in the "child tables" changes:
- ID / unique_key / final_field_group
- ID / unique_key / field_group_1
- ID / unique_key / field_group_2
- ID / unique_key / field_group_3

This still yields a large "main table", but with summary fields it has no more than 15 columns against the 30+ of the first option, and ordinary selections need no joins or merging across the other tables. A trigger sketch for keeping the summary fields in sync follows.
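One way the "final computed fields" of option 3 might be maintained, as a sketch: assume main has gained a summary column final_a (the column and the derivation rule are invented for illustration; EXECUTE FUNCTION needs PostgreSQL 11+):

    -- Option 3: refresh a summary column in main when a child table changes.
    CREATE OR REPLACE FUNCTION refresh_final() RETURNS trigger AS $$
    BEGIN
        UPDATE main
           SET final_a = COALESCE(NEW.field_a, final_a)
         WHERE unique_key = NEW.unique_key;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    -- One trigger per child table; group1 and group3 would get the same treatment.
    CREATE TRIGGER group2_refresh_final
    AFTER INSERT OR UPDATE ON group2
    FOR EACH ROW EXECUTE FUNCTION refresh_final();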
P.S. A clarification on ID: it is the record's ID field used as the primary key. The unique field is essentially a single end-to-end identifier that also serves as the link key; it is a string and cannot be converted to a number.


1 answer
Caretaker, 2019-07-01
@zuart

In the end, after much trial and error and a few frayed nerves, we revised the scheme of working with the database. Some things got worse, others got better, but as a result all the data was split across several tables:
- one common table with the "final" data, against which all read selections run
- three tables of intermediate data, used for editing and a few specific selections
- one service-format table, worked with almost exclusively by automation
- plus a handful of triggers and stored procedures that implement the atomic data-handling logic (a sketch follows)
On the whole it turned out well both in terms of working with the data and in terms of resource use... I consider the question closed =)
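For context, a hedged sketch of what such atomic logic could look like as a stored procedure, reusing the hypothetical tables from the option 2 sketch above (CREATE PROCEDURE needs PostgreSQL 11+):

    -- Write to an intermediate table and refresh the common "final" table
    -- in a single transaction.
    CREATE OR REPLACE PROCEDURE upsert_group2(p_key text, p_value text)
    LANGUAGE plpgsql AS $$
    BEGIN
        INSERT INTO group2 (unique_key, field_a)
        VALUES (p_key, p_value)
        ON CONFLICT (unique_key) DO UPDATE SET field_a = EXCLUDED.field_a;

        UPDATE main SET final_a = p_value WHERE unique_key = p_key;
    END;
    $$;

    CALL upsert_group2('aaaa', 'yyy');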
