What is the best approach when a huge amount of data is expected?

Alexey Nikolaev, 2018-10-19 21:31:55

PostgreSQL

Hello.
We use PostgreSQL. The task is to collect statistics from social networks (fb, tw, ig, etc.) according to certain parameters and then compute derived data from them. Eventually this will mean hundreds of millions, even billions, of rows, because the service is large and serious.
Here is the question: a colleague suggested making one denormalized statistics table per social network, for example a facebook_stats table, arguing that there will be a lot of data and it is better to split it into several parts. I advocate a normalized, somewhat more complex approach: a single stats table with a type field and a few other fields for the common data, to which tables holding the data specific to each social network are attached one-to-one.
In my opinion, the second solution is more elegant and architecturally more flexible, and it will be more pleasant and easier to work with in code. What gives me pause is the load: if we do as my colleague suggested, each table will be about 4 times smaller than my stats table. On the other hand, my table will store only the common data, and the additional data can be pulled in quickly, when needed, through the foreign keys.
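To make the two designs concrete, here is a minimal sketch of both, using Python's built-in sqlite3 module as a stand-in for PostgreSQL so it runs without a server. Apart from facebook_stats and stats, every table and column name (stats_facebook, network, collected_at, likes, shares) is hypothetical.

```python
import sqlite3


def build_schemas(conn: sqlite3.Connection) -> None:
    # Colleague's option: one denormalized table per network.
    conn.execute("""
        CREATE TABLE facebook_stats (
            id INTEGER PRIMARY KEY,
            collected_at TEXT NOT NULL,
            likes INTEGER NOT NULL,
            shares INTEGER NOT NULL  -- hypothetical Facebook-specific column
        )""")
    # Asker's option: one common table with a "type" field...
    conn.execute("""
        CREATE TABLE stats (
            id INTEGER PRIMARY KEY,
            network TEXT NOT NULL,  -- the type field: 'fb', 'tw', 'ig', ...
            collected_at TEXT NOT NULL,
            likes INTEGER NOT NULL
        )""")
    # ...plus a one-to-one extension table per network: the primary key is
    # also the foreign key, so each stats row has at most one extension row.
    conn.execute("""
        CREATE TABLE stats_facebook (
            stats_id INTEGER PRIMARY KEY REFERENCES stats(id),
            shares INTEGER NOT NULL
        )""")
    conn.execute("CREATE INDEX stats_network_idx ON stats(network, collected_at)")


def fb_rows_normalized(conn: sqlite3.Connection) -> list:
    # Network-specific columns come back through a join on the 1:1 key.
    return conn.execute("""
        SELECT s.id, s.collected_at, s.likes, f.shares
        FROM stats AS s
        JOIN stats_facebook AS f ON f.stats_id = s.id
        WHERE s.network = 'fb'
        ORDER BY s.id""").fetchall()


def fb_rows_denormalized(conn: sqlite3.Connection) -> list:
    # Everything is already in one row; no join needed.
    return conn.execute(
        "SELECT id, collected_at, likes, shares FROM facebook_stats ORDER BY id"
    ).fetchall()


def demo() -> tuple:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    build_schemas(conn)
    conn.execute("INSERT INTO facebook_stats VALUES (1, '2018-10-19', 10, 2)")
    conn.execute("INSERT INTO stats VALUES (1, 'fb', '2018-10-19', 10)")
    conn.execute("INSERT INTO stats_facebook VALUES (1, 2)")
    conn.execute("INSERT INTO stats VALUES (2, 'tw', '2018-10-19', 5)")  # no extension row
    result = (fb_rows_normalized(conn), fb_rows_denormalized(conn))
    conn.close()
    return result


if __name__ == "__main__":
    normalized, denormalized = demo()
    print(normalized)    # [(1, '2018-10-19', 10, 2)]
    print(denormalized)  # [(1, '2018-10-19', 10, 2)]
```

Both queries return the same Facebook row; the difference is that the normalized design pays for a join on every read of network-specific columns, while the denormalized one keeps them in the row.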
Which option is better when truly big data looms? How do large corporations store and process it? It needs to be both elegant and fast.
Thanks in advance.

Sergey Gornostaev, 2018-10-19
@Heian

Big data is measured not by the number of rows in a table but by the volume of the data. Until you have petabytes, it is not big. And highload, which you mentioned in the tags, is about the number of requests per second, not the number of rows in the tables.
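A back-of-the-envelope check of the volume argument, assuming a hypothetical average row width of 200 bytes (the real width depends on the columns, PostgreSQL's per-row overhead and any indexes):

```python
PETABYTE = 10**15  # decimal petabyte, in bytes


def table_bytes(rows: int, bytes_per_row: int) -> int:
    """Rough size: row count times average row width (overhead folded into the width)."""
    return rows * bytes_per_row


def rows_per_petabyte(bytes_per_row: int) -> int:
    """How many rows of the given width it takes to fill one petabyte."""
    return PETABYTE // bytes_per_row


if __name__ == "__main__":
    width = 200  # hypothetical average row width in bytes
    size = table_bytes(3_000_000_000, width)
    print(f"3 billion rows ~ {size // 10**9} GB")        # 3 billion rows ~ 600 GB
    print(f"rows per PB: {rows_per_petabyte(width):,}")  # rows per PB: 5,000,000,000,000
```

Under these assumptions a few billion rows come to hundreds of gigabytes, more than three orders of magnitude short of a petabyte.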
The database schema should be chosen according to the structure of your queries. For a flat SELECT on indexed columns, 20-30 billion rows in a table are not a problem.
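One way to see why a flat select on indexed columns tolerates tens of billions of rows is B-tree depth: each level multiplies the number of reachable entries by the page fanout, so depth grows logarithmically with the row count. The sketch assumes an idealized fanout of 300 entries per page, a hypothetical figure; the real value depends on key width and the page size (8 kB by default in PostgreSQL).

```python
def btree_levels(rows: int, fanout: int) -> int:
    """Smallest number of levels L with fanout**L >= rows (idealized B-tree)."""
    if rows < 1 or fanout < 2:
        raise ValueError("need rows >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    # Add levels until the tree can address every row.
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    fanout = 300  # hypothetical entries per index page
    for rows in (1_000_000, 30_000_000_000):
        print(rows, btree_levels(rows, fanout))
    # 1000000 3
    # 30000000000 5
```

With these assumptions, going from a million rows to 30 billion adds only two levels, so an indexed point lookup reads just a couple more pages.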