Anton, 2015-08-21 18:10:58
Database

Database normalization. Evil or good?

Good afternoon!
Recently I watched an interesting presentation of a person who implemented his startup to a full-fledged project. In this project, he used a Postgres database. Everything was interesting, of course, but he said one thing that shocked me. He said this: "Normalization is evil! In some cases (this percentage is small) and if necessary, use database normalization." I have never done projects related to querying data in a database or working with a database in general, but I studied the theory on this issue (at the university). And I had a quite reasonable question: is database normalization really not the best ally in the implementation of your project (program)?


6 answers
Artem Voronov, 2015-08-21
@MoonMaster

It depends strongly on the scale of the project; don't rush from one extreme to the other.
For smaller projects, normalization makes the job easier.
But when it comes to huge amounts of data and high request rates, normalization becomes the enemy: even simple JOINs can increase query execution time by an order of magnitude. That is why denormalizing the database and introducing redundancy is common practice for high-load systems.
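A minimal sketch of what this trade-off looks like in Postgres. The orders/order_items tables and the total_amount column are hypothetical, invented purely for illustration:

-- Normalized: the order total is derived with a JOIN and an aggregate on every read.
CREATE TABLE orders (
    id      serial  PRIMARY KEY,
    user_id integer NOT NULL
);

CREATE TABLE order_items (
    id       serial  PRIMARY KEY,
    order_id integer NOT NULL REFERENCES orders(id),
    price    numeric(10,2) NOT NULL,
    qty      integer NOT NULL
);

SELECT o.id, SUM(oi.price * oi.qty) AS total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id;

-- Denormalized: the total is stored redundantly on the order itself.
-- Reads become a plain single-table SELECT, but every write to order_items
-- must now also keep orders.total_amount in sync.
ALTER TABLE orders ADD COLUMN total_amount numeric(10,2) NOT NULL DEFAULT 0;

SELECT id, total_amount FROM orders;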

Sergey Protko, 2015-08-21
@Fesor

It all depends on the context of the statement (tasks differ). Going to extremes is foolish (only the Sith deal in absolutes (c) Obi-Wan).
Normalization is not evil; normalization is wonderful. But under high load it is often necessary to denormalize. Again, this usually does not mean that we stop storing data in normal form: we simply build denormalized aggregates on the side to speed up reads (see the sketch below).
P.S. Say what the project was about, or post a link; it would be interesting to take a look at the madman, or at least to understand him.
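One common form of such an aggregate in Postgres is a materialized view. A minimal sketch, reusing the hypothetical orders/order_items tables from the example in the previous answer:

-- The normalized tables remain the source of truth; the view is a derived,
-- denormalized copy that is cheap to read.
CREATE MATERIALIZED VIEW order_totals AS
SELECT o.id AS order_id, SUM(oi.price * oi.qty) AS total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
GROUP BY o.id;

-- Reads hit the precomputed aggregate:
SELECT total FROM order_totals WHERE order_id = 42;

-- The price of redundancy: the copy has to be refreshed explicitly.
REFRESH MATERIALIZED VIEW order_totals;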

Stanislav Makarov, 2015-08-21
@Nipheris

Here is the thing: normalization is not so much for the DBMS, nor for the application that uses it. It is above all for you, the developer (of the database and/or the application), and for data integrity and consistency.
Of course, in production the ideal database is the one whose structure lets you run the most frequent (or heaviest) queries as economically as possible in terms of hardware resources (fewer block reads from disk, fewer joins, the most efficient use of indexes), and such a structure does not have to be in a high normal form at all. However, there is another question: which database is ideal for you as a developer? Redundancy in a database is a potential source of errors. If a piece of information has to be updated in two places (for example, the price of an item on a receipt and the total cost of the order), you always have a chance to forget to do it in one of them.
It is precisely because of this discrepancy between the two structures, the most normalized one, convenient for the developer/designer, and the one optimized for query execution, that the standard database design cycle includes a normalization stage up to a certain level (at least 3NF) and subsequent denormalization to speed up specific queries (as well as building the necessary indexes). Because denormalization complicates the logic of working with the database (those same updates in several places), this logic (most often stored procedures or triggers, less often on the application side) must be implemented as carefully and formally as possible. It is like writing code in a high-level language and then compiling it for a specific platform with maximum optimizations; the only important difference is that the specifics of the target platform are known in advance, and the role of the compiler is played by the developer himself.
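A minimal sketch of what such trigger-based maintenance can look like in Postgres, reusing the hypothetical orders.total_amount column from the earlier examples (all names are illustrative, not taken from this answer):

-- Keep the redundant orders.total_amount in sync with order_items,
-- so that application code cannot forget to update one of the two places.
CREATE OR REPLACE FUNCTION refresh_order_total() RETURNS trigger AS $$
DECLARE
    affected_order integer;
BEGIN
    IF TG_OP = 'DELETE' THEN
        affected_order := OLD.order_id;
    ELSE
        affected_order := NEW.order_id;
    END IF;

    UPDATE orders
    SET total_amount = COALESCE(
        (SELECT SUM(price * qty) FROM order_items WHERE order_id = affected_order),
        0)
    WHERE id = affected_order;

    RETURN NULL;  -- AFTER trigger: the return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_items_total
AFTER INSERT OR UPDATE OR DELETE ON order_items
FOR EACH ROW EXECUTE PROCEDURE refresh_order_total();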
It should be noted that in modern systems schema denormalization is not the only, and not always the best, way to improve performance. Caching frequently used data in something like memcached is sometimes easier and more effective than denormalizing the database and then maintaining its consistency.

Maxim Koryukov, 2015-08-27
@maxkoryukov

Let's figure it out.
In a normalized database (in the general case), selection by arbitrary parameters works faster, and data is easier to add and modify.
A denormalized one can give a very big win for a clearly defined set of read queries (no extra joins are needed, the sorting is already precomputed, and so on). But if you need something non-standard, be prepared for anything; most likely the query will run very slowly.
In a normalized database a query can be optimized to a considerable degree. I have seen a (very large) stored procedure go from 2 hours to 5 minutes with the right indexes and the right joins, but the optimization was done by an SQL kung-fu sensei who knew exactly what he was doing (see the sketch after this answer).
In a denormalized one you can no longer do that; you have to change the data structure instead.
I would try to normalize the data, though not to the point of academic fanaticism: only to the point where the tables disgust neither me nor the query optimizer of the DBMS in use.
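A minimal illustration of the kind of index work mentioned above, again on the hypothetical orders/order_items schema from the first answer; indexes on the join and filter columns can keep the planner off full sequential scans:

-- Indexes for the join key and the filter column:
CREATE INDEX order_items_order_id_idx ON order_items (order_id);
CREATE INDEX orders_user_id_idx       ON orders (user_id);

-- EXPLAIN ANALYZE shows whether the planner actually uses them:
EXPLAIN ANALYZE
SELECT o.id, SUM(oi.price * oi.qty) AS total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
WHERE o.user_id = 42
GROUP BY o.id;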

@coodan, 2015-08-23

I agree. It is a very bad practice, and the fact that the project grew proves nothing.
After all, we do not live in the Stone Age, and there are plenty of ways to improve performance, from caching to pre-aggregation of data. But no performance gain is worth the loss of data integrity. What is the use of serving garbage faster?
The guy has heard the bell but has no idea where it rings. He took a powerful DBMS but did not have the brains to master it. So now he boasts of his ignorance, believing that he is the smartest and everyone else is a fool.

Ytsu Ytsuevich, 2015-08-21
@kofon

If you don't have any work experience yet, go and get some.
Right now you can only draw a premature conclusion, and it may well turn out to be wrong.
As for whether it is evil or good, I don't know...
IMHO, it depends on how far the normalization is taken.
