How can I optimize checking a database for an existing record (to avoid inserting a duplicate)?
There is one table (about 15 fields) that currently holds about 50 million records, and about 100 thousand more are written to it every day. Before each write, the table has to be checked for an identical record so that no duplicates are created. All of this happens in MySQL. Indexes exist only on the few fields that are used for searching, but the duplicate check compares all fields, which makes it the slowest operation.
Knowledgeable people, please tell me how to get rid of this bottleneck. Would moving from MySQL to something else, maybe even NoSQL, help? Or would a single index on all fields be enough?
Add one more indexed column that stores the md5 hash of all the data in the row, and compare hashes when checking for duplicates.
SELECT 1 FROM `table` WHERE hash = '5c331a6790ba2d61a5c372336c9d215e' LIMIT 1;
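A minimal sketch of the hash-column idea, in case it helps. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the `records` table, its columns, and the helpers `row_hash` and `insert_if_new` are hypothetical names, not something from the question. Values are length-prefixed before hashing so that field boundaries cannot blur together.

```python
import hashlib
import sqlite3


def row_hash(values):
    """Return an MD5 hex digest computed over all column values of one row."""
    h = hashlib.md5()
    for v in values:
        if v is None:
            # Distinct marker so NULL does not collide with an empty string.
            h.update(b"N;")
        else:
            data = str(v).encode("utf-8")
            # Length prefix keeps ("ab", "c") different from ("a", "bc").
            h.update(b"%d:" % len(data))
            h.update(data)
            h.update(b";")
    return h.hexdigest()


# Hypothetical three-column table; the real one has about 15 fields.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (a TEXT, b TEXT, c INTEGER, hash TEXT)")
conn.execute("CREATE INDEX idx_records_hash ON records (hash)")


def insert_if_new(conn, row):
    """Insert the row unless a row with the same hash is already stored."""
    digest = row_hash(row)
    found = conn.execute(
        "SELECT 1 FROM records WHERE hash = ? LIMIT 1", (digest,)
    ).fetchone()
    if found:
        return False
    conn.execute(
        "INSERT INTO records (a, b, c, hash) VALUES (?, ?, ?, ?)",
        (*row, digest),
    )
    return True


first = insert_if_new(conn, ("x", "y", 1))    # new row
second = insert_if_new(conn, ("x", "y", 1))   # exact duplicate
third = insert_if_new(conn, ("xy", "", 1))    # different field boundaries
```

The lookup now touches one short indexed column instead of comparing every field. A check followed by a separate insert can still race when several writers run at once, so in practice the hash column would probably be declared unique as well.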
Hmm, this is exactly what a unique key is for. The only alarming part is that uniqueness has to span all the fields, which may mean the database is poorly designed. At your volumes this solution may not work, but it is at least the obvious one and worth starting with.
The answer depends heavily on why you are checking for an existing record. If the goal is to INSERT when there is no record and UPDATE when there already is one, consider the INSERT ... ON DUPLICATE KEY UPDATE construct. That way you can, first, delegate the check to the database; second, load data in batches; and third, avoid the overhead of a separate lookup before every write.
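A rough illustration of the insert-or-update approach, again using Python's sqlite3 as a runnable stand-in. SQLite spells the upsert as ON CONFLICT ... DO UPDATE; the MySQL form is shown in a comment. The `records` table and its `hash`, `payload`, and `seen` columns are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: `hash` is the unique key that detects duplicates.
conn.execute(
    "CREATE TABLE records ("
    " hash TEXT PRIMARY KEY,"
    " payload TEXT,"
    " seen INTEGER NOT NULL DEFAULT 1)"
)

# SQLite upsert (needs SQLite 3.24 or newer). The MySQL counterpart would be:
#   INSERT INTO records (hash, payload) VALUES (%s, %s)
#   ON DUPLICATE KEY UPDATE seen = seen + 1, payload = VALUES(payload)
UPSERT = """
INSERT INTO records (hash, payload) VALUES (?, ?)
ON CONFLICT(hash) DO UPDATE SET
    seen = seen + 1,
    payload = excluded.payload
"""

# One batch containing a repeated key; the database resolves the conflict.
batch = [("h1", "first"), ("h2", "second"), ("h1", "first again")]
conn.executemany(UPSERT, batch)

rows = conn.execute(
    "SELECT hash, payload, seen FROM records ORDER BY hash"
).fetchall()
```

No SELECT is issued before the write: the unique key does the duplicate detection, and the whole batch goes through a single statement template.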
There must be at least one key field. Just insert the record and check the result of the operation for an error: an error means the record already exists. Or simply ignore the error.
PS: as a developer I would do exactly that, although I'm not sure it is a conceptually correct solution.
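The "just insert and look at the error" idea could be sketched like this, with sqlite3 standing in for MySQL. The two-column `records` table and the `try_insert` helper are hypothetical. In MySQL the duplicate would show up as error 1062 (duplicate entry), and INSERT IGNORE plays the role of SQLite's INSERT OR IGNORE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a unique key over the fields that define a duplicate.
conn.execute(
    "CREATE TABLE records ("
    " a TEXT NOT NULL,"
    " b TEXT NOT NULL,"
    " UNIQUE (a, b))"
)


def try_insert(conn, a, b):
    """Insert the row; return False if the unique key rejects it."""
    try:
        conn.execute("INSERT INTO records (a, b) VALUES (?, ?)", (a, b))
        return True
    except sqlite3.IntegrityError:
        # The unique constraint fired, so the record already exists.
        return False


results = [
    try_insert(conn, "x", "y"),  # new
    try_insert(conn, "x", "y"),  # duplicate, rejected
    try_insert(conn, "x", "z"),  # new
]

# "Or simply ignore the error": let the database skip the duplicate silently.
conn.execute("INSERT OR IGNORE INTO records (a, b) VALUES (?, ?)", ("x", "y"))
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
```

Either variant leaves the duplicate check to the unique index, so there is no window between a lookup and the insert in which another writer could slip in the same row.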