Answer the question
In order to leave comments, you need to log in
What to choose: SQL vs NoSQL?
Hello to all habrasociety.
Nevertheless, I decided to give life to my dream in creating one not a small project. And the question arises, what is better to use for Primary Storage?
At the moment, I'm thinking about PostgreSQL (MySql) or some kind of NoSql solution (like Redis).
The task of the project is to collect information from different sources at high speed.
But here, a bunch of nuances appear because of which I can’t decide:
1. Redis is still fast (I would say very fast), but it has a higher percentage of data loss when the server crashes than a relational database
2. Redis does not allow you to explicitly specify links between entities.
3. The database (MySQL) with a huge number of records (more than 20 million records are expected) starts to “dumb down” very much.
Requirements:
1. Fast write/read speed
2. Ability to replicate to other servers
3. There are almost no filters, in fact, only lists.
4. Permissible data loss up to 2%
Maybe someone faced such a task, and has experience. I would really appreciate any answers/advice. Thank you!
Answer the question
In order to leave comments, you need to log in
1) I strongly doubt that 32gb ram will be enough for you for 20+ million records, for radishes
2) Radishes have AOF in which data loss is extremely unlikely
3) Radishes is fast not because nosql, but because the database is in ram
4) with correctly built indexes and the structure of the database for 20 million muscle does not blunt, we have 100 million in the table on one production project now and everything works fine. By the way, the table moved from the radish, when the RAM was no longer enough for it, after tuning the muscle, the performance did not suffer.
5) As it was written above, if you do inserts in batches, and not one at a time, this will significantly speed up the work
of the database. Comparable prices at OVH, with clearly better quality
7) In my experience, correctly configured SQL is sufficient in 99% of cases
Simple writing to files satisfies the requirements you listed. But you do not consider this obvious option, which means that it does not suit you. For what reasons it is not suitable - you are silent, but you want to get advice. Well, this leaves room for imagination. I advise you Java-Chronicle or MapDB - the fastest solutions.
If you only need to read/write use postgre. She copes with these tasks quite well.
MongoDB since version 2.4 is quite suitable for production. And replication is also excellent speed. We use it in a combat project. The volume of the database is 32 GB. Three replicas in real time. Only the master writes. Only slaves read (we have almost 20 times more read operations).
At first glance, it may seem that the absence of a master-master replica is not a gut. However, the database implements mechanisms for re-electing the master in case of a fall.
Better think about what advantages SQL will give you over NoSQL. Now it is fashionable to talk about NoSQL, but these are just words. What will you do when the database schema changes? When will there be sampling needs that you didn't foresee in the first place? NoSQL is good where SQL solutions need help. As an independent primary solution, I think it should not even be considered.
Up to 2% data loss allowed
The database (MySQL) with a huge number of records (more than 20 million records are expected) starts to “dumb down” very much
In the first paragraph, you are very mistaken - with default settings, it is more reliable than Postgre.
The second point - yes, the difference from relational databases is radical.
In order for the data to fit in memory, multiple servers will be required (horizontal scaling). Database servers and applications should be separate.
For dynamic data (structure, links, frequent updates) - RMDB, for static data (text, collections, etc.) - NoSQL. It is quite possible to make friends in one project.
Look towards Percona Server + Percona NoSQL This is a long and well-known MySQL. Where you need complex queries for data analysis - use regular SQL queries, where you need speed - you access the same data through the NoSQL interface. Another bonus is master-master replication out of the box.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question