What to choose: SQL vs NoSQL?

Vitaly Zhuk2013-09-14 12:05:19

PHP

Vitaly Zhuk, 2013-09-14 12:05:19

Hello to all habrasociety.

Nevertheless, I decided to give life to my dream in creating one not a small project. And the question arises, what is better to use for Primary Storage?

At the moment, I'm thinking about PostgreSQL (MySql) or some kind of NoSql solution (like Redis).

The task of the project is to collect information from different sources at high speed.

But here, a bunch of nuances appear because of which I can’t decide:

1. Redis is still fast (I would say very fast), but it has a higher percentage of data loss when the server crashes than a relational database
2. Redis does not allow you to explicitly specify links between entities.
3. The database (MySQL) with a huge number of records (more than 20 million records are expected) starts to “dumb down” very much.

Requirements:
1. Fast write/read speed
2. Ability to replicate to other servers
3. There are almost no filters, in fact, only lists.
4. Permissible data loss up to 2%

Maybe someone faced such a task, and has experience. I would really appreciate any answers/advice. Thank you!

Answer the question

In order to leave comments, you need to log in

10 answer(s)

WEBIVAN, 2013-09-14
@ZhukV

1) I strongly doubt that 32gb ram will be enough for you for 20+ million records, for radishes
2) Radishes have AOF in which data loss is extremely unlikely
3) Radishes is fast not because nosql, but because the database is in ram
4) with correctly built indexes and the structure of the database for 20 million muscle does not blunt, we have 100 million in the table on one production project now and everything works fine. By the way, the table moved from the radish, when the RAM was no longer enough for it, after tuning the muscle, the performance did not suffer.
5) As it was written above, if you do inserts in batches, and not one at a time, this will significantly speed up the work
of the database. Comparable prices at OVH, with clearly better quality
7) In my experience, correctly configured SQL is sufficient in 99% of cases

Puma Thailand, 2013-09-14
@opium

20 million for mysql is nothing.

Kaigorodov Alexey, 2013-09-14
@rfq

Simple writing to files satisfies the requirements you listed. But you do not consider this obvious option, which means that it does not suit you. For what reasons it is not suitable - you are silent, but you want to get advice. Well, this leaves room for imagination. I advise you Java-Chronicle or MapDB - the fastest solutions.

Dmitry Guketlev, 2013-09-14
@Yavanosta

If you only need to read/write use postgre. She copes with these tasks quite well.

r1alex, 2013-09-14
@r1alex

MongoDB since version 2.4 is quite suitable for production. And replication is also excellent speed. We use it in a combat project. The volume of the database is 32 GB. Three replicas in real time. Only the master writes. Only slaves read (we have almost 20 times more read operations).
At first glance, it may seem that the absence of a master-master replica is not a gut. However, the database implements mechanisms for re-electing the master in case of a fall.

Sergey, 2013-11-27
@begemot_sun

Better think about what advantages SQL will give you over NoSQL. Now it is fashionable to talk about NoSQL, but these are just words. What will you do when the database schema changes? When will there be sampling needs that you didn't foresee in the first place? NoSQL is good where SQL solutions need help. As an independent primary solution, I think it should not even be considered.

niko83, 2013-09-14
@niko83

Up to 2% data loss allowed

One of 50 insert'ov falls, and it is comprehensible, I correctly understood?

The database (MySQL) with a huge number of records (more than 20 million records are expected) starts to “dumb down” very much

If the data structure is well defined and amenable to partitioning, perhaps it is possible to split the information into tables by months / weeks / etc. (depending on what kind of search will be performed) calling code and storage. (in this case, the caching layer can be easily implemented if necessary)
2. Write a test that generates a bunch of hypothetical read and write requests (if the project develops, it will come in handy)
(both points will give you valuable useful practical experience)
A large stream of inserts can be solved by buffering on client, and insert a bunch of lines in one command - Good optimization, think about it, maybe in your case this is quite acceptable.
I like it more as a primary storage relational, radish / mongo, etc. as an intermediate storage of aggregated information or for caching.

EugeneOZ, 2013-09-14
@EugeneOZ

In the first paragraph, you are very mistaken - with default settings, it is more reliable than Postgre.
The second point - yes, the difference from relational databases is radical.
In order for the data to fit in memory, multiple servers will be required (horizontal scaling). Database servers and applications should be separate.

IDVsbruck, 2013-09-14
@IDVsbruck

For dynamic data (structure, links, frequent updates) - RMDB, for static data (text, collections, etc.) - NoSQL. It is quite possible to make friends in one project.

KEKSOV, 2013-09-15
@KEKSOV

Look towards Percona Server + Percona NoSQL This is a long and well-known MySQL. Where you need complex queries for data analysis - use regular SQL queries, where you need speed - you access the same data through the NoSQL interface. Another bonus is master-master replication out of the box.