Help me choose the right database
Please suggest a DB with the following properties:
Persistence is not required;
Almost all keys in the database need to be updated every 3-6 hours (100M+ keys, about 50GB of data);
Ability to distribute data by key (or by PK);
It should be a DBMS, not an embedded solution;
While the bulk write is in progress, the cluster as a whole should keep responding to requests; individual nodes may be blocked;
Not in-memory - the amount of data will exceed the RAM;
Horizontal scaling and replication;
Decent handling of frequent full overwrites of all the data (we are concerned about disk fragmentation);
Compatibility with C# and Java;
Auto key expiry is optional but desirable;
Our process of working with such a database:
Every 4-6 hours an analytical cluster produces about 100M records (50GB), each of which is a key mapped to an array of 20 values; this data must be served efficiently to users through the front-end system. At most about 15% of the data will actually be requested; the rest will simply sit there for about 4 hours and be completely overwritten after the next cycle of the analytical system.
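Purely as an illustration of the record shape and the desired auto-expire behavior, here is a minimal sketch with the MongoDB Java driver (just one of the candidates we've touched; the database, collection, and field names are made up):

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

import java.util.Arrays;
import java.util.Date;
import java.util.concurrent.TimeUnit;

public class RecordShapeSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> col =
                    client.getDatabase("frontend").getCollection("analytics_results");

            // TTL index: documents whose createdAt is older than 6 hours are removed automatically
            col.createIndex(Indexes.ascending("createdAt"),
                    new IndexOptions().expireAfter(6L, TimeUnit.HOURS));

            // One record: a key plus an array of 20 values
            col.insertOne(new Document("_id", "key:12345")
                    .append("values", Arrays.asList(0.1, 0.2, 0.3 /* ... 20 values total */))
                    .append("createdAt", new Date()));
        }
    }
}
```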
What we have already tried:
We used MongoDB before, but the servers had slow disks, so we had to switch to Redis; the disks have since been replaced with very fast ones, but the data volume has grown a lot, so we think MongoDB might suit us again. What worries us about MongoDB: the storage overhead (BSON; perhaps these fears are unfounded), disk fragmentation and the high cost of defragmentation (perhaps there is a way around it), and I did not find a way to quickly overwrite all the data, so I would have to load the entire new set and then delete the old one.
What database would you recommend to choose in our case?
Do you want it for free, or do you want something that just works and doesn't ask to be fed?
In general, almost any database will do; 50GB is small change that even mysql/innodb can chew through.
And yes, it's cheaper not to overwrite the data in place, but to keep writing new data and then drop the old partitions wholesale. So you want something with a decent partitioning implementation.
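Roughly what that partition-drop cycle looks like, as a sketch only (postgres-flavored declarative partitioning over JDBC; the table, cycle ids, and connection details are all invented):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartitionSwapSketch {
    public static void main(String[] args) throws Exception {
        // Parent table assumed to exist, e.g.:
        //   CREATE TABLE results (cycle_id int, key text, vals double precision[],
        //                         PRIMARY KEY (cycle_id, key)) PARTITION BY LIST (cycle_id);
        try (Connection c = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/analytics", "app", "secret");
             Statement st = c.createStatement()) {

            // Write the new cycle into its own partition...
            st.execute("CREATE TABLE results_c42 PARTITION OF results FOR VALUES IN (42)");
            // ... bulk-load the ~100M rows of cycle 42 here (COPY or batched inserts) ...

            // ...then throw away the previous cycle wholesale: dropping a partition
            // frees its files outright, with no row-by-row deletes and no fragmentation
            st.execute("DROP TABLE IF EXISTS results_c41");
        }
    }
}
```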
If there is no money, then postgres.
If there is a budget, then Oracle.
Among the free options, both mysql and postgres will definitely cope; I personally would choose postgres.
Couchbase (where map reduce and self-updating views are useful for analytics).
memcached (or analogues, even mysql) plus a lot of memory is the cheapest solution. But you haven't really described the problem: it is not clear why you need a cluster, or how you plan to update 100M+ keys within three hours.
Um, if it's about 50GB of data, then get at least a 128GB SSD; that alone gives a hefty performance boost in any database. You just need to estimate how long the drive will last so you can pre-order a replacement.
I still don't understand why Mongo doesn't suit you. It also has simple and efficient scaling, which is a key feature here.
You are worrying about storage overhead in vain: first, it exists in almost every database, and second, storage is very cheap these days.
Regarding fragmentation: can a BSON document even be split into fragments? As far as I remember, the default Mongo configuration does not allow that.
Thank you for your responses.
I'll think about PostgreSQL, but horizontal scaling is tricky with it.
SSDs: a server with them has already been ordered for testing.
MongoDB is now really the favorite on my personal shortlist for this task. By MongoDB fragmentation I meant that when I upload 50GB of data into a collection twice and then delete 50GB once, for some reason significantly more than 50GB ends up occupied on disk (in fact, more than 100GB is used).
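If that is really fragmentation from in-place deletes, then perhaps the way out is to load each cycle into a fresh collection and drop the previous one entirely, since dropping a whole collection returns its space at once. A rough sketch with the MongoDB Java driver (database and collection names are made up):

```java
import com.mongodb.MongoNamespace;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class CollectionSwapSketch {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("frontend");

            // Load the new cycle into a fresh collection instead of updating in place
            MongoCollection<Document> fresh = db.getCollection("results_incoming");
            // ... bulk insert the ~100M documents of the new cycle here ...

            // Drop the old data wholesale and swap the fresh collection into its place;
            // dropping a whole collection frees its storage without leaving holes behind
            db.getCollection("results").drop();
            fresh.renameCollection(new MongoNamespace("frontend", "results"));
        }
    }
}
```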