What database to use for the project?
Hello everyone.
After gathering information and some trial and error with third-party logging systems, we decided to try putting together something more suitable ourselves. Here is where things stand at the moment (a rough sketch of the receive-and-insert path follows the list):
- a single logger instance can receive data over any of the configured channels: UnixSocket, WebSocket, HTTP
- for the tests, MongoDB in its default configuration was used as the storage backend
- the highest rate of receiving and inserting incoming data into the database was achieved over UnixSocket: about 100 thousand records per second (and there is every reason to believe MongoDB is the bottleneck here)
- in peak tests the client library managed to push up to 500 thousand events per second through the UnixSocket
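Below is a minimal sketch of that UnixSocket-to-MongoDB path, assuming newline-delimited JSON events and pymongo; the socket path, database/collection names and batch thresholds are illustrative assumptions, not part of the actual setup.

```python
# Minimal sketch: read newline-delimited JSON events from a Unix socket
# and insert them into MongoDB in unordered bulk batches.
import json
import os
import socket
import time

from pymongo import MongoClient

SOCK_PATH = "/tmp/logger.sock"   # assumed socket path
BATCH_SIZE = 1000                # assumed flush threshold
FLUSH_INTERVAL = 0.5             # seconds between forced flushes

events = MongoClient()["logs"]["events"]   # assumed db/collection names

if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)
srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(1)

conn, _ = srv.accept()
buf, batch, last_flush = b"", [], time.monotonic()
while True:
    chunk = conn.recv(65536)
    if not chunk:
        break
    buf += chunk
    # clients are assumed to send one JSON document per line
    *lines, buf = buf.split(b"\n")
    batch.extend(json.loads(line) for line in lines if line)
    # flush when the block is large enough or enough time has passed
    if len(batch) >= BATCH_SIZE or time.monotonic() - last_flush > FLUSH_INTERVAL:
        if batch:
            events.insert_many(batch, ordered=False)
        batch, last_flush = [], time.monotonic()
```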
In normal operation the rate will be much lower; I doubt it will reach 20 thousand records per second (with peaks up to 50 thousand). But a few questions came up that need to be resolved (some of my assumptions may be naive):
1. If all the data is stored in a single collection, it will grow to an enormous size. We are considering "daily" storage: at 00:00 simply rename the working collection to a daily one. But this raises a doubt: can selections / filtering / merging then be done efficiently across several collections? (The first sketch after this list shows one way to do it.)
2. If MongoDB is abandoned, which RDBMS is able to take in about 100 thousand records per second? This is about records rather than individual insert statements, because insertion is not done row by row but in blocks whose size depends on the incoming flow (small portions when the flow is light, larger ones when it is heavier).
3. Perhaps a mixed scheme could be implemented, with a carrier process that moves and "normalizes" data from MongoDB into the RDBMS. But that raises the question of throughput: if the logger service writes to MongoDB and a carrier picks everything up once a minute and stores it in the RDBMS, will it manage to transfer everything accumulated in that minute, even when working in large blocks? (The second sketch after this list illustrates the idea.) There would also be some delay before incoming data is displayed, and sometimes we need to monitor a client's stream in real time and show it in the GUI immediately, so we would have to fiddle with interprocess communication between the logger and the GUI.
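On question 1, here is a sketch of the daily rename plus a cross-collection query, assuming pymongo and MongoDB 4.4+ (needed for $unionWith). The database/collection names, the date suffix scheme, and the "level"/"ts" fields are assumptions made for illustration.

```python
# Sketch: rename the working collection at 00:00, then query across
# the current and a daily collection with a single aggregation pipeline.
from datetime import date, timedelta
from pymongo import MongoClient

client = MongoClient()
db = client["logs"]

# at 00:00: move the working collection aside under yesterday's name;
# the logger recreates "events" automatically on its next insert
yesterday = (date.today() - timedelta(days=1)).isoformat()
client.admin.command("renameCollection", "logs.events",
                     to=f"logs.events_{yesterday}")

# filter in the current collection, then merge in yesterday's data
pipeline = [
    {"$match": {"level": "ERROR"}},
    {"$unionWith": {"coll": f"events_{yesterday}",
                    "pipeline": [{"$match": {"level": "ERROR"}}]}},
    {"$sort": {"ts": -1}},
]
for doc in db["events"].aggregate(pipeline):
    print(doc)
```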
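On questions 2 and 3, here is a sketch of a once-a-minute carrier that drains whatever has accumulated in MongoDB and writes it to MySQL in large blocks via executemany. The table layout, batch size and connection parameters are assumptions, not a measured configuration.

```python
# Sketch: once a minute, move accumulated documents from MongoDB
# into MySQL in blocks, then delete the transferred documents.
import time

import mysql.connector
from pymongo import MongoClient

BATCH = 10_000  # rows per transferred block (assumed)

mongo_events = MongoClient()["logs"]["events"]
mysql_conn = mysql.connector.connect(
    host="localhost", user="logger", password="secret", database="logs"
)
sql = "INSERT INTO events (ts, level, source, message) VALUES (%s, %s, %s, %s)"

while True:
    cursor = mysql_conn.cursor()
    # drain everything currently in MongoDB, block by block
    while True:
        docs = list(mongo_events.find().limit(BATCH))
        if not docs:
            break
        rows = [(d["ts"], d["level"], d["source"], d["message"]) for d in docs]
        cursor.executemany(sql, rows)   # one batched INSERT per block
        mysql_conn.commit()
        mongo_events.delete_many({"_id": {"$in": [d["_id"] for d in docs]}})
    cursor.close()
    time.sleep(60)  # the once-a-minute cadence from the question
```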
I would be grateful for any advice / suggestion on the merits.
PS. Criticism, of course, is also welcome, but within the bounds of decency =))
Cassandra works well in bulk write mode. CQL instead of SQL.
https://jaxenter.com/evaluating-nosql-performance-...
https://dzone.com/articles/efficient-cassandra-write
There is also ScyllaDB, which is compatible with it.
Although this may be overkill.
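One way to try the Cassandra/CQL route is many concurrent single-row writes through the DataStax Python driver, which usually outperforms large multi-partition batches. The keyspace, table and contact points below are assumptions.

```python
# Sketch: high-throughput writes to Cassandra with concurrent
# prepared-statement inserts (DataStax Python driver).
from cassandra.cluster import Cluster
from cassandra.concurrent import execute_concurrent_with_args

cluster = Cluster(["127.0.0.1"])      # assumed contact point
session = cluster.connect("logs")     # assumed keyspace

insert = session.prepare(
    "INSERT INTO events (source, ts, level, message) VALUES (?, ?, ?, ?)"
)

def write_block(events):
    # events: iterable of (source, ts, level, message) tuples
    execute_concurrent_with_args(session, insert, events, concurrency=200)
```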
1. MongoDB will write a million records per second for you; use sharding and you will be happy.
2. Using MongoDB to store logs is a so-so solution, because it was not designed for this.
3. Look towards ELK. If you don't like it, there is https://prometheus.io/docs/introduction/overview/ + Grafana. There is also Graylog.
There are many options.
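If you go the sharding route from point 1, a minimal sketch of enabling it with pymongo against a sharded cluster reached through mongos; the database, collection and hashed shard key are assumptions chosen for illustration.

```python
# Sketch: enable sharding for the logs database and shard the
# events collection on a hashed key to spread the write load.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos-host:27017")   # assumed mongos address

client.admin.command("enableSharding", "logs")
client.admin.command("shardCollection", "logs.events",
                     key={"source": "hashed"})
```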
"Mongo, of course, quickly accepts records, but then working on it with selections will probably be very problematic"
Yep, locking and all that.
"which of the RDBMS is able to 'take in' about 100 thousand records per second?"
Correctly configured MySQL. I repeat: not MariaDB, but MySQL.
"If the logger service writes to Mongo, and a certain carrier once a minute takes everything from Mongo and stores it in the RDBMS, will it manage to transfer everything that will be accumulated in Mongo in a minute, even if it is taken in large blocks?"
I have never come across such a carrier. Unless you write one yourself.
"I would be grateful for any advice / suggestion on the merits."
ArangoDB.