GilbertAmethyst, 2018-06-30 04:13:19

What DB should I use for fast-changing data?

Hello!
Task:
More than two thousand devices connect to a Node.js server via Socket.IO and transmit data at intervals of 5 to 40 seconds. For each device, the latest data needs to be updated in some kind of storage, plus a history kept (retention limit: one month; granularity: daily values). The system will grow later (there will be more devices).
I've heard about NoSQL databases but haven't dealt with them yet; so far I've only worked with MySQL. Given the amount of data, the potential growth in requirements and their complexity, and the obvious increase in response time of the current database, I think it's time to start studying them. I'm considering Redis or Mongo, but I'll be glad to hear about other options if they have been applied successfully here.
Essence of the question:
Which technology to choose?
How complex is the technology, and what is the risk of making critical mistakes while learning it and designing for production at the same time?
What limitations should I pay attention to so that my choice doesn't burn me in the long run?


5 answers
Roman Mirilaczvili, 2018-06-30
@GilbertAmethyst

more than two thousand devices connect and transmit data at intervals of 5 to 40 seconds.

If we are only talking about periodically appending the same kinds of metrics (numerical values) over time, then you should pick something from the time-series database family, like InfluxDB, Prometheus, etc.
For IoT devices, choose a DBMS based on the structure of the stored data, the write frequency, and how the data will be queried.
For frequently updated data, you can take a fast K/V DBMS (NoSQL) like Tarantool, Aerospike, or the popular Redis. Only small, already-processed data belongs there, since it lives in available RAM, and it should be frequently accessed data; such stores are commonly used for caches and queues.
I advise you to study more closely what raw data will be transferred, and how and how often it will be computed/aggregated/processed. Estimate approximate volumes for the near future and leave room for growth by an order of magnitude. Estimate the approximate amount of net data stored, based on the types of data being transferred, so that you can roughly size the storage.
Also consider using a queue-based data processing system.
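To make the "latest value in a fast K/V layer plus daily history" pattern concrete, here is a minimal sketch in plain Node.js. All names are illustrative, and an in-memory Map stands in for Redis/Tarantool/Aerospike; the point is the data model, not a production store.

```javascript
// Sketch of the storage pattern described above: keep only the latest
// reading per device, and fold raw readings into daily aggregates for
// the one-month history. A Map stands in for the real K/V DBMS.
class DeviceStore {
  constructor() {
    this.latest = new Map(); // deviceId -> last raw reading
    this.daily = new Map();  // "deviceId:YYYY-MM-DD" -> {sum, count, min, max}
  }

  ingest(deviceId, value, ts = new Date()) {
    this.latest.set(deviceId, { value, ts });
    const day = ts.toISOString().slice(0, 10); // "YYYY-MM-DD"
    const key = `${deviceId}:${day}`;
    const agg = this.daily.get(key) ??
      { sum: 0, count: 0, min: Infinity, max: -Infinity };
    agg.sum += value;
    agg.count += 1;
    agg.min = Math.min(agg.min, value);
    agg.max = Math.max(agg.max, value);
    this.daily.set(key, agg);
  }

  dailyAverage(deviceId, day) {
    const agg = this.daily.get(`${deviceId}:${day}`);
    return agg ? agg.sum / agg.count : undefined;
  }
}
```

Note that the raw per-reading values are never stored long-term, only the running daily aggregates, which is what keeps the month of history small regardless of how often devices report.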

Vladislav Kadun, 2018-06-30
@ZXZs

I think Redis is exactly what it was made for.

Developer, 2018-06-30
@samodum

Redis is a very good option.
If the service is growing fast, you need to provide for horizontal scaling, and then you will need Redis Cluster: https://redis.io/topics/cluster-tutorial
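As one hedged sketch of what this could look like in Redis: keep the newest payload under a `latest:<deviceId>` key and accumulate daily counters in a hash that expires after about a month. The key names and TTL here are assumptions, not a prescribed layout; this helper only builds the commands, which you would send with any Redis client (they work the same against a single node or a cluster, since each command touches one key).

```javascript
// Assumed key schema (illustrative only):
//   latest:<deviceId>         -> SET, newest raw payload
//   history:<deviceId>:<day>  -> HINCRBYFLOAT/HINCRBY counters, EXPIREd
const MONTH_TTL_SECONDS = 31 * 24 * 60 * 60; // ~1 month retention

function commandsForReading(deviceId, value, ts = new Date()) {
  const day = ts.toISOString().slice(0, 10);
  const historyKey = `history:${deviceId}:${day}`;
  return [
    ['SET', `latest:${deviceId}`,
      JSON.stringify({ value, ts: ts.toISOString() })],
    ['HINCRBYFLOAT', historyKey, 'sum', String(value)],
    ['HINCRBY', historyKey, 'count', '1'],
    ['EXPIRE', historyKey, String(MONTH_TTL_SECONDS)],
  ];
}
```

Letting EXPIRE enforce the one-month limit means old history cleans itself up without a cron job, which is one of the things Redis handles well for this workload.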

m0nym, 2018-07-01
@m0nym

InfluxDB is a specialized DBMS for exactly this kind of data, if I understand your task correctly.
Or Tarantool: it keeps everything in RAM; you can't get faster than that.
Or Aerospike: like Tarantool, but it also uses disk, which suits you if there is not enough RAM.

Alexander, 2018-07-01
@ushliy

Look: if you have data in the form of time-series metrics, something like monitoring, use the Prometheus or InfluxDB already described above. The latter is not very stable on large volumes of stored data and is rather resource-hungry. But nobody has ruled out aggregation, i.e. reducing the density of stored points: after a month, roll per-second data up into per-minute data.
If writes are frequent and reads are comparatively rare, something like statistics, then you can use ClickHouse: it has a very impressive write speed, good clustering support, and queries similar to regular SQL.
Start from the retention time: if the data only lives for a day or two, then of course in-memory databases like Redis will do, or, as mentioned above, Aerospike. But just because it can dump to disk doesn't mean it should be used that way.
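The rollup step mentioned above (collapsing fine-grained points into coarser ones after a retention window) can be sketched as a pure function. This is an illustration of the idea only; real time-series DBMSs do this internally via retention/downsampling policies.

```javascript
// Downsample arbitrary-resolution points into per-minute averages,
// as in "after a month, aggregate per-second data per minute".
// points: [{ ts: Date, value: number }, ...]
function downsampleToMinutes(points) {
  const buckets = new Map(); // "YYYY-MM-DDTHH:MM" -> { sum, count }
  for (const { ts, value } of points) {
    const minute = ts.toISOString().slice(0, 16); // truncate to the minute
    const b = buckets.get(minute) ?? { sum: 0, count: 0 };
    b.sum += value;
    b.count += 1;
    buckets.set(minute, b);
  }
  return [...buckets.entries()].map(([minute, b]) => ({
    minute,
    avg: b.sum / b.count,
  }));
}
```

Each stored point shrinks from one row per reading to one row per minute, which is what keeps long histories cheap at the cost of losing per-reading detail.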
