Software design
Daniil Sidorov, 2021-09-17 13:04:05

How to store large JSON arrays that are constantly updated (API)?

I am developing a service that works with the Wildberries and Ozon APIs. The functionality is simple: fetch the data, process it, and display it on a page. But there is a problem: I don't quite understand how best to organize storage of the received data.

My reasoning was this: making an API request every time the page loads is impractical, and I would quickly run into "Too many requests" errors. So the data has to be refreshed in the background, for example every 15 minutes, so I will set up a CRON job. Every 15 minutes I will receive JSON with 10,000+ entries. What is the best way to store it? I'm not sure it's worth writing all of this into MySQL, because each time I would have to completely clear the table of old data and insert the new data. Should I store it in files instead?
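
For example, a crontab entry along these lines (the script path here is just a placeholder):

    */15 * * * * php /var/www/app/fetch_marketplace_data.php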

If anyone has experience developing projects like this, please advise on the best way to store large JSON arrays.


4 answers
d-stream, 2021-09-17
@DaniLaFokc

Large amounts of data, especially data you will later need to search or aggregate, should be stored in a database.
And databases have not only insert operations but also update operations, so there is no need to wipe the table every time.
Strictly speaking, in this case it's more correct to talk not about storing JSON, but about storing the data [obtained from the JSON].
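
A rough sketch of that idea, assuming MySQL via PDO (the products table, its columns and the $apiResponseBody variable are made up for illustration):

    // Upsert rows parsed from the API JSON instead of truncating the table each time.
    $pdo = new PDO('mysql:host=localhost;dbname=marketplace;charset=utf8mb4', 'user', 'pass');
    $items = json_decode($apiResponseBody, true); // 10,000+ entries from the API

    $stmt = $pdo->prepare(
        'INSERT INTO products (sku, name, price, stock)
         VALUES (:sku, :name, :price, :stock)
         ON DUPLICATE KEY UPDATE name = VALUES(name), price = VALUES(price), stock = VALUES(stock)'
    );

    $pdo->beginTransaction();
    foreach ($items as $item) {
        $stmt->execute([
            ':sku'   => $item['sku'],
            ':name'  => $item['name'],
            ':price' => $item['price'],
            ':stock' => $item['stock'],
        ]);
    }
    $pdo->commit();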

rPman, 2021-09-17
@rPman

The first question to ask is how the data will be used.
Is it read-only, or can it change, and do changes only come from Ozon (i.e. will the data change after it has been received from Ozon)? Do you need search and filtering? Is it multi-user access or a service just for yourself? Is a major overhaul with more functionality expected in the future, or is this one-off code as part of an experiment?
And is the data itself needed in the format received from Ozon, or in a compatible one (don't you need to aggregate data from several requests)?
Databases are of course the standard recommendation here, but in some cases they are overkill, and it may be enough to store the received data in files exactly as it arrives (without merging them) and, on request, simply read them in full (right after receiving the data you can also build an index over it and put that in a file alongside).
Working with files, when you need a lot of data at once anyway, can be faster, and you won't spend much time on development.
But as soon as this data starts to change, or the volume becomes disproportionately larger than what a single request needs, the index files and the code for working with them will get more complicated, and it will become easier to move everything into an SQL database.
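
A minimal sketch of the file approach (file names, the $apiResponseBody variable and the 'id' field are assumptions):

    // Save each API response to disk as-is, one file per fetch.
    $path = __DIR__ . '/data/ozon_' . date('Ymd_Hi') . '.json';
    file_put_contents($path, $apiResponseBody);

    // Build a small index (item id => position in the array) and store it alongside.
    $items = json_decode($apiResponseBody, true);
    $index = [];
    foreach ($items as $i => $item) {
        $index[$item['id']] = $i;   // assumes every entry has an 'id' field
    }
    file_put_contents($path . '.index.json', json_encode($index));

    // On a page request: read the latest file in full and use the index for lookups.
    $items = json_decode(file_get_contents($path), true);
    $index = json_decode(file_get_contents($path . '.index.json'), true);
    $one   = isset($index[12345]) ? $items[$index[12345]] : null;   // 12345 is a made-up item id
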
P.S. I think of storage in files as using a NoSQL database, especially since it is very fast, even in the mode of many files with one record each.
P.P.S. Storing the data as PHP files (var_export) and loading them with include may be the fastest option of all for read-only 'databases' (json or serialize are one and a half to two times slower).
Upd. I have been told there is an even faster PHP serializer, igbinary, and it ships with Debian/Ubuntu.
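
A sketch of the var_export/include trick (the path and the $apiResponseBody variable are assumptions):

    // Dump the decoded data as a PHP file that returns an array...
    $items = json_decode($apiResponseBody, true);
    file_put_contents(
        __DIR__ . '/data/items.php',
        '<?php return ' . var_export($items, true) . ';'
    );

    // ...and load it back with a plain include (OPcache caches the compiled file).
    $items = include __DIR__ . '/data/items.php';

    // If the igbinary extension is installed, igbinary_serialize()/igbinary_unserialize()
    // can replace serialize()/unserialize() for a faster, more compact binary format.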

Sergey Sokolov, 2021-09-17
@sergiks

NoSQL databases, such as MongoDB, are suitable for storing JSON documents like these.
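
A minimal sketch with the mongodb/mongodb composer library (database/collection names, the 'sku' key and $apiResponseBody are made up; upserting by a unique key keeps the 15-minute refresh idempotent):

    require 'vendor/autoload.php';

    $collection = (new MongoDB\Client('mongodb://localhost:27017'))->marketplace->products;

    $items = json_decode($apiResponseBody, true);

    $ops = [];
    foreach ($items as $item) {
        // Replace the existing document with the same sku, or insert it if missing.
        $ops[] = ['replaceOne' => [['sku' => $item['sku']], $item, ['upsert' => true]]];
    }
    $collection->bulkWrite($ops);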

Developer, 2021-09-17
@samodum

Databases were invented long ago precisely for storing large amounts of data.
