Creating and working with big data: how do you work with a large volume of data while keeping access near-instant?
We're novice programmers, fourth-year university students, and we had the idea of building a system in which a server records messages from users, metrics, other information, and so on. We've run into a pile of questions: how to build a proper server setup, how to upload and store data in backup locations, and much more. The most problematic one is that we have no idea how to make the system ingest a colossal amount of data, record it quickly, and answer both on-demand and automatic queries reasonably fast. Any advice at all would help; thanks in advance. (The system will start with 10-20K users and will expand.)
The question is very general; a fourth-year student should either know the theoretical answer or be able to google articles describing such solutions without trouble. It would be better to come here with a more specific problem.
I have written many times, and will repeat, that for highly loaded systems, whether the load is in data volume or in request rate, there are no ready-made recipes: the solution is always developed individually around the specifics of the project. To be able to design such solutions, you must first gain solid experience building and operating simpler ones, so for students this task is in most cases beyond reach. Besides, if the data really is huge, huge investment will be needed as well; at a minimum you would have to build a data center.
If you just want theoretical knowledge and to develop a concept not intended for real use, then start by reading Kleppmann (Designing Data-Intensive Applications) and the articles on system design at Google, Yandex, Facebook, VK, and others.
Colossal is how much, exactly? Measure it not in bytes but in the number of write and read events. For example, 20,000 users each writing one message per minute is only about 330 writes per second, well within reach of a single ordinary database server.
The main question is which read queries will be run and how often: that determines the list of fields to index and how. Those indexed fields are what bound the write speed, since their indexes must be updated on every insert; the rest of the data can ultimately be written at the full speed of the storage device (see the sketch below).
There are also, of course, the questions of concurrent access to this data, of maintenance, backups, and so on.
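A minimal sketch of the indexing point, using Python's built-in sqlite3. The table, field names, and the assumed dominant query ("latest messages of user X") are all invented for illustration:

```python
import sqlite3

conn = sqlite3.connect("messages.db")  # hypothetical storage file
cur = conn.cursor()

# A simple append-only message log.
cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        created_at INTEGER NOT NULL,   -- unix timestamp
        body       TEXT NOT NULL
    )
""")

# Index only what the read queries actually filter on. Assuming the
# common query is "messages of user X, newest first", this single
# composite index serves it, and every insert pays for exactly one
# index update besides the primary key.
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_messages_user_time
        ON messages (user_id, created_at)
""")
conn.commit()

# A write touches the table plus that one index.
cur.execute(
    "INSERT INTO messages (user_id, created_at, body) VALUES (?, ?, ?)",
    (42, 1700000000, "hello"),
)
conn.commit()

# The read the index was built for.
print(cur.execute(
    "SELECT body FROM messages WHERE user_id = ?"
    " ORDER BY created_at DESC LIMIT 20",
    (42,),
).fetchall())
```

The same trade-off holds in any SQL engine: each extra index speeds up one class of reads and taxes every write, which is why the query list has to come first.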
In the vast majority of cases an ordinary SQL database is more than adequate, and when its speed runs out, that is compensated for by sharding and replication; a minimal routing sketch follows.
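As a hedged sketch of that last point: the simplest form of sharding is a fixed set of databases plus a deterministic hash of the shard key (here the user id) that routes every record. The shard list, the modulo scheme, and the use of sqlite3 files as stand-ins for real servers are all assumptions for illustration; a real deployment would add replication per shard and a plan for resharding as the shard set grows.

```python
import hashlib
import sqlite3

# Hypothetical shard set; in production each entry would be a separate
# database server with its own replicas.
SHARD_PATHS = ["shard0.db", "shard1.db", "shard2.db", "shard3.db"]

def shard_for(user_id: int) -> sqlite3.Connection:
    """Deterministically pick the shard that owns this user's data."""
    # A stable hash, so the same user always maps to the same shard.
    digest = hashlib.sha1(str(user_id).encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARD_PATHS)
    return sqlite3.connect(SHARD_PATHS[index])

def save_message(user_id: int, body: str) -> None:
    """Write a message to the single shard that owns the user."""
    conn = shard_for(user_id)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages (user_id INTEGER, body TEXT)"
    )
    conn.execute(
        "INSERT INTO messages (user_id, body) VALUES (?, ?)",
        (user_id, body),
    )
    conn.commit()
    conn.close()

save_message(42, "hello")  # user 42 always lands on the same shard
```

Because all of one user's messages live on one shard, the "messages of user X" query never crosses shards; queries that span all users would need scatter-gather across every shard or a different shard key.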