Which option should I choose for organizing data storage for multiple organizations in MySQL?
Dear experts, please help me work out this problem.
Given: 500 organizations, each with a client base of up to 300 people. Clients of an organization make appointments with it and pay for the service right away; taking the maximum, let's say 100 appointments and 100 payments per client per year. In addition, each organization has small, insignificant tables with data about goods in its warehouse, personnel, accounting data, and so on.
So one organization has approximately the following rows in the database:
• 300 records (maximum) for clients (ID, full name, phone, address, date of birth, gender, field-1, field-2, field-3, field-4, field-5, comments);
• 300*100 = 30,000 entries (maximum) in the receptions table (client ID, reception date, field-1, field-2, field-3, field-4, field-5, comments);
• 300*100 = 30,000 entries (maximum) in the payments table (client ID, payment date, payment amount, field-1, field-2, field-3, field-4, field-5, comments);
• 100 records (per year) for the table of goods in stock (item ID, quantity in stock, description, field-1, field-2, field-3, comments);
• 100 entries (per year) in the personnel table (employee ID, full name, phone number, address, qualification, specialty, position, description, dismissed, date of employment, salary, field-1, field-2, field-3, field-4, field-5, comments);
• 300 entries (per year) in the table of paid salaries;
• Other small tables dedicated to a particular organization are possible.
In total, that comes to approximately 601,000 rows for just one organization, without taking into account the number of columns in the tables (I have only roughly indicated the fields).
For 500 organizations that works out to 300,500,000 records in the tables (again, without counting the number of columns).
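For clarity, here is roughly how I picture the largest of these tables (the column types are only my guesses, since above I listed just the field names):

    -- Rough sketch of the main per-organization tables; types are approximate.
    CREATE TABLE Clients (
        ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        FullName VARCHAR(255),
        Phone VARCHAR(32),
        Address VARCHAR(255),
        BirthDate DATE,
        Gender CHAR(1),
        Comments TEXT
        -- plus field-1 .. field-5
    );

    CREATE TABLE Receptions (          -- the appointments
        ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        ClientID INT UNSIGNED NOT NULL,
        ReceptionDate DATE NOT NULL,
        Comments TEXT,
        -- plus field-1 .. field-5
        FOREIGN KEY (ClientID) REFERENCES Clients(ID)
    );

    -- Payments is analogous: ClientID, PaymentDate, Amount DECIMAL(10,2), field-1..5, Comments.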
Task: organize data storage, online access to the data, and isolation between organizations. The work will be done in a Windows application installed at each organization separately; the application will connect to the DBMS on a remote server.
I plan to use MySQL on an extensible cloud server. The hosting support told me that I am not limited in the number of databases, and that only the capabilities of MySQL itself and the server's disk space need to be taken into account. If necessary, additional similar servers can be used.
I see 3 solutions to this problem (rough sketches of each are given after the list):
1. Use a separate database for each organization. There would be a master database storing minimal information about the organizations and their database identifiers, as well as the login and password for the corresponding database. What scares me here is resources: I have no idea how such a large number of databases will affect server performance, given that 80% of the organizations may be working in their databases at the same time as the rest. But I really like this method, because each organization can be backed up separately without interfering with the others, restoring a backup will not roll back other organizations' data, and access to the data is simple. I see the pluses, but I don't know whether the server and MySQL itself will cope.
2. Use one database containing 500 sets of the tables described above (one set per organization). That is, there will be a master table with data about each organization and its table-name prefix, for example the prefix "Org1_" for the first organization; that organization then gets a set of tables named Org1_Clients, Org1_Receptions, Org1_Payments, Org1_Warehouse, and so on. And so on, 500 sets, one per organization. In total that is 500 * 10 (approximate number of tables per organization) = 5,000 tables in one database. Isn't that too many? Will it cope? This is also a fairly good method, but due to lack of experience I can't assess the load and whether the resources would be sufficient for this option.
3. Use 10 tables (the approximate number of tables per organization) that store the information for all organizations at once. The master table stores data about the organizations (organization ID, any other information such as phone, address, etc.), and each of the 10 tables I described at the very beginning gets an extra organization ID field. So, for example, the clients table would look like this: (ID, ID_Organization, full name, phone, address, date of birth, gender, field-1, field-2, field-3, field-4, field-5, comments). Each table then stores the data of all organizations at once. But it seems to me that some tables will be too heavily loaded; for example, the receptions and payments tables would each hold up to 300,000 * 500 = 150,000,000 rows. That is a lot; won't it get sluggish? What I don't like about this method is that the tables will be overloaded and backups and restores become difficult. On the other hand, this method allows keys on two fields (ID and Organization_ID), so within one table the client IDs for every organization can start from 1; I don't know whether that is really a bonus. Schemas could also be used in this option, but I'm not sure they would give sufficient data isolation between organizations.
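To make the three options more concrete, here are rough sketches of how I picture each one (all table and column names and types below are just my assumptions, not a final design, and only a couple of columns are shown):

    -- Option 1: master database plus a separate database per organization.
    CREATE TABLE Organizations (               -- lives in the master database
        OrgID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        OrgName VARCHAR(255) NOT NULL,
        DbName VARCHAR(64) NOT NULL,           -- e.g. 'org_0001'
        DbLogin VARCHAR(64) NOT NULL,
        DbPassword VARCHAR(255) NOT NULL       -- ideally stored encrypted, not in plain text
    );
    CREATE DATABASE org_0001;
    CREATE USER 'org_0001_user'@'%' IDENTIFIED BY '...';
    GRANT ALL PRIVILEGES ON org_0001.* TO 'org_0001_user'@'%';
    -- per-organization backup: mysqldump org_0001 > org_0001.sql

    -- Option 2: one database, a prefixed set of tables per organization.
    CREATE TABLE Organizations (
        OrgID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        OrgName VARCHAR(255) NOT NULL,
        TablePrefix VARCHAR(16) NOT NULL       -- e.g. 'Org1_'
    );
    CREATE TABLE Org1_Clients (
        ID INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        FullName VARCHAR(255),
        Phone VARCHAR(32)
    );
    -- ...and likewise Org1_Receptions, Org1_Payments, Org1_Warehouse, etc.;
    -- the application builds the table names from the prefix at runtime.

    -- Option 3: shared tables, every row carries the organization ID.
    CREATE TABLE Clients (
        ID_Organization INT UNSIGNED NOT NULL,
        ID INT UNSIGNED NOT NULL,              -- starts from 1 within each organization;
                                               -- with InnoDB the application has to assign it itself
        FullName VARCHAR(255),
        Phone VARCHAR(32),
        PRIMARY KEY (ID_Organization, ID)
    );
    -- every query has to filter by organization:
    SELECT ID, FullName FROM Clients WHERE ID_Organization = 42;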
That's the problem. What should I do?
I asked this same question just before you, only not in as much detail.
I like option 3 because there is a single database. And if it is in the cloud, for example AWS Aurora, then you only monitor that everything runs smoothly and AWS does everything for you: it makes copies, detects and fixes problems itself almost instantly, distributes the load across different zones and duplicates the data across them, creates replicas, and so on. All you need to do is add extra resources like memory and processor power in time. You also need to keep track of queries, spot problems early, and optimize either the code or the database.
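On plain MySQL that means, for example, turning on the slow query log and then looking at the worst queries with EXPLAIN; a rough example (the threshold and the query itself are arbitrary):

    -- Enable the slow query log; tune the threshold to your needs.
    SET GLOBAL slow_query_log = 'ON';
    SET GLOBAL long_query_time = 1;   -- log queries that take longer than 1 second

    -- Then check the execution plan of a query that shows up in the log,
    -- e.g. a per-organization lookup from option 3 (names are just an example):
    EXPLAIN SELECT * FROM Clients WHERE ID_Organization = 42;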
But here's what worries me. Hypothetically, if you lose the database, you lose the data of all clients at once; the business is done for. While the database is being brought back up, all customers stand idle along with it, and in the sales business even 5 minutes can be critical. Another problem may come later: suppose you want others to be able to create extensions, and those extensions install their own tables; a single database can grow very large. In short, third-party developers are a problem. They can write code that ignores the client ID and pull, say, all records from a table. Vulnerabilities in code you didn't write.
Therefore, I lean towards method 1. But it has drawbacks too.
There should be no difference in the resources occupied: the number of records will be the same either way. As for performance, the first option is better; it is easier to look through separate databases than one huge table.
300 million records is not that much. In doubt? Run tests.
In general, the answer to your question is: run the tests.
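For example, you can populate a test table with random data up to the expected volume and see how typical queries behave; a rough sketch (all names and values here are arbitrary):

    -- Rough load test for option 3: a shared payments table filled with random rows.
    CREATE TABLE Payments_test (
        ID BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        ID_Organization INT UNSIGNED NOT NULL,
        PaymentDate DATE NOT NULL,
        Amount DECIMAL(10,2) NOT NULL,
        KEY idx_org_date (ID_Organization, PaymentDate)
    );

    -- Seed one row, then re-run the INSERT ... SELECT to double the row count each time
    -- until you reach the target volume (~150-300 million rows):
    INSERT INTO Payments_test (ID_Organization, PaymentDate, Amount)
    VALUES (1, '2020-01-01', 100.00);

    INSERT INTO Payments_test (ID_Organization, PaymentDate, Amount)
    SELECT FLOOR(1 + RAND() * 500),
           DATE_ADD('2015-01-01', INTERVAL FLOOR(RAND() * 3650) DAY),
           ROUND(RAND() * 10000, 2)
    FROM Payments_test;

    -- Then time a typical per-organization query:
    SELECT SUM(Amount)
    FROM Payments_test
    WHERE ID_Organization = 123
      AND PaymentDate BETWEEN '2020-01-01' AND '2020-12-31';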