How to properly organize maintenance and work with a large database?

A

Anton_habr_20202021-05-06 20:06:20

Database administration

Anton_habr_2020, 2021-05-06 20:06:20

I've run into a problem and I'm asking for advice on how to solve it.

Given:
1. Internet service working 24x7. Well, I would like it to work that way.
2. The database of this service, which is growing and occupies 100+ GB.

The database runs on a single server or on a small cluster and requires periodic maintenance
: the system is evolving
2. Creation of dumps

Until recently, to perform these tasks, the service stopped and all operations were carried out as accurately as possible at night for 10-20 minutes. However, the system is growing and the moment is approaching when even 1-2 hours for certain operations will be achievable, which is extremely unpleasant. Dumps can be done using replication, we’ll set it up, but it’s not clear with structure changes ...

Do you have experience of such work? How to make changes to the structure of a large table, for example, without stopping the entire service?

Reply

Answer the question

In order to leave comments, you need to log in

4 answer(s)

V

Vamp, 2021-05-06
@Anton_habr_2020

Regarding mysql, I know three options:
1. The pt-online-schema-change utility . Creates an empty copy of the source table, makes an alter on it, copies the data from the source table, and finally swaps the old and new tables.
Used this tool a couple of times. Works well.
2. In mysql 5.6, it became possible to do alter without blocking by means of the subd itself . You need to add a couple of new parameters to alter:

ALTER TABLE tbl_name ADD PRIMARY KEY (column), ALGORITHM=INPLACE, LOCK=NONE;

I haven't tried this option myself, so I can't comment.
3. The most complex option using two mysql instances connected by master-master replication:

We do alter on the second server
We are waiting for the second server to catch up with replication
Switching the service to the second base
We wait for some time, see if the application works normally with the new scheme, if there are any degradation or errors
Making an alter at first base
We are waiting for the replication catch-up
Returning service to first base

I use this option all the time. It looks simple, but in fact there are many nuances.
It is necessary to skip alter past replication:

SET sql_log_bin = 0;
ALTER TABLE tbl_name ...;

It is important not to forget about sql_log_bin = 0, otherwise alter will replicate to a neighboring server and lock the table there. And you can not switch the service until replication catches up.
If the table structure changes - a column is added / removed or their order is changed, it is necessary to make sure that the replication type is STATEMENT. Otherwise, replication will fall on the very first request in ROW format with something like this strange error:

Column 25 of table 'mydb.mytable' cannot be converted from type 'varchar(255)' to type 'bigint(20) unsigned'

And with STATEMENT, you need to make sure that the application does not lower the isolation level below REPEATABLE READ anywhere, otherwise it will receive an error:

Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage engine limited to row-based logging. InnoDB is limited to row-logging when transaction isolation level is READ COMMITTED or READ UNCOMMITTED.

J

Julia Bedrosova, 2021-05-06
@Bedrosova

You can do it seamlessly: write code so that the next version of the code temporarily supports both structures of a given table: the old and the new. For example, you have 1 table with orders, you decide to normalize it - split it into 2 related tables. You create 2 new tables with a new structure and rewrite the order class so that it works simultaneously with both structures: duplicates all the data: writes them both to 2 new tables and to the old one, searches for data both there and there by a common key, and so Further. Roll this update, watch the work. Then you write a migrator script that takes data from one table, converts it and transfers it to 2 others, and it works in the background - until it works. When all the data has moved to new tables, in the new version of the application code, you remove support for the old table.

V

Vitaly Karasik, 2021-05-07
@vitaly_il1

I 'll add my couple of cents to the correct answers and comments (@Vamp et al): clean old data that is no longer relevant
- "Dumping" - mysqldump with --single-transaction, there are more advanced methods
But in general - either take a DBA, or - it's usually cheaper - use managed service - AWS RDS if you are already in AWS , or https://aiven.io/ .
Because in addition to the questions that you have already asked, there are many more questions that need to be asked for production - for example, how to roll back to for half an hour or for a day?

C

ComodoHacker, 2021-05-07
@ComodoHacker

1. Making changes to the structure, because the system is developing

You need to study the tools that your DBMS provides for this, and use them correctly. In general, this is more the concern of the developer, and not the administrator.

2. Creating dumps

Clarify what you mean by "dumps" and for what purpose they are made. If you mean backing up data, then any modern DBMS provides tools for this that allow you to make backups without interrupting the work of users. That is, see the previous paragraph.