Broken master-master replication and recovery

MySQL

s1dney, 2012-11-14 11:55:23

I have several databases in master<->master replication. Only one of the masters takes writes; the second one is switched in only if the first goes down.
From time to time replication breaks, and one of the servers reports this:
Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'
A binary log file is apparently getting lost somewhere, and I don't understand why.

If writes have already landed on the active master, they were never replicated, so simply restarting the slave is not an option: the databases first have to be compared and made identical, and only then can replication be started again. The idea is clear, but how do I compare the databases and apply the differences?
And most importantly, how can I prevent replication from breaking at all, so that the database never has to be stopped?

1 answer(s)

s1dney, 2012-11-16
@s1dney

The why and the how are explained well here: dev.kafol.net/2011/09/mysql-error-1236-client-requested.html
Just in case, I will quote it here:

Causes:
The master server crashed before the binlog cache was flushed to disk. The slave received a new position but did not receive the data, and the data was lost in the crash (it may have been written to the table, but not to the binlog).
Solution:
Use this CHANGE MASTER statement on the slave:
CHANGE MASTER TO MASTER_LOG_FILE=[NEXT FILE], MASTER_LOG_POS=4;
START SLAVE;
In my case that would be:
CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000034', MASTER_LOG_POS=4;
START SLAVE;
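I don't know why the master log position needs to be 4 for the new file. (Note: every binlog file begins with a 4-byte magic header, so the first event in a file sits at position 4.)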
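A small sketch in Python of how the [NEXT FILE] name could be derived. It assumes the usual base.NNNNNN binlog naming and that the file the slave was stuck on is mysql-bin.000033. That value is hypothetical; the real one is the Master_Log_File reported by SHOW SLAVE STATUS on the slave. The helper names are made up for illustration.

```python
import re


def next_binlog_file(current: str) -> str:
    """Return the binlog file name that follows `current`."""
    match = re.fullmatch(r"(.+)\.(\d+)", current)
    if match is None:
        raise ValueError(f"not a binlog file name: {current!r}")
    base, number = match.groups()
    # Keep the zero-padding width of the original sequence number.
    return f"{base}.{int(number) + 1:0{len(number)}d}"


def change_master_statement(current: str) -> str:
    """Build the statement that points the slave at the start of the next file."""
    return (
        f"CHANGE MASTER TO MASTER_LOG_FILE='{next_binlog_file(current)}', "
        "MASTER_LOG_POS=4;"
    )


# Hypothetical input: the file the slave was reading when it failed.
print(change_master_statement("mysql-bin.000033"))
# CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000034', MASTER_LOG_POS=4;
```

Before running the statement, it is worth checking with SHOW BINARY LOGS on the master that the next file really exists.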
What happens:
When the master server restarts, it starts logging binary changes to a new binlog file. Skipping the slave ahead to that file minimizes data loss, since everything that reached the previous file had already been written.
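Because skipping to the next file can still leave a few changes unreplicated, the two servers should be compared before the second master is trusted again. One rough way, sketched below in Python, is to run CHECKSUM TABLE for every table on both servers while writes are stopped and compare the results. The table names and checksum values are invented, and the function name is hypothetical. Percona Toolkit's pt-table-checksum and pt-table-sync are purpose-built tools for the same job.

```python
from typing import Dict, List


def differing_tables(master: Dict[str, int], standby: Dict[str, int]) -> List[str]:
    """Return tables whose checksums differ or that exist on only one server."""
    all_tables = sorted(set(master) | set(standby))
    return [t for t in all_tables if master.get(t) != standby.get(t)]


# Hypothetical checksums, as CHECKSUM TABLE might report them on each server.
master = {"shop.orders": 3014921045, "shop.users": 998877, "shop.log": 42}
standby = {"shop.orders": 3014921045, "shop.users": 112233}

print(differing_tables(master, standby))
# ['shop.log', 'shop.users']
```

Only the tables in the resulting list then need to be re-synced, for example by dumping them from the active master and loading them on the other server.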
Prevention:
Add this line to my.cnf:
sync_binlog = 1
With this setting the master server syncs the binary log to disk after every write, so in case of a crash you can lose at most one statement or transaction.
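The setting only takes effect if it sits under the [mysqld] section of my.cnf. A minimal sketch in Python for checking that is shown below; the sample file content and the function name are made up, and a real my.cnf that uses !include or !includedir directives would need extra handling. On a running server, SHOW VARIABLES LIKE 'sync_binlog' shows the active value.

```python
import configparser


def sync_binlog_value(my_cnf_text: str):
    """Return the sync_binlog setting from the [mysqld] section, or None if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare flags such as skip-name-resolve
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat '%' in values literally
    )
    parser.read_string(my_cnf_text)
    if not parser.has_section("mysqld"):
        return None
    for key, value in parser.items("mysqld"):
        # MySQL accepts dashes and underscores interchangeably in option names.
        if key.replace("-", "_") == "sync_binlog":
            return value
    return None


# Hypothetical my.cnf content.
sample = """
[mysqld]
log-bin = mysql-bin
skip-name-resolve
sync_binlog = 1
"""

print(sync_binlog_value(sample))
# 1
```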