How do I set up a failover server?

MDXL, 2012-02-15 15:14:38

Data synchronization

Lately Hetzner has been "acting up" quite often, so we started thinking about a backup server in another data center.
We need a complete clone of the server hosted at Hetzner, with real-time synchronization between the two, so that as soon as the Hetzner server goes down, the backup server takes over everything.
Has anyone dealt with this problem? What would you advise? Is there a complete ready-made solution, and what pitfalls might come up?

5 answer(s)

Vlad Zhivotnev, 2012-02-15
@inkvizitor68sl

If you are hosting on Hetzner, you can probably walk right past questions like this. If you have a budget of roughly 20-30k per month for all of this, then keep thinking about it.

shadowalone, 2012-02-15
@shadowalone

May I ask what "acting up" means? I have 7 servers there and, for some reason, I have not noticed any misbehavior.
To the point: what exactly needs to be synchronized in real time? Files? A database? Both? Something else?
How do you plan to switch from one server to the other, and by what mechanism?
Or do you not know yet?

DmitryGushin, 2012-02-15
@DmitryGushin

There is no ready-made solution. What you want is quite expensive and a real hassle. That is, if you really want "as soon as the server crashes, the backup server takes over everything", and not "the server crashes and, in theory after 20 minutes but in practice after an hour, things start working somehow from the backup".
What you need:
1) A good link between the data centers. Otherwise you will not be able to keep the database up to date on the backup (lock on the primary, write on the backup and receive confirmation, unlock on the primary), or it will work very slowly.
2) A single address space (and therefore BGP). Without it you cannot bring up the same IP on different servers. With a single address space you get options for how to control the switchover: on the router (if it supports that) or through something like keepalived (which again puts requirements on the link between the data centers, though softer ones than in point 1).
All other options (DNS, for example) will inevitably give you some amount of downtime. Moreover, for some users that downtime can be very long and beyond your control (there are still ISPs with badly configured local DNS).
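To put a rough number on point 1, here is a minimal Python sketch. It assumes the simplest possible model: every commit is serialized and blocks for one full round trip to the backup. The function name and the round-trip times are made up for illustration; they are not measurements of any real link.

```python
# Rough upper bound on serialized synchronous commits between two data centers.
# Assumption: each commit waits for one full round trip to the backup
# (lock on primary -> write on backup -> confirmation -> unlock on primary).

def max_sync_commits_per_second(rtt_ms: float, local_commit_ms: float = 0.0) -> float:
    """Upper bound on commits/s when each commit blocks on one round trip."""
    if rtt_ms < 0 or local_commit_ms < 0:
        raise ValueError("latencies must be non-negative")
    per_commit_ms = rtt_ms + local_commit_ms
    if per_commit_ms == 0:
        return float("inf")
    return 1000.0 / per_commit_ms


if __name__ == "__main__":
    # Hypothetical round-trip times between data centers.
    for rtt in (1.0, 10.0, 40.0):
        bound = max_sync_commits_per_second(rtt)
        print(f"RTT {rtt:5.1f} ms -> at most {bound:.0f} serialized commits/s")
```

Independent transactions can overlap, so real throughput may be higher, but anything contending for the same lock is capped roughly like this. That is why the quality of the link matters so much.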

@bakset, 2012-02-15

Use a cloud; it is designed precisely to protect against single failures.
There are also cloud solutions distributed across several data centers; those are already disaster-tolerant solutions.

yakubovsky, 2012-02-15
@yakubovsky

Set up DNS. Specify two name servers for the domain: the first NS on the main server, the second on the reserve. Use the smallest possible TTL for the records.
On the reserve, DNS serves the A record of the main server. The reserve should monitor the availability of the main server: once a minute, ping the address, connect to port 80, or fetch a page of the project and look for a checkpoint marker on it.
If one of the checks fails, the monitoring script on the reserve replaces the zone file with one whose A record points to the reserve, and reloads DNS.
Here lies the danger. In theory the TTL expires and the address should be requested again, but, as was said above, broken caches of DNS answers have not gone away.
File synchronization: incron triggering rsync.
Database synchronization? Master-master replication or, if you can afford to cut functionality, a read-only database on the reserve.
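The monitoring part could look roughly like the Python sketch below. Everything specific in it is an assumption for illustration: the two IP addresses (taken from the documentation ranges), the `site-ok` marker string, the 60-second TTL and all function names. Writing the zone file and reloading the DNS server are left out, since they depend on which DNS server you run.

```python
import http.client
import socket
import urllib.request

PRIMARY_IP = "192.0.2.10"     # hypothetical address of the main server
RESERVE_IP = "198.51.100.20"  # hypothetical address of the reserve
CHECKPOINT = "site-ok"        # hypothetical marker expected on the project page


def port_open(host, port=80, timeout=5.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def page_has_checkpoint(url, marker=CHECKPOINT, timeout=5.0):
    """True if the page can be fetched and contains the marker string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return False
    return marker in body


def choose_a_record(checks):
    """Publish the main server only if every check passes, else the reserve."""
    return PRIMARY_IP if all(check() for check in checks) else RESERVE_IP


def render_a_record(name, ip, ttl=60):
    """One zone-file line: name, TTL, class, type, address."""
    return f"{name} {ttl} IN A {ip}"


def run_once():
    """One monitoring pass; meant to be started once a minute, e.g. from cron."""
    checks = [
        lambda: port_open(PRIMARY_IP, 80),
        lambda: page_has_checkpoint(f"http://{PRIMARY_IP}/"),
    ]
    # Writing this line into the zone and reloading DNS is server-specific.
    return render_a_record("www", choose_a_record(checks))
```

A single failed check is enough to switch here, as described above. Whether to require several failures in a row before switching is a separate decision that this sketch does not make.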