Xen
TimurUfa, 2015-10-08 08:11:13

Is it possible to set up an analogue of VMware Fault Tolerance on Xen?

Here is the situation: there are 2 rack servers and 1 workstation. One server hosts a virtual machine (Windows Server 2008 R2) that receives data from an industrial controller. I need to build a hot-standby system so that, if server No. 1 suffers a hardware failure, server No. 2 takes over the virtual machine without losing the contents of RAM, and the client software on the operator workstation does not even notice the switch (downtime <= 5 seconds).
I looked into using VMware Fault Tolerance, but ran into the need for shared storage to hold the VM configuration files. The options came down to:
- a hardware storage system: the price tag is very steep, because a cheap model is not an option when storage is the most vulnerable part of this setup;
- a small server running Windows Server that exports NFS shares as the storage. This ruins the security of the whole system, so it is not worth doing;
- the VMware Storage Appliance virtual storage system, which seems to solve the shared-storage problem but is no longer on sale.
In the end, even if the storage issue is solved, the VMware licenses alone cost more than $16k, which is not acceptable.
At the moment I am considering redundancy based on DRBD mirroring under the Proxmox virtualization management system, because it is free and there is existing groundwork to build on. BUT this only provides High Availability, i.e. protection of the data on the hard drives. The downtime is then equal to the boot time of the VM, which is unacceptable.
In my previous question I was pointed towards Xen, but so far I have not found a technology there similar to Fault Tolerance. Maybe someone has come across one and can point me in the right direction? I am reading the manuals myself, but the project deadline is already close and I am looking for help wherever I can. Thank you in advance.

1 answer(s)
lovecraft, 2015-10-17
@lovecraft

Fault Tolerance is certainly a beautiful thing and, if I may say so, the apotheosis of virtualization as a technology for "decoupling" the OS from the hardware. But its effectiveness is exaggerated. For example, VMware has a hard limit for Fault Tolerance: no more than one vCPU per VM ( https://pubs.vmware.com/vsphere-50/index.jsp?topic... ), which makes Fault Tolerance little more than a pretty toy. It all stems from the need to keep the state of the two VMs on the two hosts constantly in sync. I don't know how VMware does this, but there is a great paper on Remus: https://www.usenix.org/legacy/event/nsdi08/tech/ fu...
In short, the scheme described there is this: the state of the VM is transferred to the backup host (a checkpoint is taken), after which all of the machine's I/O is accumulated in a special buffer until the next checkpoint. When the next checkpoint occurs, all I/O in the buffer is "released" to the outside world and the cycle repeats. Naturally, all this reduces performance catastrophically.
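To make the checkpoint-and-buffer cycle easier to picture, here is a minimal toy simulation in Python. It is only a sketch of the idea as summarized above, not Remus code: the names `BackupHost`, `ProtectedVM`, `run_epoch` and `checkpoint` are invented for illustration, the "state" is a single counter, and the "I/O" is a list of strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackupHost:
    """Hypothetical stand-in for the standby server: stores the last checkpoint."""
    state: dict = field(default_factory=dict)

    def receive_checkpoint(self, snapshot: dict) -> bool:
        self.state = dict(snapshot)  # keep a copy of the primary's state
        return True                  # acknowledge the checkpoint


@dataclass
class ProtectedVM:
    """Hypothetical primary VM whose outgoing I/O is held back between checkpoints."""
    backup: BackupHost
    state: dict = field(default_factory=lambda: {"counter": 0})
    pending_output: List[str] = field(default_factory=list)   # buffered, not yet visible
    released_output: List[str] = field(default_factory=list)  # visible to the outside

    def run_epoch(self, steps: int) -> None:
        # The guest keeps running, but everything it sends goes into the buffer.
        for _ in range(steps):
            self.state["counter"] += 1
            self.pending_output.append(f"packet-{self.state['counter']}")

    def checkpoint(self) -> None:
        # Copy the state to the backup; release buffered I/O only once it is acknowledged.
        if self.backup.receive_checkpoint(self.state):
            self.released_output.extend(self.pending_output)
            self.pending_output.clear()


def simulate():
    backup = BackupHost()
    vm = ProtectedVM(backup=backup)
    vm.run_epoch(3)
    visible_before = list(vm.released_output)  # nothing has left the buffer yet
    vm.checkpoint()
    vm.run_epoch(2)  # imagine the primary host dies here, before the next checkpoint
    return visible_before, list(vm.released_output), backup.state["counter"], list(vm.pending_output)
```

In this toy run the outside world sees nothing until the first checkpoint is acknowledged. If the primary fails during the second epoch, the backup resumes from counter 3 and the two buffered packets are simply never sent, so clients never observe output that the backup does not know about. The price is that every packet waits for a checkpoint, which is where the performance hit comes from.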
A further problem is that the guest OS is not aware that it is constantly being suspended and resumed, and it may end up in a blue screen. For instance, a special virtual timer (hv_time / hv_relaxed) was added to KVM at one point so that Windows would not crash under heavy load. Paravirtual drivers are a separate topic. Until recently there were no decent drivers for upstream Xen, only a misunderstanding called GPLPV. Drivers supported by the Xen team have now appeared, but whether they have been tested together with Remus is a big question.
If you rank the threats to your project in decreasing order of probability, you get the following:
1) Failure caused by a fault in the monitoring software
2) Blue screen due to working in a virtual environment or failure in paravirtual drivers
3) Blue screen "just because" - Windows sometimes also crashes)))
4) Hardware failure
Of all these, Remus can only protect against item 4.
So, in my opinion, the very use of "raw" Xen and "raw" PV drivers poses a much more likely threat to the OS than a hardware failure does. The ready-made Xen products from Citrix and Oracle do not support Fault Tolerance, and I think there are reasons for that)
