How can I shrink a backup that is 90% distribution files?
When we back up a virtual machine, most of the backup consists of standard files such as /bin/ls, which are identical on millions of systems (and even within one company they are the same on many machines).
An obvious solution suggests itself: minimize the archives. We look at each file, compute its hash, and check that hash against some central service. If the hash occurs many times, we simply drop the file from the archive and leave a marker saying that a file with this hash was in that place. When unpacking, we replace the markers with the real files (for example, by downloading them from the service by hash, or by downloading the .deb package that contains a file with that hash).
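To make the idea concrete, here is a minimal Python sketch of the scheme described above. It is only an illustration under stated assumptions: the names `pack`, `restore`, `known_hashes`, `fetch_by_hash` and `manifest.json` are hypothetical, SHA-256 is an arbitrary choice of hash, and the "central service" is stood in for by a plain set of hashes and a callback. It also ignores permissions, ownership, symlinks and empty directories, which a real backup tool would have to preserve.

```python
import hashlib
import json
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pack(source: Path, dest: Path, known_hashes: set) -> dict:
    """Copy `source` into `dest`, skipping files whose hash is already known.

    Skipped files are recorded in dest/manifest.json as
    {relative path: hash}; the same mapping is returned.
    """
    manifest = {}
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(source)
        file_hash = sha256_of(path)
        if file_hash in known_hashes:
            # Well-known file: keep only a marker, not the content.
            manifest[rel.as_posix()] = file_hash
        else:
            target = dest / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
    dest.mkdir(parents=True, exist_ok=True)
    (dest / "manifest.json").write_text(json.dumps(manifest, indent=2))
    return manifest


def restore(packed: Path, fetch_by_hash) -> None:
    """Recreate skipped files using `fetch_by_hash(hash) -> bytes`.

    `fetch_by_hash` is a placeholder for whatever can return content by
    hash (a central service, a package mirror, a local cache).
    """
    manifest_path = packed / "manifest.json"
    manifest = json.loads(manifest_path.read_text())
    for rel, file_hash in manifest.items():
        data = fetch_by_hash(file_hash)
        # Never trust the remote side blindly: verify before writing.
        if hashlib.sha256(data).hexdigest() != file_hash:
            raise ValueError(f"hash mismatch for {rel}")
        target = packed / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    manifest_path.unlink()
```

The directory produced by `pack` can then be archived with any ordinary tool; only the files that are unique to the machine, plus the manifest, end up in the archive.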
Is there any software or service for this?
P.S.
Yes, incremental backups can sometimes partially solve this problem, as can LXC with overlayfs for virtual machines. But I am interested in a solution at the archive level.
Update:
I ended up reinventing the wheel and wrote hashget, a utility for simple deduplication.
Article on Habr: Reduce backups by 99.5% with hashget
Use deduplication in the archive storage system, or an archiving system that deduplicates. For example, I keep a couple of dozen VHDs of test Windows virtual machines on a small SSD. They run fast because they are on an SSD, and they fit on a disk that is much smaller than the combined size of those VHDs.
"But I am interested in a solution at the archive level."
One example of such an archiver is zpaq. Besides ordinary compression, it offers deduplication and support for remote archives.
"If this hash occurs many times, we simply delete this file from the archive (marking that there was a file with such and such a hash in that place)."
What you describe is called file-level deduplication. It has been known for a long time, but it is ineffective and nobody needs it.
Use a full/differential/incremental scheme at the backup level. That does not take other backups into account, though. Deduplication will help you there, but you need to be careful with it.