Full-file incremental forever, block-level incremental forever, or source deduplication? The best way to choose is to perform backup and recovery tests and evaluate the performance of each method.
A traditional backup starts with an initial full backup and is followed by a series of incremental or cumulative incremental backups (the latter also known as differential backups). After some period of time, you perform another full backup and more incremental backups. However, the advent of disk-based backup systems has given rise to the incremental forever approach, where only one full backup is performed, followed by a series of incremental backups. Let’s take a look at the different ways to do this.
File-level incremental forever
The first type of incremental forever backup is a file-level incremental forever backup product. This approach has been around for quite some time, with early versions of it available in the 1990s. It is called a file-level incremental because the decision to back up an item happens at the file level. If anything within a file changes, its modification date changes (or its archive bit is set in Windows), and the entire file is backed up. Even if only one byte of data changed within the file, the entire file is included in the backup.
To be called a file-level incremental forever backup product, a backup product needs to perform only one full backup and follow that with a series of incremental backups. It will never again perform a full backup. This means that the backup product in question must be designed this way from the ground up.
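The file-level decision described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: the function name `files_to_back_up` and the idea of passing in a single `last_backup_time` timestamp are assumptions made for the example, and real products typically keep a per-file catalog rather than one timestamp.

```python
import os


def files_to_back_up(root, last_backup_time):
    """Return the files under root modified after last_backup_time.

    last_backup_time is a POSIX timestamp (seconds since the epoch).
    The whole file is selected even if a single byte changed, because
    the modification time is the only signal being checked.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Hypothetical rule: newer mtime means "back up the entire file".
            if os.stat(path).st_mtime > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Note that the function never looks inside a file, which is exactly why a one-byte change costs a full copy of that file.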
There are some commercial backup offerings that are marketing themselves as incremental forever backup products because they support the concept of a synthetic full backup, which is created from other backups. If a backup solution still relies on regular full backups – even if they are created synthetically – it does not qualify to be called a file-level incremental forever backup solution.
The reason for this distinction is that there are advantages to never again performing a full backup that go beyond the reduction in processing and network traffic on the backup client that a synthetic full backup provides. Never again doing a full backup also reduces the amount of data that must be stored in the backup system, as well as data that might be copied to other storage, including the cloud. Starting with an incremental forever approach is also a great beginning toward performing deduplication. Even synthetic full backups have to be deduplicated, which is a waste of computing power.
This system is more efficient than the traditional full and incremental method. The biggest advantage is that there is no wasted CPU processing, network, or storage taken up with additional full backups. This also makes backups take less time.
Another advantage is that, by design, the system knows exactly which versions of which files need to be restored in a full restore, and can restore just those files. This avoids the wasted time and effort of a traditional restore, which restores the entire full backup and then the entire contents of each incremental backup, not just the files needed from that backup. This method is, however, incompatible with tape, because incremental backups are the workload that tape handles worst.
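To make the restore advantage concrete, here is a small sketch that compares the two restore styles. The data layout (a list of dictionaries mapping a file path to a stored version label) and the name `plan_restore` are assumptions for illustration only, and file deletions are ignored.

```python
def plan_restore(backups):
    """Pick the newest stored version of each file.

    backups is a list of dicts ordered oldest to newest; each dict maps
    a file path to the identifier of the version stored in that backup.
    """
    latest = {}
    for backup in backups:
        latest.update(backup)  # newer backups override older versions
    return latest


backups = [
    {"a.txt": "full:a", "b.txt": "full:b", "c.txt": "full:c"},  # initial full
    {"a.txt": "inc1:a"},                                        # incremental 1
    {"a.txt": "inc2:a", "c.txt": "inc2:c"},                     # incremental 2
]

plan = plan_restore(backups)

# A traditional restore replays every backup in order: 3 + 1 + 2 = 6 file writes.
traditional_writes = sum(len(b) for b in backups)
# A targeted restore writes each file once: 3 file writes.
targeted_writes = len(plan)
```

In this toy case the traditional restore writes `a.txt` three times and `c.txt` twice, while the targeted plan touches each file once.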
Block-level incremental forever
Another incremental forever backup approach is block-level incremental forever. This method is similar to the previous method in that it will perform one full backup and a series of incremental backups – and will never again perform a full backup.
In a block-level incremental backup approach, the decision to back up something happens at the bit or block level. (This probably should be called bit-level incremental backup, but no one calls it that.) For this approach to work, some application needs to maintain a bitmap of the data that records which parts are changing, usually referred to as changed block tracking (CBT). In virtualization environments, this is typically provided by the hypervisor, such as VMware or Hyper-V. When it’s time to take the next backup, the backup software asks for the bitmap of the blocks that have changed since the last backup. It is then provided with an exact map of which blocks need to be included in the latest block-level incremental backup. Such a backup solution must also keep track of the location of each block after it has been backed up, as it will need this information when it performs a restore. Although such solutions are not found exclusively within virtualization environments, that is where you tend to see them.
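The following toy model shows the idea of changed block tracking. It is only a sketch under stated assumptions: the `TrackedVolume` class, the tiny 4-byte block size, and the dictionary-based backup format are all hypothetical, and it does not use or imitate any real hypervisor API.

```python
BLOCK_SIZE = 4  # unrealistically small, chosen so the example is easy to follow


class TrackedVolume:
    """An in-memory 'disk' that remembers which blocks were written."""

    def __init__(self, data):
        self.data = bytearray(data)
        self.changed = set()  # indices of blocks written since the last backup

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        first = offset // BLOCK_SIZE
        last = (offset + len(payload) - 1) // BLOCK_SIZE
        self.changed.update(range(first, last + 1))

    def read_block(self, index):
        start = index * BLOCK_SIZE
        return bytes(self.data[start:start + BLOCK_SIZE])

    def take_changed_blocks(self):
        """Return the changed-block map and reset the tracking."""
        changed, self.changed = sorted(self.changed), set()
        return changed


def full_backup(volume):
    count = (len(volume.data) + BLOCK_SIZE - 1) // BLOCK_SIZE
    volume.take_changed_blocks()  # start tracking from a clean state
    return {i: volume.read_block(i) for i in range(count)}


def incremental_backup(volume):
    # Only the blocks named in the changed-block map are copied.
    return {i: volume.read_block(i) for i in volume.take_changed_blocks()}


def restore(backups):
    """Rebuild the volume from the full backup plus every incremental."""
    blocks = {}
    for backup in backups:  # oldest to newest
        blocks.update(backup)
    return b"".join(blocks[i] for i in sorted(blocks))
```

For example, after a full backup of a 16-byte volume, a two-byte write at offset 5 leaves only block 1 in the map, so the next incremental carries 4 bytes instead of 16. The `restore` function also shows why the backup system has to remember where each block lives.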
Block-level incremental forever backups significantly reduce the amount of data that must be transferred from the backup client to the backup server, so they can be very useful for backing up remote systems. Some backup solutions designed for laptops and remote offices use a block-level incremental forever approach. The challenge is that something has to provide the CBT process, and not every system is able to do that. This type of backup also works only with disk as the target: if you stored it on tape, an individual file could be spread out across multiple tapes, and restores would take forever. The random-access nature of disk makes it perfect for this type of backup.
Source deduplication
The final type of incremental forever backup is source deduplication, in which the backup software performs the deduplication process at the very beginning of the backup (i.e., at the source). It decides at the backup client whether or not to transfer a new chunk of data to the backup system.
Source deduplication systems are also an incremental forever backup approach. By design, they never again back up chunks of data the system has seen before. (A chunk is a collection of bytes of an arbitrary size created by slicing up files or larger images for the purposes of deduplication.) In fact, they can back up less data than a block-level incremental forever backup solution, because a changed block whose contents the backup system already holds does not have to be sent again.
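A minimal sketch of the client-side decision might look like the following. Everything here is an assumption made for illustration: the `BackupServer` class stands in for a remote backup system, chunks are cut at a fixed 4-byte size (real products often use much larger or variable-size chunks), and SHA-256 is simply one reasonable choice of fingerprint.

```python
import hashlib

CHUNK_SIZE = 4  # unrealistically small, for readability


class BackupServer:
    """Stand-in for the backup system: a store of chunks keyed by hash."""

    def __init__(self):
        self.chunks = {}

    def has(self, digest):
        return digest in self.chunks

    def store(self, digest, chunk):
        self.chunks[digest] = chunk


def backup(data, server):
    """Back up data, sending only chunks the server has never seen.

    Returns the recipe (ordered list of chunk hashes needed to rebuild
    the data) and the number of bytes actually sent.
    """
    recipe = []
    bytes_sent = 0
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if not server.has(digest):  # the decision is made at the source
            server.store(digest, chunk)
            bytes_sent += len(chunk)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, server):
    return b"".join(server.chunks[digest] for digest in recipe)
```

Backing up `b"AAAABBBBAAAA"` to an empty server sends 8 bytes rather than 12, because the third chunk repeats the first. A later backup of `b"AAAABBBBCCCC"` sends only the 4 new bytes.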
Since a source deduplication system reduces the amount of data sent from a backup client to the backup solution more than any other approach, it is even more effective than block-level incremental backup at backing up remote systems such as laptops, mobile devices, remote offices, or VMs running in the public cloud. Most of the backup solutions designed for backing up laptops and remote sites to a central location use source deduplication. The biggest disadvantage of the source deduplication approach is that you might have to change your backup software to start using it. Some of the major backup products have added support for source deduplication, but not all have done so.
The full-file incremental forever approach is no longer common. The block-level incremental approach is probably the most common and uses fewer compute resources than source deduplication; however, it can only be used where CBT is available. Source deduplication works more universally, but it is not as common. The best thing to do is perform backup and recovery tests to see what kind of performance you get from each method.