Updated: Feb 5
As enterprise data volumes grow and the rate of growth keeps accelerating, shortening the backup window has become a major concern for system administrators.
Protecting data online, simplifying data protection, reducing its cost, and making it application-aware have gradually become customers' primary requirements. Snapshot backup technology has become one of the effective ways to meet them.
Keywords: data protection, snapshot, COW, ROW
Introduction to snapshot concepts
The Storage Networking Industry Association (SNIA) defines a snapshot as a fully usable copy of a specified data set that contains a static image of the source data as of the point in time the copy was made.
A snapshot can be either a duplicate or a replica of the data it reproduces. A file system snapshot, for example, is an instant copy of the file system: it contains all of the file system's information at the moment the snapshot is taken, and it is itself a complete, usable copy.
How the snapshot is implemented
A defining characteristic of snapshots is speed, so a snapshot cannot be produced by physically copying all of the data at the moment it is taken. Instead, two designs are used: full-copy snapshots, which prepare the complete copy in advance, and differential snapshots, which record only changes. Differential snapshots are further divided into COW (copy-on-write) and ROW (redirect-on-write) snapshots.
1. Split-mirror snapshot (a full-copy snapshot)
Split-mirror snapshot technology must create and maintain a complete physical mirror volume of the source data volume before the snapshot point in time arrives, so that two identical copies of the data are stored on a mirror pair made up of the source volume and the mirror volume. To snapshot the entire mirrored volume at a given moment, application I/O to the primary volume is first stopped; the mirror relationship is then terminated and the mirror is split off while host I/O is paused, yielding a complete copy.
The method itself is simple. A mirror volume of the original volume is created first, and every write to disk goes to both the original volume and the mirror volume. When a snapshot is taken, the mirror volume is quickly detached and becomes the snapshot volume. A new mirror volume of the original volume is then created in preparation for the next snapshot.
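The write path and split described above can be modelled in a few lines. This is a minimal in-memory sketch, not a storage-array implementation: volumes are Python dicts mapping block numbers to bytes, the names `MirroredVolume`, `split_snapshot` and `device_writes` are hypothetical, and it assumes host I/O is already stopped when the split happens.

```python
from typing import Dict


class MirroredVolume:
    """Hypothetical mirror pair; each volume is a dict of block number -> data."""

    def __init__(self) -> None:
        self.source: Dict[int, bytes] = {}
        self.mirror: Dict[int, bytes] = {}
        self.device_writes = 0  # device-level writes caused by host writes

    def write(self, block: int, data: bytes) -> None:
        # Every host write lands on both halves of the mirror pair.
        self.source[block] = data
        self.mirror[block] = data
        self.device_writes += 2

    def read(self, block: int) -> bytes:
        return self.source.get(block, b"")

    def split_snapshot(self) -> Dict[int, bytes]:
        # Assumes application I/O to the source is already stopped.
        snapshot = self.mirror  # detach the mirror; it becomes the snapshot volume
        # Re-create a mirror for the next snapshot. This is a full copy of the
        # source; its resync traffic is not counted in device_writes.
        self.mirror = dict(self.source)
        return snapshot


vol = MirroredVolume()
vol.write(0, b"v1")
snap = vol.split_snapshot()
vol.write(0, b"v2")  # later writes do not touch the detached snapshot
print(snap[0], vol.read(0), vol.device_writes)  # prints: b'v1' b'v2' 4
```

Each detached snapshot is a full copy the size of the source, and each host write costs two device writes, which are exactly the space and write costs of this scheme.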
The biggest disadvantage of this scheme is disk-space consumption: each snapshot occupies as much space as the original volume, and every write is performed twice, which noticeably hurts write performance. Its advantages are that snapshots are easy to create and restore, and data isolation is excellent: the snapshot volume and the original volume do not affect each other.
2. Copy-on-write (COW) snapshot (a differential snapshot)
A copy-on-write snapshot uses pre-allocated snapshot space. When the snapshot is taken, no physical data is copied; only the metadata recording the physical locations of the original data is copied, so snapshot creation is very fast and effectively instantaneous. From then on, the snapshot tracks changes to the original volume (that is, writes to it). The first time a data block on the original volume is updated, the original block is read out and written to the snapshot volume, and only then is the original block overwritten with the new data; hence the name copy-on-write. The obvious drawback is reduced write performance on the original volume: the first write to each block after a snapshot must first copy the original data to the snapshot area, turning one write into a read plus two writes.
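A minimal sketch of the copy-on-write path follows, assuming a single snapshot and dict-based volumes. `CowVolume`, `snap_area` and the read/write counters are hypothetical names for illustration; a real implementation keeps the record of preserved blocks as persistent on-disk metadata rather than a Python dict.

```python
from typing import Dict, Optional


class CowVolume:
    """Hypothetical COW volume with at most one snapshot; blocks are dict entries."""

    def __init__(self) -> None:
        self.origin: Dict[int, bytes] = {}
        self.snap_area: Optional[Dict[int, bytes]] = None  # preserved original blocks
        self.reads = 0
        self.writes = 0

    def create_snapshot(self) -> None:
        # Metadata only: start an empty table of preserved blocks; no data is copied.
        self.snap_area = {}

    def write(self, block: int, data: bytes) -> None:
        if self.snap_area is not None and block not in self.snap_area:
            # First write to this block since the snapshot:
            # read the old data and copy it to the snapshot area first.
            self.reads += 1
            self.snap_area[block] = self.origin.get(block, b"")
            self.writes += 1
        self.origin[block] = data  # then overwrite the block in place
        self.writes += 1

    def read_snapshot(self, block: int) -> bytes:
        if self.snap_area is None:
            raise RuntimeError("no snapshot has been created")
        # Changed blocks come from the snapshot area, unchanged ones from the origin.
        if block in self.snap_area:
            return self.snap_area[block]
        return self.origin.get(block, b"")


vol = CowVolume()
vol.write(1, b"old")
vol.write(2, b"same")
vol.create_snapshot()
vol.write(1, b"new")    # first write after the snapshot: 1 read + 2 writes
vol.write(1, b"newer")  # block already preserved: 1 write
print(vol.read_snapshot(1), vol.read_snapshot(2), vol.reads, vol.writes)  # prints: b'old' b'same' 1 5
```

Only the first post-snapshot write to a block pays the extra read and write; the second write to block 1 costs a single write, and unchanged block 2 is still served from the origin.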
3. Redirect-on-write (ROW) snapshot (a differential snapshot)
After a snapshot is taken, all write I/O is redirected to a new volume, while all of the old data stays in the now read-only source volume. As a result, the data produced for each snapshot is stored in a contiguous storage area, and the double-write penalty of COW is avoided.
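The redirect behaviour can be sketched as a stack of layers, assuming dict-based layers where only the newest layer is writable and reads search from newest to oldest. `RowDisk` and its methods are hypothetical; real systems may instead redirect through a block-mapping table rather than keeping separate layer objects.

```python
from typing import Dict, List, Optional


class RowDisk:
    """Hypothetical ROW disk: layers[0] is the base volume, layers[-1] is writable."""

    def __init__(self) -> None:
        self.layers: List[Dict[int, bytes]] = [{}]
        self.writes = 0

    def create_snapshot(self) -> int:
        # Freeze the current top layer and redirect later writes to a new,
        # empty layer. No data is copied. Returns the frozen layer's index.
        self.layers.append({})
        return len(self.layers) - 2

    def write(self, block: int, data: bytes) -> None:
        # One physical write to the new location; older layers are never modified.
        self.layers[-1][block] = data
        self.writes += 1

    def read(self, block: int, snapshot: Optional[int] = None) -> bytes:
        # Search from the newest relevant layer down to the base.
        top = len(self.layers) - 1 if snapshot is None else snapshot
        for layer in reversed(self.layers[: top + 1]):
            if block in layer:
                return layer[block]
        return b""


disk = RowDisk()
disk.write(0, b"v1")
snap_id = disk.create_snapshot()
disk.write(0, b"v2")  # redirected to the new layer; v1 stays in the frozen layer
print(disk.read(0), disk.read(0, snap_id), disk.writes)  # prints: b'v2' b'v1' 2
```

Each write costs exactly one physical write, but a read may have to walk several layers, and that walk grows with every snapshot taken.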
Taking multiple snapshots of a disk produces a snapshot chain, and the virtual machine's volume is always mounted at the end of that chain. For example, if ten snapshots have been saved, restoring the latest point in time requires combining all ten snapshot files. The main disadvantage of ROW, therefore, is that no single complete snapshot volume exists: the more snapshot levels there are, the greater the system overhead during recovery.
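The recovery cost of a long chain can be illustrated by flattening it. The function below is a hypothetical sketch, assuming each snapshot file is represented as a dict of changed blocks and the chain is ordered oldest first; it merges the layers into one complete image and reports how many layer files had to be read.

```python
from typing import Dict, List, Tuple


def flatten_chain(chain: List[Dict[int, bytes]]) -> Tuple[Dict[int, bytes], int]:
    """Merge a snapshot chain (oldest layer first) into one complete image."""
    image: Dict[int, bytes] = {}
    layers_read = 0
    for layer in chain:
        image.update(layer)  # apply oldest to newest so newer data wins
        layers_read += 1
    return image, layers_read


# Hypothetical chain of ten files: a base layer plus nine delta layers,
# each delta rewriting block 0 only.
chain = [{0: b"base", 1: b"base"}] + [{0: b"delta%d" % i} for i in range(1, 10)]
image, files_read = flatten_chain(chain)
print(files_read, image[0], image[1])  # prints: 10 b'delta9' b'base'
```

Restoring an older point in time, for example `flatten_chain(chain[:4])`, reads fewer layers, but restoring the newest point always touches every file in the chain.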