Backups
Ensuring the safety of the data in the system
Frequent backups are the best way to ensure the safety of the data on a system. The system administrator has to choose between time-consuming full system backups and faster but less thorough partial system backups. A full system backup creates a copy of everything on the hard drive, so the system can be restored to the state it was in at the time of the backup. Certain directories should not be copied even as part of a full system backup; /proc, for instance, holds only transient information about current processes.
Partial system backups are faster because the administrator chooses which data to back up. Normally, user directories like /home are backed up. The contents of directories like /usr (commands and programs) can be reinstalled or recompiled, and directories like /var (log files, e-mail, etc.) are often backed up less frequently.
The data may be backed up to zip drives, tape drives, CD-ROMs, external USB drives and so on. Normally, a removable medium or a separate disk drive is used to facilitate recovery. Automated daily backups are the norm for most systems running Linux. Backups may be automated by using the cron daemon to run backup shell scripts. It is a good idea to keep a few days' worth of backups so that there are several recovery options.
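The automation and retention ideas above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name backup_dir, the archive naming scheme, and the paths in the sample crontab line are hypothetical and not part of any standard tool.

```shell
#!/bin/sh
# Hypothetical helper: write a dated tar archive of SRC into DEST,
# then delete archives older than KEEP_DAYS days (default 7).
backup_dir() {
    src=$1
    dest=$2
    keep_days=${3:-7}

    mkdir -p "$dest" || return 1
    archive="$dest/backup-$(date +%Y-%m-%d).tar"

    # -C switches to the parent directory so the archive stores relative paths.
    tar -cf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1

    # Prune archives last modified more than keep_days days ago.
    find "$dest" -name 'backup-*.tar' -mtime +"$keep_days" -exec rm -f {} +

    # Print the path of the archive that was just written.
    echo "$archive"
}

# Sample crontab entry (assumption: a wrapper script at this hypothetical
# path calls "backup_dir /home /mnt/backup 7"); runs daily at 02:30:
# 30 2 * * * /usr/local/bin/backup-home.sh
```

Keeping one archive per day and pruning by age is one simple retention policy; other schemes, such as weekly or monthly sets, are equally possible.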
Often, the tar command is used to write an archive to removable media. Normally, removable media such as CD-ROMs, tape drives and so on correspond to device files in the /dev directory. To make a backup of a particular directory on the filesystem, simply pack its contents and direct the output to the device that you wish to use. The following command will back up the /etc directory and its contents onto a tape drive whose device file is /dev/qft0:
tar cvf /dev/qft0 /etc
The backup may be extracted by the following command:
tar xvf /dev/qft0
To back up the entire filesystem, the '/' path should be specified. Backups may take a long time and should be performed when the system is not busy. Early mornings on weekends are often the best time.
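A full backup that skips transient directories such as /proc might look like the sketch below. The function name full_backup is hypothetical, and excluding /sys in addition to /proc is an assumption that fits modern Linux systems. The --exclude option is provided by GNU tar.

```shell
#!/bin/sh
# Hypothetical helper: archive everything under ROOT into TARGET,
# leaving out pseudo-filesystems that hold no persistent data.
full_backup() {
    root=$1
    target=$2

    # -C makes paths relative to ROOT, so the exclude patterns start with "./".
    # Excluding ./sys is an assumption beyond the text, which names only /proc.
    tar -cf "$target" \
        --exclude=./proc \
        --exclude=./sys \
        -C "$root" .
}

# Possible real-world call, using the tape device from the text:
# full_backup / /dev/qft0
```

If the archive is written to a regular file rather than a device, that file should live outside the tree being archived, or be excluded as well.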
rsync is a command-line utility for synchronizing files between two locations, which may be on different computers. Some administrators prefer to use rsync as a backup tool. rsync can perform a backup to a removable drive (e.g., an external USB hard drive). This drive should then be detached from the system and stored in a safe place. Version 2.6.0 or later of the rsync utility should be used. rsync has the following basic syntax:
rsync -a <source_path>/ <target_path>/
The above command copies the contents of source_path to target_path. However, if the tree corresponding to source_path already exists in target_path, the rsync algorithm checks for differences between the contents of the source and destination paths, and only the changes are transferred. This technique is known as incremental backup. Backing up an entire filesystem incrementally with rsync is far more efficient than copying the whole filesystem every day.
By default, rsync preserves files found in the target that are not present in the source. The --delete flag may be used to delete such files, which ensures that the target is an exact copy of the source:
rsync -a --delete source/ target/
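Because --delete removes files from the target, it can be worth previewing the changes first. The following is a minimal sketch, assuming rsync is installed; the function name mirror_dir is hypothetical. The -n (--dry-run) and -v options are standard rsync flags.

```shell
#!/bin/sh
# Hypothetical helper: make DST an exact mirror of SRC.
mirror_dir() {
    src=$1
    dst=$2

    # Preview only: -n (--dry-run) lists what would be copied or deleted
    # without changing anything in the target.
    rsync -a --delete -n -v "$src"/ "$dst"/ || return 1

    # Real run: copy changes and delete files that no longer exist in SRC.
    rsync -a --delete "$src"/ "$dst"/
}

# Possible call (hypothetical paths):
# mirror_dir /home /mnt/usbdrive/home
```

In interactive use, the two steps would normally be run separately so that the preview can be inspected before anything is deleted.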