Archiving and compressing files

Knowing how to archive and compress files

Knowing how to archive and compress files is essential for Linux users. Archives have a wide variety of uses. Packages downloaded from the Internet often come as archives, which must be decompressed and unpacked before installation and use. Similarly, an application or program is usually packed before distribution. Backups may also be made by packing and compressing hard disk contents. The tar command is used both to create archives and to unpack them. The 'c' option creates an archive, while the 'x' option unpacks (extracts) one. A typical invocation of tar looks like this:

    # tar cvf backup.tar /home

The above command packs all of the files in /home into the tar archive backup.tar. The first argument to tar consists of the options. The 'c' option creates a new archive. The 'v' option selects verbose mode: each file is listed as it is added to the archive, along with any associated error messages. The 'f' option indicates that the name of the archive file follows. The remaining arguments name the files or directories whose contents should be archived.
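To try the create command without root privileges and without touching /home, the same options can be run in a scratch directory. The following is a minimal sketch; the directory and file names (demo, notes.txt, todo.txt) are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data standing in for /home.
mkdir demo
echo "first file" > demo/notes.txt
echo "second file" > demo/todo.txt

# c = create, v = verbose, f = archive file name follows.
tar cvf backup.tar demo
```

With 'v' in effect, tar prints the name of each member as it is added, which makes it easy to confirm what went into the archive.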

    # tar xvf backup.tar

The above command extracts the archive backup.tar and places the unpacked files in the current directory. When files are extracted into an existing directory, existing files with the same names are overwritten. To avoid overwriting files or losing other information, check the directory tree that unpacking will produce before extracting. The following are two ways of packing and unpacking the contents of the paths /home/LinuxUser, /home/LinuxUser1, and /home/LinuxUser2:

Issued from path / :

    # tar cvf backup.tar /home/LinuxUser /home/LinuxUser1 /home/LinuxUser2

Extraction:

    # cd /
    # tar xvf backup.tar

Issued from path /home :

    # tar cvf backup.tar LinuxUser LinuxUser1 LinuxUser2

Extraction:

    # cd /home
    # tar xvf backup.tar

Note that the command to unpack should be issued from the appropriate directory so that the original directory structure is restored. If the files were archived from the root directory, they should also be unpacked there, because the archive stores each file under its path from the root. (GNU tar removes the leading '/' from stored names by default, so files are restored relative to the directory in which the archive is unpacked.) The following command displays a listing of the archive's files without extracting them. It is a good way of verifying the directory structure of the files to be unpacked.

    # tar tvf backup.tar
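The effect of the directory from which tar is issued can be seen by listing member names. The sketch below is illustrative only: it uses a scratch directory with hypothetical names (home, LinuxUser, file.txt, restore) rather than the real /home, and it relies on the 'C' option, which GNU tar and bsdtar provide to change directory before extracting.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p home/LinuxUser
echo "data" > home/LinuxUser/file.txt

# Archive created from the top of the tree: names start with home/.
tar cf from_top.tar home/LinuxUser

# Archive created from inside home: names start with LinuxUser/.
(cd home && tar cf ../from_home.tar LinuxUser)

# List the members to see where files would land on extraction.
tar tf from_top.tar
tar tf from_home.tar

# Extract into a separate directory to avoid overwriting existing files.
mkdir restore
tar xf from_home.tar -C restore
```

Unpacking into an empty directory first is a cautious way to inspect an unfamiliar archive before copying its contents into place.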

Unlike many archiving programs for Windows, tar does not compress files by default. Two files of 2 MB each produce an archive of roughly 4 MB: the size of an archive is approximately the sum of the sizes of the files in it, plus a small amount of header data. To compress an archive, use the gzip utility. The gzip command can compress any file, not just a tar archive:

    # gzip -9 backup.tar

The above command compresses backup.tar and replaces it with backup.tar.gz, a compressed version of the file. The digits 1 through 9 in the option select the compression level. Here, -9 makes gzip use the highest compression level, which is also the slowest. A gzip file may be decompressed with either the gunzip command or the gzip command issued with the 'd' option (gzip -d <file_name>). Decompressing backup.tar.gz restores the original file, backup.tar.

    # gunzip backup.tar.gz

is equivalent to

    # gzip -d backup.tar.gz
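A quick way to see that gzip replaces the original file, and that decompression restores it, is to run a round trip on a scratch file. This is only a sketch; sample.txt, original.txt, and the file contents are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Build a highly repetitive (and therefore compressible) sample file.
i=0
while [ "$i" -lt 1000 ]; do
    echo "the same line repeated many times"
    i=$((i + 1))
done > sample.txt
cp sample.txt original.txt
size_before=$(wc -c < sample.txt)

# Compress: sample.txt is replaced by sample.txt.gz.
gzip -9 sample.txt
size_after=$(wc -c < sample.txt.gz)
echo "before: $size_before bytes, after: $size_after bytes"

# Decompress: sample.txt.gz is replaced by sample.txt again.
gzip -d sample.txt.gz
cmp sample.txt original.txt && echo "round trip OK"
```

How much a file shrinks depends heavily on its contents; repetitive text like this sample compresses far better than data that is already compressed.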

Older versions of Linux used the compress command to compress files. However, compress is no longer widely used because of licensing problems related to its compression algorithm. In addition, gzip generally compresses more efficiently. Files produced by compress end with '.Z'. The gzip command can also uncompress '.Z' files.

The following commands will archive a group of files and compress the result into backup.tar.gz.

    # tar cvf backup.tar /home
    # gzip -9 backup.tar

The reverse commands will uncompress and unpack the archive created above:

    # gunzip backup.tar.gz
    # tar xvf backup.tar

The following shortcut performs archiving and compression in a single step. The hyphen (-) given as the archive name after the 'f' option makes tar write the archive to standard output, which is piped to gzip. Because gzip is given no file name, it reads from standard input; its 'c' option makes it write the compressed data to standard output, which is redirected to backup.tar.gz.

    # tar cvf - /home | gzip -9c > backup.tar.gz

A single command (already briefly covered in chapter 4) to unpack the above archive follows. Here, gunzip -c writes the decompressed data to standard output, which is piped to tar; the hyphen (-) after the 'f' option makes tar read the archive from standard input.

    # gunzip -c backup.tar.gz | tar xvf -
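The two pipelines can be checked end to end in a scratch directory. The sketch below uses hypothetical names (src, restored, a.txt) and adds the 'C' option of GNU tar and bsdtar so that the archive is unpacked away from the original files.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir src restored
echo "hello" > src/a.txt

# tar writes the archive to stdout (f -); gzip reads stdin and,
# because of -c, writes the compressed stream to stdout.
tar cf - src | gzip -9c > backup.tar.gz

# gunzip -c writes the decompressed archive to stdout;
# tar reads it from stdin (f -) and unpacks into restored/.
gunzip -c backup.tar.gz | tar xf - -C restored

diff -r src restored/src && echo "contents match"
```

Because the intermediate backup.tar is never written to disk, this approach needs only enough space for the compressed result.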

Often, gzip is used without any special options; in that case, it is simpler to archive and compress in one step using the tar command's 'z' option:

    # tar cvfz backup.tar.gz /home
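In GNU tar and bsdtar, the 'z' option can also be combined with 't' and 'x' to list or extract a gzip-compressed archive directly. A minimal sketch, using a scratch directory and made-up names (docs, out, readme.txt):

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir docs out
echo "read me" > docs/readme.txt

# Create a gzip-compressed archive in one step.
tar czf backup.tar.gz docs

# List the members without extracting.
tar tzf backup.tar.gz

# Extract into out/ (C changes directory before unpacking).
tar xzf backup.tar.gz -C out
```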