Tar

Last modified: Fri Nov 27 09:44:53 2020

You are allowed to print copies of this tutorial for your personal use, and link to this page, but you are not allowed to make electronic copies, or redistribute this tutorial in any form without permission.

Original version written in 1991 and published in the Sun Observer

How to use your system's tape drive (diskette drive?) to store files. How to do your own backups, and why they are important. Other uses of tar.

Why bother with Backups?

As someone who has been an end user and a system manager, I strongly believe that every user should understand the importance of security. Not only in preventing others from accessing and modifying your data, but also in storing and recovering data you may have deleted on purpose or through accident.

If you have data that is important to you, you should have a known backup.

Accidents and oversights happen. Tapes can be damaged, lost, or mislabeled. Assume your system administrator is top notch. The best administrator can recover your lost data 99% of the time. There is still a small chance that the files you need might not be recovered. Can you afford to duplicate months of effort 1 percent of the time? No.

An experienced programmer learns to be pessimistic. Typically, this important fact is learned the hard way. Perhaps a few hours are lost. Perhaps days. Sometimes months are lost.

Here are some common situations:

A user works on a program all day. At the end of the day, the program is deleted by accident. The system manager cannot recover the file. A day's work has been lost.
A programmer tries to clean up a project directory. Instead of typing "rm *.o" the programmer types "rm * .o" and the entire directory is lost.
A user deletes a file by accident. After a few days, the user asks the system administrator to recover the file. The incremental backup system has re-used the only tape the missing file was on.
A large project is archived on a magnetic tape and deleted from the disk. A year later, some of the information is needed. The tape has a bad block at the beginning. The system manager must learn how to recover data from a bad tape. The attempt is often unsuccessful. The information is lost forever, and must be recreated, at the cost of months of effort.
Someone breaks into a computer and accesses confidential information.
A fire breaks out in the computer room. The disks and all of the backup tapes are lost.

Gulp! I scared myself. Excuse me for a few minutes while I load a tape....

Ah! I feel better now. As I was saying, being pessimistic has its advantages.

Making a backup using tar

Making a backup is easy. Get a blank tape and learn how to load it onto the tape drive. Also learn how to write-protect the tape. Put a label on the tape. Then do the following:

cd
tar c .

The "cd" command moves you to your home directory. You could back up any directory the same way.

The tar command, whose name is an abbreviation of "tape archive," copies the current directory, specified by the ".", to the default tape drive. The "c" argument specifies the "create" mode of tar.

You might get an error. Something about device "rmt8" off line. Don't worry. I exaggerated slightly when I said tar was easy to use. The tape device tar uses by default is "/dev/rmt8". There are several types of tape units, and not all can be referred to using that name. Some system administrators will link that name to the actual device, which makes tar easier to use. But if that doesn't work, you need to specify additional arguments to tar.

Syntax of the tar command

Most Unix commands follow a certain style when arguments are specified. Tar does not follow this convention, so you must be careful to use tar properly. If the standard were followed, then the following might be an example of dumping the current directory to the 1/2 inch tape cartridge, verbose mode, block size of 20:

tapedump -c -v -b 20 -f /dev/rmt8 .

Instead, all the flags are in the first argument, and the parameters to those flags follow the first argument, in order of the flags specified:

tar cvbf 20 /dev/rmt8 .

The same command can be specified in a different way by changing the order of the letters in the first argument:

tar cvfb /dev/rmt8 20 .
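If you have no tape drive handy, you can convince yourself that the two orderings are equivalent by writing to a disk file instead. This is only a sketch: the scratch directory and the names one.tar and two.tar are invented for the illustration, and it assumes your tar accepts a block size of 20 when writing to an ordinary file.

```shell
# Scratch area so nothing of yours is touched (all names here are made up).
work=$(mktemp -d)
mkdir "$work/demo"
echo "hello" > "$work/demo/file.txt"

(
  cd "$work" || exit 1
  # Keys "bf": the first parameter is the block size, the second the archive.
  tar cbf 20 one.tar demo
  # Keys "fb": the same two parameters, swapped to follow the key letters.
  tar cfb two.tar 20 demo
)

# Both archives should list the same members.
tar tf "$work/one.tar"
tar tf "$work/two.tar"
```

If the parameters do not follow the order of the key letters, tar will try to use "20" as a filename or the archive name as a block size, so it pays to read the first argument letter by letter.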

The only key letter that has a fixed location is the first one, which must specify whether you are reading or writing an archive. The most common key letters, and the functions they perform, are:

Key Letter	Function
c	Create an archive
x	eXtract an archive
t	Table of contents

Some versions of tar require a hyphen before the letter. It is optional on SunOS.

What is the name of the tape drive?

Part of the difficulty in using tar is figuring out which filename to use for which device. If you have a 1/2" tape drive, try

tar cf /dev/rmt8 .

If you have a 1/4" tape cartridge, try

tar cf /dev/rst8 .

If this doesn't work, then try changing the "8" to a "0". You can also list the devices in the /dev directory and look for the one that was used most recently:

ls -lut /dev/r{m,s}t*

Some Unix systems use different standards for naming magnetic tapes. There might be an "h" at the end of a name for high density. When in doubt, examine the major and minor numbers (using the ls -l command), and read the appropriate manual page, which can be found by searching through the possible entries using

man -k mt

and

man -k tape

More on tape names

The names of tape devices always start with an "r" by convention, which suggests this is a "raw" device that does not support a file system. If the first two letters are "nr" then this suggests a "no-rewind" operation. Normally the tape is automatically rewound when you are done. If you repeat the tar command, it will overwrite the first dump. As this can waste large amounts of tape, use the "nr" name of the tape to put several dumps on one tape. As an example, if you wanted to dump three separate directories to a 1/4 inch tape cartridge, you can type

cd dir1
tar cf /dev/nrst8 .
cd dir2
tar cf /dev/nrst8 .
cd dir3
tar cf /dev/rst8 .

Note that the third dump does not use the "no-rewind" name of the device, so that it will rewind when done.
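The rule "no-rewind for every dump but the last" is easy to get wrong by hand. As a rough sketch, the function below prints the commands rather than running them, so you can try it without a tape drive. The name plan_dumps is hypothetical, and the device names simply follow the /dev/nrst and /dev/rst pattern used above; yours may differ.

```shell
# plan_dumps UNIT DIR...  (hypothetical helper)
# Print, but do not run, one tar command per directory. Every dump except
# the last uses the no-rewind device, so the dumps follow one another on tape.
plan_dumps() {
    unit=$1
    shift
    while [ $# -gt 0 ]; do
        dir=$1
        shift
        if [ $# -gt 0 ]; then
            dev=/dev/nrst$unit    # more dumps to come: do not rewind
        else
            dev=/dev/rst$unit     # last dump: rewind when done
        fi
        echo "(cd $dir && tar cf $dev .)"
    done
}

plan_dumps 8 dir1 dir2 dir3
```

This prints three lines, the first two naming /dev/nrst8 and the last naming /dev/rst8. Each command runs in a subshell, so relative directory names are all taken from the same starting point.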

To examine a tape without extracting any files, get a table of contents by using the key letter "t" or "tv" instead of the "c". The "v" flag gives a more verbose listing.

If you want to examine the third dump file, you can either use tar twice with the "no-rewind" names, or you can skip forward one or more dump files by using the mt (magnetic tape) command to skip forward 2:

mt -f /dev/nrst8 fsf 2

SunOS 4.1 has added a new convenience to the tar command. If you define an environment variable TAPE:

setenv TAPE /dev/rst8

then you don't have to specify it for the mt or tar commands.
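The setenv line above is C shell syntax. In the Bourne shell the equivalent is an assignment followed by export. The sketch below assumes a tar that honors TAPE (GNU tar does; check your manual page) and points the variable at a disk file, with a made-up name, in place of a real drive.

```shell
work=$(mktemp -d)
mkdir "$work/demo"
echo "hello" > "$work/demo/file.txt"

# Bourne shell spelling of "setenv TAPE ..."; a disk file stands in for the drive.
TAPE="$work/fake-tape.tar"
export TAPE

# No f key: tar falls back to $TAPE for both writing and listing.
( cd "$work" && tar c demo )
tar t

# Put things back so later commands are not affected.
unset TAPE
```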

Using tar without a tape

Your personal workstation probably doesn't have a tape drive connected. This makes creating tar backup files slightly more complicated. If you have an account on a machine with a tape drive, and the directory is NFS mounted, you can just rlogin to the other machine and use tar to back up your directory.

If the directory is not NFS mounted, or it is mounted but you have permission problems accessing your own files, you can use tar, rsh and dd to solve this dilemma. The syntax is confusing, but if you forget, you can use

man tar

to refresh your memory. The command to dump the current directory to a tape on a remote machine called zephyrus is:

tar cvfb - 20 . | rsh zephyrus dd of=/dev/rmt0 obs=20b

Here, the output file of tar is -, which tar interprets as standard input if tar is reading a tape, or standard output if tar is creating a tape.

The dd command copies data from standard input to the device /dev/rmt0, writing it in blocks of the size given by obs.
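You can rehearse the pipeline on a single machine by leaving out the rsh and letting dd write to a disk file. This is a sketch under that assumption; fake-tape.tar is an invented name standing in for /dev/rmt0.

```shell
work=$(mktemp -d)
mkdir "$work/demo"
echo "hello" > "$work/demo/file.txt"

# Same shape as the remote command, minus the rsh: tar writes the archive to
# standard output and dd collects it into 20-block (10240-byte) writes.
( cd "$work/demo" && tar cfb - 20 . ) |
  dd of="$work/fake-tape.tar" obs=20b 2>/dev/null

# Reading it back: now "-" means standard input.
dd if="$work/fake-tape.tar" bs=20b 2>/dev/null | tar tf -
```

The 2>/dev/null only hides the record counts that dd prints when it finishes.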

This example assumes you can use rsh without requiring a password. If you get a password prompt when you use rlogin to access the remote machine, you can add your current machine's name to the .rhosts file in your home directory there.

If you do not have an account, you might be able to get a friend to let you use their account temporarily.

Don't Do This!

I have to warn you about this because there is no secure way to do this. Whenever you allow someone else access to your account, they can create a program that lets them back into your account without your knowledge. This also holds true whenever someone lets you access their account. That is, if someone allows you to type in a command from their workstation or terminal, you may not be executing the programs you think you are. One of these programs might capture your password!

YOU HAVE BEEN WARNED!

In an effort to be complete, I will describe how to use a remote tape when your account has a different name on the remote machine. Let's assume your account on the remote machine is "sam". You could add a line to that account's .rhosts file with the following format:

machinename username

where these match your machine and user name. You could then use the command:

tar cvfb - 20 . | rsh zephyrus -l sam dd of=/dev/rmt0 obs=20b

to access the remote account.

The alert reader will realize that this is the same procedure used to allow someone else to have access to a remote account. I must stress that allowing someone else access to your account temporarily gives them the chance to access your account forever. I know of no simple, yet secure way to solve this problem. (I would like electronic mail if you think you know a solution that doesn't require writing a special purpose program.)

Using tar to create disk files

Because of Unix's device independence, you can send a tar archive to a pipe or to a Unix file. Both commands create a file called project.tar:

tar cf project.tar project
tar cf - project >project.tar

This is a convenient way to create an archive of a directory and move it to a different disk. This file does take up a lot of room. You can recover some of this room by running compress on the tar file:

compress project.tar

This creates a file called project.tar.Z and deletes project.tar. This is a convenient way to save disk space, because a compressed tar file of a directory is typically smaller than the original directory. Once the tar file is created, the original directory can be deleted. The directory can be recreated with these steps:

uncompress project.tar.Z
tar xf project.tar
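Many current systems no longer install compress. As a sketch of the same round trip, the example below substitutes gzip and gunzip, which behave the same way for this purpose but produce a .gz file rather than a .Z file. The directory and file names are invented for the test.

```shell
work=$(mktemp -d)
mkdir "$work/project"
echo "int main(void) { return 0; }" > "$work/project/main.c"

(
  cd "$work" || exit 1
  tar cf project.tar project    # archive the directory
  gzip project.tar              # leaves project.tar.gz, removes project.tar
  rm -r project                 # the original can now be deleted
  gunzip project.tar.gz         # back to project.tar
  tar xf project.tar            # recreate the directory
)

ls "$work/project"
```

Before deleting a real directory, it is worth listing the archive with the t key to make sure it is readable.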

Tar, by default, restores the original modification times. The m option suppresses this. The time of last modification is important because make uses this information to keep track of file dependencies. This makes tar a convenient program to copy directories, unlike cp -r, which stamps each copy with the current time. To use tar to copy a directory, this example from the manual pages can be used:

tar cf - . | (cd todir;tar xfBp -)
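To see the effect on modification times, give a file an obviously old timestamp and copy it both ways. This is a sketch with invented directory names; it leaves out the B key, which matters mainly when reading over a network, and it assumes your tar accepts the m key on extraction as described above.

```shell
work=$(mktemp -d)
mkdir "$work/from" "$work/to" "$work/to_m"
echo "old file" > "$work/from/notes.txt"
# Give the file an obviously old timestamp: 1 January 2000, 00:00.
touch -t 200001010000 "$work/from/notes.txt"

# Default behavior: the copy keeps the year 2000 timestamp.
( cd "$work/from" && tar cf - . ) | ( cd "$work/to" && tar xpf - )

# With the m key the copy is stamped with the time of extraction instead.
( cd "$work/from" && tar cf - . ) | ( cd "$work/to_m" && tar xmpf - )

ls -l "$work/to/notes.txt" "$work/to_m/notes.txt"
```

Using "cd dir && tar ..." rather than "cd dir; tar ..." keeps tar from unpacking into the wrong place if the cd fails.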

Excluding/Including files when using tar

Tar will archive binary and source files. To speed up the archiving and reduce the archive size, most project directories have a special command in the make file that deletes all files that can be recreated. This is typically

make clean

that may just delete the executable file, object files and possibly a core dump:

# contents of makefile
clean:
	/bin/rm -f *.o program core

Tar has a special flag that helps you make snapshots of a project directory. If you specify the flag F then this tells tar to not include the Source Code Control System (SCCS) directory. If you specify two F arguments, then this makes the restriction stronger, and files with the names core, a.out, errs, and any file ending with .o will not be written to the tar file. So, to create a backup of a project directory, the following command is used:

tar cvfFF project.tar project

This is a convenient way to give someone all the current files needed to build an executable, without giving them every SCCS version of every file.

This is convenient, but it may include files you don't want. Make creates files starting with a , to keep track of dependencies. Various editors create backup files ending with % or ~. I often keep the original copy of a program with the .orig extension, and old versions with a .old extension. There may be some binary files that you don't want to archive, but don't want to delete either.

The solution is to use the X flag to tar. This flag specifies that the matching argument to tar is a filename that lists files to exclude from the archive. Here is an example:

find project -type f -print |
egrep '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$' >Exclude
tar cvfX project.tar Exclude project

In this example, find lists all files in the directories, but does not print the directory names explicitly. If you have a directory name in an excluded list, it will also exclude all the files inside the directory. Egrep is then used as a filter to exclude certain files from the archive. Here, egrep is given several regular expressions to match certain files. This expression seems complex but is simple once you understand a few special characters:

The slash is not a special character. However, since no filename can contain a slash, it matches the beginning of a filename, as output by the find command. The vertical bar separates each regular expression. The dollar sign is one of the two regular expression "anchors," and specifies the end of the line, or filename in this case. The other anchor, which specifies the beginning of the line, is "^". But because we are matching filenames output by find, the only filenames that can match "^" are those in the top directory. Normally the dot matches any character in a regular expression. Here, we want to match the actual character ".", which is why the backslash is used to "quote" or "escape" the normal meaning.

A breakdown of the patterns and examples of the files that match these patterns are given below:

Pattern	Matches files	Used by
/,	starting with comma	make dependency files
%$	ending with %	Textedit backup files
~$	ending with ~	emacs backup files
\.old$	ending with .old	old copies
SCCS	in SCCS directory	Source Code Control System
/core$	with name of "core"	core dump
\.o$	ending with .o	Object files
\.orig$	ending with .orig	programmer to show original version
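It is worth testing the regular expression on a throwaway directory before trusting it with a real project. The sketch below builds one with invented filenames and uses grep -E, which on current systems is the usual replacement for egrep. The files named bold and foo are there to show why the dots are escaped: without the backslashes, ".old$" and ".o$" would match them too.

```shell
work=$(mktemp -d)
mkdir -p "$work/project/SCCS"
( cd "$work/project" &&
  touch main.c main.o main.c~ main.c.orig notes.old core ,deps bold foo SCCS/s.main.c )

# Build the exclude list with the escaped pattern; grep -E stands in for egrep.
( cd "$work" &&
  find project -type f -print |
  grep -E '/,|%$|~$|\.old$|SCCS|/core$|\.o$|\.orig$' > Exclude )

# Seven names are listed; main.c, bold and foo are not among them.
sort "$work/Exclude"
```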

Instead of specifying which files are to be excluded, you can specify which files to archive using the -I option. As with the exclude flag, specifying a directory tells tar to include (or exclude) the entire directory. You should also note that the syntax of the -I option is different from that of the typical tar flag. This example dumps all C and make files:

find project -type f -print |
egrep '\.c$|\.h$|Makefile$' >Include
tar cvf project.tar -I Include

I suggest using find to create the include or exclude file. You can edit it afterwards, if you wish. Extra spaces at the end of a line will cause the file named on that line to be ignored.
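The -I option is the SunOS spelling. In GNU tar, -I means something else (it selects a compression program), and the include list is given with -T instead. The sketch below assumes GNU tar or another tar that accepts -T, and uses invented filenames.

```shell
work=$(mktemp -d)
mkdir "$work/project"
( cd "$work/project" && touch main.c defs.h Makefile main.o README )

(
  cd "$work" || exit 1
  find project -type f -print |
    grep -E '\.c$|\.h$|Makefile$' > Include
  # -T: take the names of the files to archive from the list in Include.
  tar cf project.tar -T Include
  tar tf project.tar
)
```

Only main.c, defs.h and Makefile end up in the archive; main.o and README are left out because no pattern matches them.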

Another way to debug the output of the find command is to use /dev/null as the output file:

tar cvfX /dev/null Exclude project

Including other directories

There are times when you want to make an archive of several directories. You may want to archive a source directory and another directory like /usr/local. The natural, but wrong, way to do this is to use the command:

tar cvf /dev/rmt8 project /usr/local

When using tar, you must never specify a directory name starting with a "/". This will cause problems when you restore a directory, as you will see later. The proper way to do this is to use the -C flag:

tar cvf /dev/rmt8 project -C /usr local

This will archive /usr/local/... as "local/", because tar changes to the /usr directory before reading the name that follows.
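You can check the effect of -C without touching the real /usr/local by building a small imitation of it. In this sketch the tree under the scratch directory is invented, and a disk file stands in for the tape.

```shell
work=$(mktemp -d)
mkdir -p "$work/project" "$work/usr/local/bin"
echo "source" > "$work/project/main.c"
echo "tool"   > "$work/usr/local/bin/tool"

(
  cd "$work" || exit 1
  # "project" is read from here; -C then moves to $work/usr before "local" is read.
  tar cf both.tar project -C "$work/usr" local
  tar tf both.tar
)
```

None of the names in the listing starts with a slash, so the archive can be restored anywhere.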

Restoring files with tar

There are several ways to specify a directory to archive. If the directory is under the current directory, I have already given the example:

tar c project

A similar way to specify the same directory is

tar c ./project

If you are currently in the directory you want archived, you can type:

tar c .

Another way to archive the current directory is to type

tar c *

Here, the shell expands the * character to the files in the current directory. However, it does not match files starting with a ., which is why the previous technique is preferred.
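The difference is easy to see with one visible file and one hidden file. This sketch uses invented names and assumes your shell has its default globbing behavior, where * skips names that begin with a dot.

```shell
work=$(mktemp -d)
mkdir "$work/demo"
( cd "$work/demo" && touch visible.txt .hidden )

(
  cd "$work/demo" || exit 1
  tar cf "$work/star.tar" *    # the shell expands * to visible.txt only
  tar cf "$work/dot.tar" .     # tar walks the directory itself
)

echo "--- tar c * ---"; tar tf "$work/star.tar"
echo "--- tar c . ---"; tar tf "$work/dot.tar"
```

The first listing has no .hidden entry. The second has it, and every name in it begins with "./".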

This causes a problem when restoring a directory from a tar archive. You may not know if an archive was created using . or the directory name.

I always check the names of the files before restoring an archive:

tar t

If the archive loads the files into the current directory, I create a new directory, change to it, and extract the files.

If the archive restores the directory by name, then I restore the files into the current directory.
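The check described above can be written as a small shell function. This is only a sketch: restore_into is a hypothetical name, it looks at the first member only, and it assumes a tar that accepts -C when extracting (GNU tar does).

```shell
# restore_into ARCHIVE DIRNAME  (hypothetical helper)
# DIRNAME is used only when the archive was made with "." and so has no
# top-level directory of its own.
restore_into() {
    archive=$1
    fallback=$2
    first=$(tar tf "$archive" | sed -n 1p)
    case $first in
        .|./*)
            # Members look like ./file: unpack inside a fresh directory.
            mkdir -p "$fallback" && tar xf "$archive" -C "$fallback" ;;
        /*)
            echo "refusing: $archive holds absolute pathnames" >&2
            return 1 ;;
        *)
            # Members look like project/file: unpack right here.
            tar xf "$archive" ;;
    esac
}

# Try it on an archive made with "." (all names below are invented).
work=$(mktemp -d)
mkdir "$work/src" "$work/dest"
echo "data" > "$work/src/file.txt"
( cd "$work/src" && tar cf "$work/dot.tar" . )
( cd "$work/dest" && restore_into "$work/dot.tar" restored && ls restored )
```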

Restoring a few files

If you want to restore a single file, get the pathname of the file as tar knows it, using the t flag. You must specify the exact filename, because filename and ./filename are not the same. You can combine this in one command by using:

tar xvf /dev/rst0 `tar tf /dev/rst0 | grep filename`

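Be careful with this one-line form. If grep finds no match, tar is run with no filenames at all, and versions of tar that follow the usual convention then extract the entire archive rather than nothing. When in doubt, look at the table of contents first.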
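A slightly more careful version of the one-line command collects the names first and runs tar only if there is at least one. This sketch uses a disk archive with invented names in place of /dev/rst0, and it will not cope with filenames that contain spaces.

```shell
work=$(mktemp -d)
mkdir -p "$work/project" "$work/out"
echo "wanted" > "$work/project/wanted.txt"
echo "other"  > "$work/project/other.txt"
( cd "$work" && tar cf project.tar project )

(
  cd "$work/out" || exit 1
  # Ask tar for the member names first, exactly as it spells them.
  names=$(tar tf "$work/project.tar" | grep 'wanted')
  if [ -n "$names" ]; then
      tar xvf "$work/project.tar" $names   # unquoted on purpose: one word per name
  else
      echo "no member matches; nothing extracted" >&2
  fi
)
```

With a real tape, remember that each tar command reads the tape from the beginning, so the tape is read twice.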

There is still the problem of restoring a directory whose pathname starts with /. Because tar restores a file to the pathname specified in the archive, you cannot change where the file will be restored. The danger is that you may either overwrite some existing files, or you will not be able to restore the files because you don't have permissions.

You can ask the system administrator to rename a directory and temporarily create a symbolic link pointing to a directory where you can restore the files. Other solutions exist, including editing the tar archive, and creating a new directory structure with a C program executing the chroot(2) system call. Another solution is to get the Free Software Foundation's version of tar that allows you to remap pathnames starting with /. It also allows you to create archives that are too large for a single tape, incremental archives, and a dozen other advantages.

But the best solution is to never create an archive of a directory that starts with / or ~.

Remote Restoring

To restore a directory from a remote host, use the following command:

rsh -n host dd if=/dev/rst0 bs=20b |
tar xvBfb - 20 filenames

Because of the nature of a network, it is difficult to read fixed-size blocks over one. This is why tar uses the B flag to force it to read from the pipe until a block is completely filled.
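The receiving half of this command can be tried on one machine by letting a local dd take the place of the rsh. This is a sketch under that assumption; fake-tape.tar and the other names are invented, and a local pipe will not necessarily show the short reads that make B necessary over a network.

```shell
work=$(mktemp -d)
mkdir -p "$work/project" "$work/out"
echo "wanted" > "$work/project/wanted.txt"
( cd "$work" && tar cfb fake-tape.tar 20 project )

# Local stand-in for "rsh -n host dd if=/dev/rst0 bs=20b": dd feeds the
# archive through a pipe, and B tells tar to keep reading until each
# 20-block record is full.
dd if="$work/fake-tape.tar" bs=20b 2>/dev/null |
  ( cd "$work/out" && tar xBfb - 20 project/wanted.txt )

ls "$work/out/project"
```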

This document was translated by troff2html v0.21 on September 22, 2001.