Csh


Copyright 2001 Bruce Barnett and General Electric Company

All rights reserved

You are allowed to print copies of this tutorial for your personal use, and link to this page, but you are not allowed to make electronic copies, or redistribute this tutorial in any form without permission.

This section describes C shell programming. It covers conditional testing, control loops, and other advanced techniques.
This month begins a tutorial on the bad-boy of UNIX, lowest of the low, the shell of last resort. Yes, I am talking about the C shell. FAQs flame it. Experts have criticized it. Unfortunately, this puts UNIX novices in an awkward situation. Many people are given the C shell as their default shell. They aren't familiar with it, but they have to learn enough to customize their environment. They need help, but get criticized every time they ask a question. Imagine the following conversation, initiated by a posting on USENET:

Novice: How do I do XYZ using the C shell?

Expert: You shouldn't use the C shell. Use the Bourne shell.

Novice: I try to, but I get syntax errors.

Expert: That's because you are using the C shell. Use the Bourne shell.

Novice: I'm now using the Bourne shell. How do I create aliases and do command-line editing in the Bourne shell?

Expert: You can't. Use bash, ksh or tcsh.

Novice: I don't have these shells on all of the systems I use. What can I use?

Expert: In that case, use the C shell.

Novice: But you told me I shouldn't use the C shell!?!

Expert: Well, if you have to, you can use the C shell. It's fine for interactive sessions. But you shouldn't use it for scripts.

Novice: It's really confusing trying to learn two shells at once. I don't know either shell very well, and I'm trying to learn JUST enough to customize my environment. I'd rather just learn one shell at a time.

Expert: Well, it's your funeral.

Novice: How do I do XYZ using the C shell?

Another Expert: You shouldn't be using the C shell. Use the Bourne shell.

Novice: @#%&!

C shell problems

The C shell does have problems. (See My top 10 reasons not to use the C shell.) Some can be fixed. Others cannot. Some are unimportant now, but later on might cause grief. I'll mention these problems. But I'll let you decide if you want to continue to use the C shell, or start using the Bourne shell. Switching shells is difficult, and some may wish to do so gradually. If you want to use the C shell, that's fine. I'll show you the pitfalls, so you can intelligently decide. No pressure. You can switch at any time. But be aware that the C shell is seductive. It does have some advantages over the Bourne shell. But sometimes what seems like an advantage turns into a disadvantage later. Let me discuss them in detail.

Quoting long strings, $ and !

The first problem I faced with the C shell involved another language. I had a problem that required a sed or awk script. The C shell has a "feature" that warns programmers if they forgot to terminate a quoted string. The following command will generate a warning in the C shell:

echo "This quote doesn't end

The Bourne shell would continue till the end of the script. This is a good feature for an interactive shell, as it warns you if you forgot to close a quote. But if you want to include a multi-line string, such as an awk script inside a C shell script, you will have problems. You can place a backslash at the end of each line, but this is error prone, and also hard to read. Some awk constructs require a backslash at the end. Using them inside a C shell script would require two backslashes in a row.

There are some other strange quoting problems. Like the Bourne shell, the C shell has three ways to quote characters. You can use the single quote, double quote and backslash. But combine them, and find some strange combinations. You can put anything inside single quotes in a Bourne shell script, except single quotes. The C shell won't let you do that. You can type

echo hi!

but the following generates an error:

echo 'hi!'

You have to use a backslash if you want to do this:

echo 'hi\!'

Also, the following works:

echo money$

but this generates an error:

echo "money$"

But in this case you cannot even use a backslash. The following is an error in the C shell:

echo "money\$"

Unix shells have many special characters, and quoting them marks them as normal ASCII - telling the shell not to interpret them. And this is true with every Unix shell there is, except the C shell. In the above cases, putting quotes around some characters makes them special in the C shell, instead of preventing the special interpretation. Strange, huh?

The ad hoc parser

The second problem is subtle, but may be the next problem you discover. The Bourne shell has a true syntax parser: the lines are scanned, and broken up into pieces. Some pieces are commands. Other pieces are quoted strings. File redirection is handled the same way. Commands can be combined on one line, or span several lines. It doesn't matter. As an example, you can use

if true; then echo yes; fi

or

if true
then
echo yes
fi

The parsing of file redirection is independent of the particular command. If and while commands can get file redirection anywhere. The following is valid in the Bourne shell:

echo abc | while read a
do
echo a is $a
done >/tmp/f1

The same holds true for other Bourne shell commands. Once you learn the principles, the behavior is predictable.

The C shell does not have a true parser. Instead, the code executes one section for the if command, and another for the while command. What works for one command may not work for another. The while loop above cannot be written in the C shell. It relies on two redirections, a pipe into the loop and an output file after "done", and the C shell can't do either. Also, in the C shell, certain words must be the first word on the line. Therefore you might try something that works with one command, only to discover that it doesn't work with other commands. I've reported a lot of bugs to Sun, and to their credit, many have been fixed. Try the same code on other systems, however, and you might get syntax errors.

The parsing problem is also true with shell built-in commands. Combine them, and discover strange messages. Try the following C shell sequence:

time | echo

versus

time | /bin/echo

and notice the strange error message. There are other examples of this. These are the types of problems that sneak up on you when you don't expect them. The Bourne shell has the -n flag, which lets you check the script for syntax errors, including branches you didn't take. You can't do this with the C shell. The C shell seems to act on one line at a time and some syntax errors may not be discovered unless they get executed.
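As a rough sketch of the -n check mentioned above, the following Bourne shell snippet writes a small script with a deliberate error in a branch that never runs, then asks sh to parse it without executing anything. The file name broken.sh and its contents are made up for illustration.

```shell
# Hypothetical script: the "else" branch is never taken, but it is missing
# a closing quote, so the file as a whole cannot be parsed.
tmpdir=$(mktemp -d)
cat > "$tmpdir/broken.sh" <<'EOF'
if true
then
    echo yes
else
    echo "this quote doesn't end
fi
EOF

# -n reads and parses the whole script but runs none of it.
if sh -n "$tmpdir/broken.sh" 2>/dev/null
then
    syntax_ok=yes
else
    syntax_ok=no
fi
echo "syntax ok: $syntax_ok"
rm -rf "$tmpdir"
```

Running the broken script normally would print "yes" and only fail at the end of the file; the -n pass reports the problem without running a single command.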

Reading one line at a time

Sometimes you have to ask a person for input in the middle of a script. Sometimes you have to read some information from a file. The Bourne shell allows you to specify the source of information for each command. Even though a script is connected to a pipe, you can ask the user for input. The C shell does not have this flexibility. It has a mechanism to get a line from standard input, but that is all it can do. You cannot have a C shell script get input from both a file and the terminal.
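Here is a minimal Bourne shell sketch of reading from two sources at once. The list of items is read through file descriptor 3, which leaves standard input free for the answers. The function name ask_about and the sample data are assumptions for illustration; in a script that must talk to the terminal while its input is a pipe, the inner read could be redirected from /dev/tty instead.

```shell
# ask_about FILE: read one item per line from FILE on descriptor 3,
# and read an answer for each item from standard input.
ask_about() {
    while read -r item <&3
    do
        read -r answer          # comes from stdin, not from FILE
        echo "$item: $answer"
    done 3< "$1"
}

tmpfile=$(mktemp)
printf '%s\n' apple pear > "$tmpfile"
# The "user" is simulated by a pipe here; at a terminal you would just type.
result=$(printf '%s\n' yes no | ask_about "$tmpfile")
echo "$result"
rm -f "$tmpfile"
```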

File redirection

With respect to file redirection, the Bourne shell has no limitations, while the C shell is very limited. With the Bourne shell, you can send standard error to one place, and standard out to another file. You can discard standard output, but keep the error. You can close any file descriptor, save current ones, and restore them. The C shell can't do any of these steps.
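The Bourne shell operations listed above can be sketched as follows. The helper function both and the file names are invented for the example; only the redirection operators themselves are the point.

```shell
tmpdir=$(mktemp -d)

# A stand-in command that writes one line to each stream.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Standard output to one file, standard error to another.
both > "$tmpdir/out" 2> "$tmpdir/err"

# Discard standard output, but keep the errors.
both > /dev/null 2> "$tmpdir/err_only"

# Save the current stdout on descriptor 3, redirect stdout to a log,
# then restore stdout and close the saved copy.
exec 3>&1
exec > "$tmpdir/log"
both 2> /dev/null
exec 1>&3 3>&-

out=$(cat "$tmpdir/out")
err=$(cat "$tmpdir/err")
err_only=$(cat "$tmpdir/err_only")
log=$(cat "$tmpdir/log")
echo "out=$out err=$err err_only=$err_only log=$log"
rm -rf "$tmpdir"
```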

Signals, Traps and child processes

If you want to make your script more robust, you must add signal processing to it. That is, your script must terminate gracefully when it is aborted. The C shell has limited abilities. You can either do nothing, ignore all signals, or trap all signals. It's an all or nothing situation. The Bourne shell can trap particular signals, and call a special routine when the script exits normally. You can retain the process ID of a background process. This allows you to relay signals received to other processes under your control. The C shell cannot do this.
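A small Bourne shell sketch of these ideas follows. It keeps the process ID of a background job, traps two particular signals to relay them to that job, and runs a cleanup routine on any exit. The sleep command stands in for real background work, and the messages are placeholders.

```shell
# Run in a subshell so the traps do not affect the calling shell.
report=$(
    sleep 30 > /dev/null 2>&1 &
    child=$!                      # remember the background process ID

    # On INT or TERM only: relay the signal to the child, then exit.
    trap 'kill "$child" 2>/dev/null; exit 1' INT TERM
    # On any exit, normal or not: stop the child and report.
    trap 'kill "$child" 2>/dev/null; echo "cleanup done"' EXIT

    echo "working"
)
echo "$report"
```

The EXIT trap fires even though nothing went wrong, which is how the background sleep gets cleaned up at the end of a normal run.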

A time bomb

You can use the C shell for simple scripts. If you don't add many new features, and only write scripts for yourself, the C shell may be fine for you. But it is a time bomb. There are many times I wanted to add a new feature to a C shell script, and couldn't because it didn't support the idea. Or else I tried to port a C shell script to a different system and found that it didn't work the same way. Yes, you can use the C shell. Use it for as long as you want. Your choice.

Tick.... Tick... Tick...

Quoting C Shell Meta-Characters

This is my second tutorial on the C shell. This month, I will discuss quoting and meta-characters.

Like all shells, the C shell examines each line, and breaks it up into words. The first word is a command, and additional words are arguments for the command. The command

more *

uses a meta-character, the asterisk. The shell sees the asterisk, examines the current directory, and transforms the single character to all of the files in the current directory. The "more" program then displays all of the files. There are many other meta-characters. Some are very subtle. Consider this meta-character example:

more a b

The meta-character? It's the space. In this case, the space indicates the end of one filename and the start of the next filename. The space, tab, and new-line-character are used by the C shell to indicate the end of one argument, and the beginning of the next. (The Bourne shell allows more control, as any character can be specified to have this function).

These meta-characters are an integral part of UNIX. Or rather, an integral part of the shell. A meta-character is simply a character with a special meaning. The file system doesn't really care about meta-characters. You can have a filename that contains a space, or an asterisk, or any other character. Similarly, you can specify any meta-character as an argument to any command. Understanding which characters are meta-characters, what they do, and how to prevent them from being special characters is a skill that must be learned. Most learn by trial and error. Trouble is, the C shell is trickier than other shells.

One way to discover these characters is to use the echo built-in command, and see which characters the C shell will echo, and which ones are treated specially. Here is the list of meta-characters, and a quick description of the special meaning.

+-----------------------------------------------------------------------+
|                    List of C Shell Meta-Characters                    |
+-----------------------------------------------------------------------+
|Meta-character   Meaning                                               |
+-----------------------------------------------------------------------+
|newline          End of command                                        |
|space            End of word                                           |
|tab              End of word                                           |
|!                History                                               |
|#                Comment                                               |
|$                Variable                                              |
|&                End of command arguments, launch in background        |
|(                Start sub-shell                                       |
|)                End sub-shell                                         |
|{                Start in-line expansion                               |
|}                End in-line expansion                                 |
||                End of command arguments, Pipe into next command      |
|<                Input Redirection                                     |
|>                Output Redirection                                    |
|*                Multi-character Filename expansion (a.k.a. globbing)  |
|?                Single-character Filename expansion (a.k.a. globbing) |
|[                Character Set Filename expansion (a.k.a. globbing)    |
|]                Character Set Filename expansion (a.k.a. globbing)    |
|;                End of command                                        |
|'                Strong quoting                                        |
|"                Weak quoting                                          |
|`                Command substitution                                  |
|\                Sometimes Special                                     |
+-----------------------------------------------------------------------+

If you do not want one of these characters to be treated as a meta-character, you must quote it. Another term for this is escape, as in "escape the normal behavior." The Bourne shell has a predictable behavior for quoting meta-characters:

  1. Put a backslash before each character.
  2. Put single quotes around all of the characters.
  3. Put double quotes around all of the characters. Exceptions: the dollar sign ($) and back-quote (`) are special, but a back-slash before them will escape the special connotation.

The Bourne shell has an internal flag that specifies when quoting occurs. It is either on or off. If quoting is on, then meta-characters are escaped, and treated as normal. The C shell is similar, yet not identical. As you will see, the quoting mechanism is less predictable. In fact, it has some maddening exceptions. Let me elaborate.

Using the backslash

If you want to use a meta-character as an ordinary character, place a backslash before it. To delete a file called "a b" (there is a space in the filename between a and b), type

rm a\ b

Strings in single quotation marks

The second method for quoting meta-characters is specifying a string that begins and ends with single quotes:

rm 'a b'

In the Bourne shell, any character in single quotes is not a meta-character. This is not true in the C shell. There are two exceptions: the exclamation point, and new line. The Bourne shell allows this:

echo 'Hello!'

The C shell requires a backslash:

echo 'Hello\!'

The exclamation point is a meta-character, and the C shell uses it for its alias and history features, which I will discuss later. The other exception is a new-line character. The Bourne shell allows:

echo 'New line ->
'

The C shell requires a backslash before the end-of-line:

echo 'New line ->\
'

A novice programmer may consider this a feature, as any command with an unterminated string will generate an error. However, as your programming skills increase, and you want to include a multi-line awk script in a shell script, the C shell becomes awkward. Sometimes awk needs a backslash at the end of a line, so in the C shell, you would need two backslashes:

#!/bin/csh -f
# Print the first three fields in reverse order, separated by tabs.
awk '{printf("%s\t%s\t%s\n", \\
$3, $2, $1)}'

An awk script in a C shell script is extremely awkward, if you will pardon my choice of words.

Strings in double quotation marks

The last quoting mechanism is a string that starts and ends with the double-quote marks:

rm "a b"

This is similar to the single-quote form, but it intentionally escapes most meta-characters except the dollar sign and back-quote. This allows command substitution, and variable interpretation:

echo "I am $USER"
echo "The current directory is `pwd`"

Like the single quoted string, the exclamation point is an exception:

echo "Hello\!"

I usually call the single quote the "strong" quote, and the double quote the "weak" quote. However, it is not so simple. The frustrating thing about the C shell is that inside a double-quote string, you cannot place a backslash before a dollar sign or back-quote to escape the special meaning. That is, when I execute the command

echo "\$HOME"

the shell echoes

\/home/barnett

So the backslash only works some of the time. The C shell is filled with special cases. You have to learn them all. To make it easy for you, here is a table that explains all of the exceptions. The first column is the meta-character. The second column shows what is required to get the meta-character in a string delineated by double quotes. The third column corresponds to single quotes. The last column shows what is needed when there are no quotation marks.

+--------------------------------------------------------------+
|           Meta-character interpretation in strings           |
+--------------------------------------------------------------+
|Meta-character   "..."        '...'        no quotation marks |
+--------------------------------------------------------------+
|newline          Requires \   Requires \   Requires \         |
|space            Quoted       Quoted       Requires \         |
|tab              Quoted       Quoted       Requires \         |
|!                Requires \   Requires \   Requires \         |
|#                Quoted       Quoted       Requires \         |
|$                Impossible   Quoted       Requires \         |
|&                Quoted       Quoted       Requires \         |
|(                Quoted       Quoted       Requires \         |
|)                Quoted       Quoted       Requires \         |
|{                Quoted       Quoted       Requires \         |
|}                Quoted       Quoted       Requires \         |
||                Quoted       Quoted       Requires \         |
|<                Quoted       Quoted       Requires \         |
|>                Quoted       Quoted       Requires \         |
|*                Quoted       Quoted       Requires \         |
|?                Quoted       Quoted       Requires \         |
|[                Quoted       Quoted       Requires \         |
|]                Quoted       Quoted       Requires \         |
|;                Quoted       Quoted       Requires \         |
|'                Quoted       Impossible   Requires \         |
|"                Impossible   Quoted       Requires \         |
|`                Impossible   Quoted       Requires \         |
|\                Quoted       Quoted       Requires \         |
+--------------------------------------------------------------+

The phrase "Quoted" means the meta-character does not have a special meaning. The phrase "Impossible" means the meta-character always has a special meaning, and cannot be quoted. The phrase "Requires \" says that a backslash is required to escape the special meaning.

To use the table, imagine you have a file with the same name as a meta-character. Suppose the filename was "?" (a question mark). You have three methods of specifying the exact filename when deleting it:

rm "?"
rm '?'
rm \?

If the file had the name "!" (an exclamation mark), then the table states you always need a backslash:

rm "\!"
rm '\!'
rm \!

Notice that some combinations don't work. That is, there is no way to place a single quote inside a string delineated by single quotes. Here comes the tough question. Suppose you wanted to do the impossible. How do you solve this problem?

Solving special cases

Normally, having different quotes is convenient. You can use one form of quote to contain the other form:

echo " ' "
echo ' " '

How can you place a quote within a quoted string, when the quote is the same type? The simple answer? You can't. But there is a simple trick that can be used for all complex cases. It requires a different view of quoted strings. You see, they are not really strings. Most programmers think a string is defined by the quotes at the beginning and at the end. That is, you place quotes around the string, and insert special characters in the middle to get around any tricky conditions. This does not accurately describe a UNIX shell. There is no concept of strings in the shell. Instead, the shell has an internal flag which can be enabled and disabled by the quote characters. The following examples are all equivalent:

rm "a b"
rm a" "b
rm "a "b
rm a" b"
rm "a"" ""b"
rm "a"' '"b"

In some cases, the letters "a" and "b" are quoted. In other cases, they are not, because they do not need to be escaped. The space, on the other hand, is always quoted in each example above. The secret is understanding which characters have to be quoted, and selecting the best way to quote them.
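One way to convince yourself of this is to count the arguments a command actually receives. The sketch below uses a small Bourne shell function, count_args (a name invented for this example), in place of rm; these particular quoting forms behave the same way in the C shell.

```shell
# count_args prints how many arguments it received, and the first one.
count_args() {
    echo "$# argument(s), first is [$1]"
}

count_args "a b"
count_args a" "b
count_args "a "b
count_args a" b"
count_args "a"" ""b"
count_args "a"' '"b"
count_args a b        # unquoted space: two arguments
```

The first six calls each report one argument, "a b". Only the last call, where the space is not quoted, reports two.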

Now suppose you want to include a double quote inside a double quote? You can't. But you can switch the types of quotation marks at any point. The last example switches the quotation marks from double quotes to single quotes. This same technique can be used to delete a file with a double quote in the filename. Here are four ways to do this:

rm 'a"b'
rm a'"'b
rm a\"b
rm "a"'"'"b"

Here are some other examples:

echo "The date command returns " '"' `date` '"'
echo 'Here is a single quote:' "'"

Passing variables inside a string

A common question is how to pass a variable to the middle of a string. Suppose you wanted to write an AWK script that printed out one column. To print the first column is easy:

awk '{print $1}'

But this script always prints the first column. To pass the column number to the script requires the same techniques:

awk '{print $'$1'}'

This is hard to read, and you need a certain knack to become accustomed to understanding it. Just scan from left to right, keeping track of the current quote state, and which character was used to enable the quote condition. After a while, it becomes easy.
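As a worked example, here is the same trick wrapped in a Bourne shell function. The name print_column is hypothetical. The sketch also puts double quotes around the shell's $1, a small departure from the line above that keeps the argument in one piece if it ever contains a space.

```shell
# print_column N: print field N of each input line.
# Reading left to right: '{print $' is single-quoted and goes to awk as is,
# "$1" is expanded by the shell (the function's first argument),
# and '}' is single-quoted again.
print_column() {
    awk '{print $'"$1"'}'
}

echo "one two three" | print_column 2
```

With the argument 2, awk ends up seeing the program {print $2}, so the example prints "two".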

C shell Globbing

This is my third tutorial on the C shell. This month, I will discuss filename expansion, and globbing.

One of the primary functions of a shell is to provide an easy way to execute commands and pass several files to a command. Before the shell determines which command to execute, it evaluates variables, and performs filename substitutions. This is a fancy way of saying you can abbreviate filenames. The Bourne shell only supports globbing. The C shell supports three forms of filename substitutions: globbing, in-line expansion, and home directory expansion.

Globbing

The first UNIX shell was called the Mashey shell. This was before the C shell and Bourne shell were written. The Mashey shell didn't have filename substitution. If you wanted to list every file in a directory, the shell did not support

ls *

Instead, you had to use the glob command:

ls `glob *`

Glob is a precise, scientific term that is defined as "a whole messa," which is not to be confused with "glom" which means "view" or "examine." An example would be "I wanna glom a whole messa files." The proper terminology, used by those with a doctorate in computer science, is of course "I wanna glom a globba files." And anyone with a similar doctorate will know exactly what this means. Try this at your next dinner party, and you too can impress the neighbors.

Needless to say, after creating, editing, printing, and deleting globs of files day after day, someone realized that life would be easier if the shell did the globbing automatically. And lo, UNIX shells learned to glob.

The C shell has this feature of course. The easiest way to learn how globbing works is to use the echo command:

echo *

This will echo every file in the directory. Well, not every one. By convention, files that start with a dot are not echoed. If you want to list all files, use

echo * .*

In other words, the dot must be explicitly specified. The slash "/" must also be matched explicitly. Other utilities, like find, use this same convention. The asterisk matches everything except these two characters, and means "zero or more characters." Therefore

echo a* b* c*

will echo all files that start with an "a," "b," or "c." Note that the shell sorts the files in alphabetical order. But each expansion is sorted separately. If you executed

echo c* b* a*

the order would first be the files starting with c, then with b, then with a. Within each group, the names would be sorted. Also note that the shell puts all of the files on one line. The echo command can be used as a simple version of the ls command. If the ls command is given the above arguments, it will sort the filenames again, so they will be in alphabetical order.
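The per-pattern sorting is easy to see in a scratch directory. This sketch uses the Bourne shell, which sorts each expansion the same way; the file names are arbitrary.

```shell
tmpdir=$(mktemp -d)

# Create the files out of order, then expand the patterns both ways.
sorted=$(cd "$tmpdir" && touch a2 a1 b1 c2 c1 && echo a* b* c*)
grouped=$(cd "$tmpdir" && echo c* b* a*)

echo "$sorted"      # a1 a2 b1 c1 c2
echo "$grouped"     # c1 c2 b1 a1 a2
rm -rf "$tmpdir"
```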

If you want, you can enable or disable globbing. The command

set noglob

disables globbing, and

unset noglob

enables it again. These two echo statements do the same thing:

set noglob
echo a* b* c*
echo 'a* b* c*'

Both echo the string "a* b* c*" instead of expanding to match filenames.

What happens if you type the command

echo Z*

and you have no files in your directory that start with a "Z"? The Bourne shell will assume you know what you are doing, and echo "Z*" without any complaints. Therefore if you have a Bourne shell script, and execute

myscript Z*

then the script myscript will get the argument "Z*" instead of a list of filenames. The C shell gives you a choice. If none of the filename substitutions find a file, you will get an error that says:

No match

But if you set the "nonomatch" variable:

set nonomatch

then the C shell behaves like the Bourne Shell.
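The Bourne shell side of this is simple to reproduce. The sketch below expands Z* in an empty scratch directory, and again after two files have been created; the file names are invented.

```shell
tmpdir=$(mktemp -d)      # a fresh, empty directory

# Nothing matches, so the pattern is passed through untouched.
unmatched=$(cd "$tmpdir" && echo Z*)

touch "$tmpdir/Zebra" "$tmpdir/Zoo"
matched=$(cd "$tmpdir" && echo Z*)

echo "no files:  $unmatched"
echo "two files: $matched"
rm -rf "$tmpdir"
```

A script that receives a literal "Z*" this way cannot tell, from the argument alone, whether the pattern failed to match or a file really has that name.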

You should also note that the asterisk can be anywhere in a filename, and you can use any number of asterisks. To match files that have the letters a, b and c in sequence, use

echo *a*b*c*

Match a single character

The meta-character "?" matches a single character. Therefore

echo ???

will match all filenames that have exactly three characters, while

echo ???*

will match files that have three or more characters.

Matching character sets

You can match combinations of characters, using square brackets. If any of the characters inside the square brackets match the single character in the filename, the pattern will match. You can also specify a range of characters using a hyphen. Therefore the following are equivalent:

echo a* b* c*
echo [abc]*
echo [a-c]*

To match any file that starts with a letter, you can use any of the following:

echo [a-zA-Z]*
echo [abcdefghijklmnopqrstuvwxyzA-Z]*
echo [ABCDEFGHIJKLMNOPQRSTUVWXYZa-z]*
echo [zyxwvutsrqponmlkjihgfedcbaA-Z]*
echo [A-Zzyxwvutsrqponmlkjihgfedcba]*

As you can see, the order doesn't matter, unless a hyphen is used. If you specify a range in a reverse alphabetical order, the results are unpredictable. The command

echo [c-a]*

will only match files that start with "c" using the C shell, while the Bourne shell will match files that start with "c" or "a." Use improper ranges, and the different shells give different results. The command

echo [c-b-a]*

will only match files that start with a "c" with the C shell, while the Bourne shell will match files that start with a, b or c. Apparently the Bourne shell will treat strange range values as individual characters, while the C shell ignores bogus ranges, except for the starting character.

Combining meta-characters

You can combine these meta-characters in any order. Therefore it makes sense to pick filenames that are easy to match with meta-characters. If you had ten chapters in a book, you don't want to name them

Part1.bk Part2.bk ... Part10.bk

You see, if you specified the chapters like this:

ls Part?.bk Part10.bk

the shell would expand all of the meta-characters, and then pass this to the ls command, which would then change the order. Therefore after file "Part1.bk" would be "Part10.bk" followed by "Part2.bk." Instead, use the names

Part01.bk Part02.bk ... Part10.bk

so the alphabetical order is always the logical order. This can be very useful if you have log files, and use the current date as part of the file name.
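To see the difference, the following Bourne shell sketch builds both naming schemes in scratch directories and lists them. Only three chapters are created to keep it short, and LC_ALL=C is set for ls so that the sort order is plain byte order; other locales may order punctuation differently.

```shell
tmpdir=$(mktemp -d)
mkdir "$tmpdir/plain" "$tmpdir/padded"
touch "$tmpdir/plain/Part1.bk" "$tmpdir/plain/Part2.bk" "$tmpdir/plain/Part10.bk"
touch "$tmpdir/padded/Part01.bk" "$tmpdir/padded/Part02.bk" "$tmpdir/padded/Part10.bk"

# ls sorts its arguments again; xargs joins the names onto one line.
plain=$(cd "$tmpdir/plain" && LC_ALL=C ls Part?.bk Part10.bk | xargs echo)
padded=$(cd "$tmpdir/padded" && LC_ALL=C ls Part??.bk | xargs echo)

echo "plain:  $plain"
echo "padded: $padded"
rm -rf "$tmpdir"
```

In the plain directory, Part10.bk lands between Part1.bk and Part2.bk. In the padded directory, one pattern matches every chapter and the listing comes out in reading order.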

It is also important to note that the shell evaluates variables before expanding filenames. So you can specify

echo $USER??.out

and the shell will first evaluate the variable "USER," (to "barnett" in this case) and then perform the filename substitution, matching files like "barnett12.out" and finally sorting the filenames in alphabetical order. Then the list of filenames is passed to the echo command.

In-line expansion

The C shell has a unique feature, in that it can generate patterns of filenames even if they don't exist. The curly braces are used, and commas separate the pattern. Suppose you have the file "b0" in the current directory. The command

echo b[012]

will only echo

b0

But the command

echo b{0,1,2,3,0}

will generate

b0 b1 b2 b3 b0

Notice that the order is not sorted alphabetically. Also note that "b0" occurs twice. If we change this to

echo [b]{0,1,2,3,0}

the output again becomes

b0

The in-line expansion comes first, and then the filename substitution occurs. You can put meta-characters inside the curly braces. The following two are equivalent:

echo b* b?
echo b{*,?}

The number of characters can change within the commas. These two commands are equivalent:

echo A1B A22B A333B
echo A{1,22,333}B

This in-line expansion has a multiplying effect. The command

echo {0,1,2,3,4,5,6,7,8,9}{0,1,2,3,4,5,6,7,8,9}

will print out 100 numbers, from 00 to 99. The Bourne shell does not have this feature.
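For comparison, one way to get the same 100 words in the Bourne shell is a pair of nested loops. This is only a sketch of a workaround, not a feature of that shell; the variable names are arbitrary.

```shell
# Two nested loops stand in for {0,...,9}{0,...,9}.
numbers=$(
    for tens in 0 1 2 3 4 5 6 7 8 9
    do
        for ones in 0 1 2 3 4 5 6 7 8 9
        do
            printf '%s ' "$tens$ones"
        done
    done
)

# Load the words into the positional parameters to count them.
set -- $numbers
count=$#
first=$1
shift $((count - 1))
last=$1
echo "$count numbers, from $first to $last"
```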

Home directory expansion

The last form of filename expansion is the "~" character. By itself, it expands to the built-in variable "home." You can also provide a username. This allows you to specify a filename in someone's home directory, without knowing where that directory is:

more ~smith/.login

The C shell determines this value by examining the password file.

C Shell Variable Usage

An essential part of understanding the C shell is mastering variables and variable lists. Setting variables is simple:

set myvariable = word
set myvariable = "a string"

The second form is necessary if spaces are needed inside the variable. I always put spaces around the equals sign, as the C shell seems to complain less with extra spaces. To clear a variable, set the variable to an empty string:

set myvariable = ""

To remove a variable, so it is no longer defined, use the unset command:

unset myvariable

Passing arguments to a shell script

If you create your own shell script, you can pass parameters similar to the Bourne shell. The C shell even uses the same convention of $1 and $*:

#!/bin/csh
echo First argument is $1, second argument is $2
echo All of the arguments are $*

However, there is no $# variable, which returns the number of arguments in the Bourne shell.

How can you check the number of arguments? The C shell has a second way of specifying the arguments passed to the shell script, using a predefined variable list, called "argv." The C shell borrowed this concept from the C programming language. Wonder why? (Hint: it is called the "C" shell for a reason.) The value $argv[1] is equivalent to $1, and $argv[2] is the same as $2. Furthermore, $argv[*] is equivalent to $*. The variable $#argv contains the number of arguments passed to the shell:

#!/bin/csh
echo There are $#argv arguments
echo First argument is $argv[1], second argument is $argv[2]
echo All of the arguments are $argv[*]

There is a subtle difference between $2 and $argv[2]. If there is only one parameter, the line

echo Second argument is $2

will print

Second argument is

while

echo Second argument is $argv[2]

will generate an error, and print

Subscript out of range

The C shell has a mechanism to prevent this error. It is also used to specify a subset of the list of values:

echo All of the arguments but the first are $argv[2-$#argv]

This does not generate an error. With typical UNIX terseness, the above can be replaced with either of the two following statements:

echo All of the arguments but the first are $argv[2-*]
echo All of the arguments but the first are $argv[2-]

This is a general purpose mechanism, as any value can be used to specify the range:

echo first through third arguments are $argv[1-3]

If you specify a specific range, the argument has to be there. That is, $argv[2] will generate an error if the second argument does not exist. $argv[2-] will not. If the first range is missing, the default is 1, or the first parameter.

Variables are allowed inside square brackets.

set first = 1
set last = $#argv
echo first argument is $argv[$first]
echo last argument is $argv[$last]
echo all of the arguments are $argv[$first-$last]

Arguments generalized as array lists

As you can see, the C shell allows you to easily specify portions of the argument list using ranges of values. This can be very convenient. Apparently the author of the C shell agreed, because any variable can be an array list. The simplest way to create an array list is to use the backquote characters:

set mydate = `date`
# returns a date, like
# Tue Jan 7 17:26:46 EST 1997
echo Month is $mydate[2], year is $mydate[$#mydate]
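For comparison, a Bourne shell version of the same idea has to go through the positional parameters, the only list that shell offers. The sketch below is one possible way to write it. The function name show_month_year is invented, and a fixed string stands in for the output of date so the result is predictable.

```shell
# show_month_year STRING: split STRING into words and report two of them.
show_month_year() {
    # "set --" replaces the positional parameters. Inside a function this
    # only affects the function's own arguments, not the script's.
    set -- $1                # unquoted on purpose: split into words
    month=$2
    shift $(($# - 1))        # drop everything but the last word
    year=$1
    echo "Month is $month, year is $year"
}

show_month_year "Tue Jan 7 17:26:46 EST 1997"
```

Doing the work inside a function is a design choice: the script's own arguments are left alone, so there is nothing to save and restore afterwards.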

The Bourne shell has an array list, but only one, and it doesn't have a name. It is the argument list, which is set when the script is called. You can change this with the "set" command. You must, however, remember to save the current values if you want to use them again. So if you need to manage several arrays, and perhaps use one index to access several arrays, the C shell might make the task easier. Yes, I realize some consider my statement blasphemous. Today, I'm wearing my "Don't bug me because I'm using the C shell" pin, so I don't care. If you want to give me grief, read my "Curse of the C shell" column three months ago. I'm not going to sugar coat the C shell beast. It's covered with warts, but it does have some convenient features. Just watch out for the warts. Speaking of which...

The C shell does not elegantly handle variables that contain spaces. There is no equivalent to the $@ variable. Consider this:

set a = "1 2 3"
set b = $a

Any sane person might expect that variables "b" and "a" have the same value. Surprise! This doesn't work in the C shell. On some versions of SunOS, the variable "b" is empty. On other systems, you get "set: syntax error."

How can you copy an array? The command below should work, but doesn't:

set copy = $argv[*]

It behaves like the example above. Unpredictable. If you enclose the variable in double quotes, like this:

set copy = "$argv[*]"

then all of the parameters are in $copy[1], and $copy[2] is not defined. A slight improvement. The C shell has a special mechanism for specifying arrays, using parentheses:

set copy = ( $argv[*] )

Therefore the following two statements are equivalent:

set mydate = `date`
set mydate = ( `date` )

Parentheses are preferred for several reasons. The following commands do not work:

set c = $*
set c = $argv[*]
set c = a b
set c = "a" "b"
set c = `date` `date`
set c = `date` $a "b c"

Some do nothing. Some generate an error. Some cause a core dump. Life would be dull without the C shell. However, adding parentheses solves almost all of the problems:

set c = ( $* )
set c = ( $argv[*] )
set c = ( a b )
set c = ( "a" "b" )
set c = ( `date` `date` )
set c = ( `date` $a "b c" )

This eliminates the unpredictable behavior. The only problem left is spaces inside variables. Since there is no $@ variable, the only way to retain spaces in an array is to copy each element over, one by one:

set b[1] = "$a[1]"
set b[2] = "$a[2]"
etc.

This, by the way, uses a special form of the "set" command, that can modify a single array element. However, the array must exist first, so the code to copy an array is complicated, and darn ugly too.
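
One possible version of that element-by-element copy is sketched here. It is only an illustration: the names "a," "b" and "i," and the placeholder word "x," are arbitrary choices. The first loop exists solely to create the destination array with the right number of elements.

```csh
#!/bin/csh -f
# Copy list "a" to list "b" without losing the embedded spaces
set a = ( "one two" three "four five" )

# Step 1: give b the right number of elements, using placeholders
set b = ()
@ i = 1
while ( $i <= $#a )
    set b = ( $b x )
    @ i++
end

# Step 2: overwrite each placeholder, quoting to keep the spaces
@ i = 1
while ( $i <= $#a )
    set b[$i] = "$a[$i]"
    @ i++
end

echo b has $#b elements, and the first is "$b[1]"
```

A plain "set b = ( $a )" would have produced five elements instead of three, because the words are split again at each space.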

In general, the C shell is not suited for variables that contain spaces. Also, if there is any possibility that an argument contains a space, you should use

set a = ( $argv[1] )

or

set a = "$argv[1]"

instead of

set a = $argv[1]

Alternatively, just tell anyone who uses spaces in arguments to perform an impossible biological act. I've learned this from my colleagues, who've come to the conclusion that anyone who worries about spaces inside arguments is someone who has way too much free time. I don't know anyone like that, however.

Clearing array lists

The command:

set a = ""

defines variable $a[1], and sets it to an empty string. $#a is equal to 1. However, the command

set a = ()

empties the array, so variable $a[1] does not exist. The variable $#a is equal to 0. The command

unset a

removes the definition of the variable.

Testing for variable existence

The Bourne shell lets you refer to variables that do not exist. If you ask for the value, you will get an empty string. The C shell will give you a warning if the variable does not exist, or the array element does not exist. Using $1 instead of $argv[1] can help, as well as using ranges, like $argv[1-]. There is another method, by using the special variable $?x, where x is the name of the variable. If the variable does not exist, it will have the value of zero. If the variable exists, it will have the value of one. You can combine this with an "if" statement, which is somewhat more cumbersome than the Bourne shell technique.
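
As a rough sketch of the $?x technique, the fragment below tests a variable before using it, and supplies a default when it is missing. The variable name "debug" is just an example.

```csh
#!/bin/csh -f
# Make sure the example starts with "debug" undefined
unset debug
if ( $?debug ) then
    echo debug is defined
else
    echo debug is not defined
endif

# Supply a default only when the variable is missing
if ( ! $?debug ) set debug = 0
echo debug is now $debug
```

Referring to $debug before the default is set would generate an error, which is why the test comes first.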

The Shift command

The special command "shift" can be used to remove the first array element. It "pops" the first value off the stack of the argument list. The "shift" command can take an optional variable:

set a = ( a b )
shift a
# same as
# set a = ( b )
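
A common use is to work through a list until nothing is left. This is a small sketch using a made-up list called "queue." With no variable name, "shift" operates on the argument list instead.

```csh
#!/bin/csh -f
# "queue" is a hypothetical list name
set queue = ( alpha beta gamma )
while ( $#queue > 0 )
    echo processing $queue[1]
    shift queue
end
echo $#queue elements remain
```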

If you listen carefully, you can hear a slight noise as each variable pops off the stack. I've suddenly realized I've been working too hard. I'll continue next month. Until then, take care.

C Shell Flow Control

We've been talking about variables, lists, and strings. Time to start doing something useful with the C shell. Let's start with a simple way to branch.

myprogram && echo The program worked

If the program "myprogram" has no errors, the script echoes "The program worked." In other words, if the program "myprogram" exits with a status of zero, the "echo" program is executed. If any errors occur, the program would exit with a positive value, which typically indicates the error that occurred. To test for an error, use "||" instead:

myprogram || echo the program failed

These can be combined:

myprogram && echo program passed || echo program failed

This can be used for many tests, but there are some points to watch out for. If the program "myprogram" generates any output, it will be sent to standard output, which may not be what you want. You can capture this by redirecting standard output:

myprogram >/dev/null && echo program passed

If the program might generate an error, you can capture this by using the special combination ">&." This merges standard error with standard output:

myprogram >& /dev/null && echo program passed

This type of conditional operation can be enclosed in parentheses to keep standard input flowing through the different programs. For instance, if you wanted to test standard input for the word "MATCH," and either add the word "BEFORE" if the match is found, or add "AFTER" if no match is found, you can use the following ugly code:

#!/bin/csh
tee /tmp/file |
( grep MATCH /tmp/file >/dev/null &&
( echo BEFORE; cat - ) || ( cat - ; echo AFTER) )
/bin/rm /tmp/file

If you save this script in a file called "prog" and type

echo MATCH | prog

the script will echo

BEFORE
MATCH

If instead you execute

echo no | prog

the script will output

no
AFTER

The parentheses are necessary, because the "echo" command does not read standard input. It discards it. Putting parentheses around the "echo" and "cat" commands connects the pipe to the input of both programs.

The C shell has some similarities to the Bourne shell. In fact, the above script could be a Bourne shell script. The two shells act the same in this case. However, early versions of the C shell had a bug. The meanings of "&&" and "||" were reversed. This was a long time ago, and I think most versions of the C shell have this bug fixed. Still, it may not be portable. The second difference between the shells is that the Bourne shell allows curly braces to be used as well as parentheses. Parentheses cause an extra shell to be executed. So the script above requires 4 shells to be executed. The Bourne shell could do it with one shell process. Still, I find it amusing that the above script works for both the C shell and the Bourne shell. But I don't get out much.

The second mechanism for doing tests in the C shell uses the "if" command. There are two formats, a short form and a long form:

# first the short form
if ( expression ) command
# then the long form
if ( expression ) then
command
endif

The expression is a numerical expression. If the result is zero, the expression is false. If non-zero, the expression is true. These three statements will all output "yes":

if (1) echo yes
if (-1) echo yes
if (2) echo yes

If you want to use a program in the "if" statement, similar to the "&&" test, it can be placed in back quotes. I'll use the long form of the "if" statement:

if ( `myprogram` ) then
echo yes
endif

In this case, the exit status of the program is not used. The output of the program is used as the expression instead. Therefore, only the last of the following statements prints anything, a single "yes":

if (`echo`) echo no
if (`echo 0`) echo no
if (`echo ""`) echo no
# Only the next statement is true:
if (`echo 1`) echo yes

This might strike you as inconsistent, and you would be right. The Bourne shell uses the "test" program to convert numbers and strings into status. The "&&" and "||" commands also use the status to branch, in both the C shell, and the Bourne shell. The "if" command does not use status. Assume I created a script, called "myexit," that was simply this:

#!/bin/csh
exit $*

The following expressions would be true:

true && echo yes
myexit 0 && echo yes
if (1) echo yes

This is confusing, but the trick is to remember that zero is true in the exit status, while non-zero is true in expressions. There is a way to get the exit status inside an expression. It's not a common technique, however. The solution is to enclose the command inside curly braces. If the command within curly braces is successful (status of zero), then the expression is 1, or true:

if ( { myprogram } ) echo myprogram worked

However, if you try to redirect standard output, it will not work:

if ( { grep pattern file >/dev/null } ) echo found pattern

This only tests for no errors. Most of the time people use the special variable "$status" (the C shell counterpart of the Bourne shell's "$?"), which contains the actual exit status of the last command.
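
In the C shell, the exit status of the most recent command is kept in the variable "status." Here is a minimal sketch of examining it. The "false" command merely stands in for a real program that fails, and "rc" is a made-up variable name.

```csh
#!/bin/csh -f
# "false" always fails; it stands in for a real program
false
# Save the value at once, because every later command resets $status
set rc = $status
if ( $rc != 0 ) then
    echo the command failed with status $rc
endif
```

Copying the value into another variable first is a matter of caution. Any command executed in between, even "echo," replaces the value of $status.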

Optional forms of the if command

The short version of the "if" command without a "then" statement only takes one statement. You cannot use parentheses to add additional commands. The following statement generates an error:

if (1) (echo a;cat) # an error

The "then" word must be included to allow multiple statements. If you have the longer form, you can optionally add "else" and "else if" commands:

if ($debug) then
echo debug mode
else if ($verbose) then
echo verbose mode
else
echo "normal mode"
endif

Any number of "else if" statements can be used, including none.

Problems with the if command

The C shell has many problems with the "if" command, that the Bourne shell does not have. At first, the C shell seems adequate, but when you try to accomplish something difficult, it may not work. I've already mentioned:

if (1) (echo a;cat) # an error

The solution is to use the long form, with the "then" word.

Suppose you wanted to optionally empty a file, removing all contents. You might try the following:

if ($empty) echo "" >file

However, this does not work as expected. The file redirection is started before the command is executed, and before the value of the variable is determined. Therefore the command always empties the file. The solution is again, to use the long form.
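
Here is a sketch of the long form, which only performs the redirection when the expression is true. The scratch file name /tmp/demo.$$ and the variable "contents" are assumptions made for this example.

```csh
#!/bin/csh -f
# /tmp/demo.$$ is a scratch file made up for this sketch
set file = /tmp/demo.$$
echo precious data > $file

set empty = 0
# Long form: the redirection is set up only if the test succeeds
if ( $empty ) then
    echo "" > $file
endif

set contents = "`cat $file`"
echo the file still contains: $contents
/bin/rm -f $file
```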

You can nest "if" statements inside "if" statements, but combining long and short forms doesn't always work. Notice a pattern? I personally avoid the short form, because if I add an "else" clause, or a nested "if" statement, I get a syntax error.

The next problem is more subtle, but an indication of the ad hoc parsing done by the C shell. The commands that deal with flow control must be the first command on the line. As you recall, the Bourne shell specifies commands have to be on a new line, but a semicolon works as well as a new line. The C shell requires the command to be the first word. "So what?" you may ask. You may have never experienced this problem, but I have. Try to pipe something into a conditional statement. The C shell won't let you do this without creating another script.

The While command

The second command used for flow control is the "while ...end" command. This statement will execute the statements inside the loop while the condition is true. I often type the following in a window to examine a print queue:

while (1)
lpq
sleep 10
end

This command runs until I press control-C. "While" and "end" must be the only words on the line.

As for problems, it's hard to use the C shell to read a file one line at a time.

The Foreach command

The third command, "foreach," is used to loop through a list. One common use is to loop through the arguments of a script:

#!/bin/csh
# arguments are $*
foreach i ( $* )
echo file $i has `wc -l <$i` lines
end

It can also be used with other ways of getting a list:

# remember the shell expands meta-characters
foreach file ( A*.txt B*.txt )
echo working on file $file
# rest of code
end


# here is another example
foreach word ( `cat file|sort` )
echo $word is next
# rest of code
end

The Switch statement

The last flow-control command is the "switch" statement. It is used as a multiple "if" statement. The test is not an expression, but a string. The case statement evaluates variables, and strings with the meta-characters "*," "?," "[" and "]" can be used. Here is an example that shows most of these variations. It prints out the day of the week:

#!/bin/csh
set d = (`date`)
# like
# set d = ( Sun Feb 9 16:08:29 EST 1997 )
# therefore
# d[1] = day of week
set day2="[Mm]on"
switch ( $d[1] )
case "*Sun":
echo Sunday; breaksw
case $day2:
echo Monday;breaksw;
case Tue:
echo Tuesday;breaksw;
case Wed:
echo Wednesday;breaksw;
case Th?:
echo Thursday;breaksw;
case F*:
echo Friday;breaksw;
case [Ss]at:
echo Saturday;breaksw;
default:
echo impossible condition
exit 2
endsw

The statements can't be on the same line as the "case" statement. Again, the ad hoc parser is the reason. The "breaksw" command causes the shell to skip to the end. Otherwise, the next statement would be executed.
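
Falling through to the next statement can be put to use: several "case" labels can share one set of statements. The following is a small sketch of that idea, using the hypothetical variables "answer" and "reply."

```csh
#!/bin/csh -f
# "answer" and "reply" are hypothetical names
set answer = Yes
switch ( $answer )
case [yY]:
case [yY]es:
    # both labels above share these statements
    set reply = affirmative
    breaksw
case [nN]*:
    set reply = negative
    breaksw
default:
    set reply = unknown
endsw
echo $reply
```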

Complex Expressions

The Bourne shell uses the program expr to perform arithmetic. It uses the test program to compare the results. The C shell can both calculate complex expressions and test them at the same time. This provides simplicity, but there are penalties, which I will discuss later. Table 1 shows the list of operators, in order of precedence. Operators in the same box have the same precedence.

+-----------------------------------+
|Operator   Meaning                 |
+-----------------------------------+
|(...)      Grouping                |
+-----------------------------------+
|~          One's complement        |
+-----------------------------------+
|!          Logical negation        |
+-----------------------------------+
|*          Multiplication          |
|/          Division                |
|%          Remainder               |
+-----------------------------------+
|+          Addition                |
|-          Subtraction             |
+-----------------------------------+
|<<         Bitwise shift left      |
|>>         Bitwise shift right     |
+-----------------------------------+
|<          Less than               |
|>          Greater than            |
|<=         Less than or equal      |
|>=         Greater than or equal   |
+-----------------------------------+
|==         Equal to                |
|!=         Not equal to            |
|=~         Pattern match           |
|!~         Pattern doesn't match   |
+-----------------------------------+
|&          Bitwise AND             |
+-----------------------------------+
|^          Bitwise exclusive OR    |
+-----------------------------------+
||          Bitwise inclusive OR    |
+-----------------------------------+
|&&         Logical AND             |
+-----------------------------------+
|||         Logical OR              |
+-----------------------------------+

The operators "==," "!=," "=~" and "!~" operate on strings. The other operators work on numbers. The "~" and "!" are unary operators. The rest require two numbers or expressions. Null or missing values are considered zero. All results are strings, which may also represent decimal numbers. Table 2 shows some expressions, and the value after being evaluated.

+--------------------------+
|Expression         Result |
+--------------------------+
|~ 0                    -1 |
|~ 1                    -2 |
|~ 2                    -3 |
|~ -2                    1 |
|! 0                     1 |
|! 1                     0 |
|! 2                     0 |
|3 * 1                   3 |
|30 / 4                  7 |
|30 % 4                  2 |
|30 + 4                 34 |
|4 - 30                -26 |
|512 >> 1              256 |
|512 >> 2              128 |
|512 >> 4               32 |
|2 << 4                 32 |
|3 << 8                768 |
|4 < 2                   0 |
|4 >= 2                  1 |
|ba =~ b[a-z]            1 |
|7 & 8                   0 |
|7 ^ 8                  15 |
|15 & 8                  8 |
|15 ^ 8                  7 |
|15 ^ 7                  8 |
|15 | 48                63 |
|15 | 7                 15 |
|15 && 8                 1 |
|15 || 8                 1 |
+--------------------------+

The C shell also supports file operators, shown in Table 3.

+-----------------------------------------------------------+
|Operator      Meaning					    |
+-----------------------------------------------------------+
|-r filename   Returns true, if the user has read access    |
|-w filename   Returns true, if the user has write access   |
|-x filename   Returns true, if the user has execute access |
|-e filename   Returns true, if the file exists		    |
|-o filename   Returns true, if the user owns the file	    |
|-z filename   Returns true, if the file is empty	    |
|-f filename   Returns true, if the file is a plain file    |
|-d filename   Returns true, if the file is a directory	    |
+-----------------------------------------------------------+

If the file does not exist, or is inaccessible, the test returns false.

Commands that use expressions

Only two flow-control commands support complex expressions, "while," and "if." The "exit" command also takes a complex expression. This allows sophisticated passing of exit codes to a calling program, but I've never seen a C shell script that makes use of this. Surprisingly, the "set" command does not use complex expressions. If you execute the command

set a = ( 1 + 2 )

the result is a list of three elements, and $a[2] has the value of "+." There is a mechanism to assign complex expressions to variables. A special command, called "@" is used. You must have spaces after the "@" character. Spaces are almost always required between all operators and expression elements. The C shell likes spaces, and gets grumpy if you don't include enough. A vitamin deficiency, I guess. The "@" command also supports the "++" and "--" postfix operators, which increment or decrement the variable specified. This construct was taken from the C language.

Also borrowed from the C language are the assignment operators +=, -=, *=, /=, and %=. The expression

@ a %= 4

is the same as

@ a = $a % 4

Other examples of the "@" command are:

@ a++
@ b=$a + 4
@ c*=3
@ c=4 + $b
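
To make the arithmetic concrete, here is a short worked sketch. The variable names are arbitrary, and each comment gives the value the variable should hold after that line. Spaces are used around every operator.

```csh
#!/bin/csh -f
@ a = 30
@ b = $a / 4            # integer division: b is 7
@ c = $a % 4            # remainder: c is 2
@ c += $b               # c is 9
@ c++                   # c is 10
@ d = ( $a + 2 ) * 2    # grouping first: d is 64
echo $a $b $c $d
```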

Examples

Suppose you want to source a file, but are afraid that someone might substitute another file for it. A crude example that checks if a file is owned by you, and is readable, would be:

if ( -o $file && -r $file ) then
source $file
endif

This example isn't 100% secure, but it is slightly better than blindly sourcing a file without checking who owns it. Most of the time the file test operators are used to prevent runtime errors caused by files that are not readable, or executable.

Notice the "foreach" command does not support complex expressions. You can emulate the C language "for" construct using "while." This code fragment counts up to 10 using a list:

foreach i ( 1 2 3 4 5 6 7 8 9 10 )
echo $i
end

However, if you wish to count to 100, this becomes impractical. Here is how you can do it using complex expressions:

@ a = 1
@ MAX = 100
# count from 1 to $MAX
while ( $a <= $MAX )
echo $a
@ a++
end

Tricky expressions to test

One stumbling block people discover is looking for command line arguments. Suppose your script will accept a "-r" option. However, the following line will not work:

if ( $argv[1] =~ -r ) echo found it

If the first argument is "-r," then this is evaluated as:

if ( -r =~ -r ) echo found it

The C shell will assume you meant to use a file operator, and test the file "=~" to see if it is readable. Then it sees the next operator, which is again a "-r," but in this case there is no filename afterwards. This generates a syntax error. The solution is to place a dummy character before both strings:

if ( X$argv[1] =~ X-r ) echo found it
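
The same trick can be combined with a check that an argument exists at all. The sketch below is one way to do it. The list "args" stands in for the argument list, and "rflag" is a made-up name. The two tests are nested rather than joined with "&&," because the shell substitutes variables before it evaluates the expression, and $args[1] is an error when the list is empty.

```csh
#!/bin/csh -f
# "args" stands in for argv; "rflag" is a hypothetical flag name
set args = ( -r somefile )
set rflag = 0
if ( $#args > 0 ) then
    if ( X$args[1] == X-r ) then
        set rflag = 1
        shift args
    endif
endif
echo rflag is $rflag, remaining arguments: $args
```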

Parenthesis in the C shell

In complex expressions, parentheses can be used to alter the default precedence in evaluation. To put it another way, when in doubt, use parentheses. Both expressions below do the same thing:

if ( $a + 2 > 5 ) echo yes
if ( ( $a + 2 ) > 5 ) echo yes

However, parentheses have several jobs. The context specifies how parentheses are used. This is where the parsing of the C shell shows some additional warts. In these examples, the parentheses are used to specify a list:

set i = ( a b c d e f g )
foreach i ( a b c d e f g )
echo $i
end

These show expressions:

if ( $x > 2 ) echo yes
while ( $x )
@ x--
end

This is an example of creating a subshell:

(stty 9600;cat file) >/dev/printer

And this is an example of grouping:

@ x = ( ( $b + 4 ) & 255 ) << 2

And here is an example where the parentheses have two different meanings:

if ( ( $b + 4 ) > 10 ) echo yes

I've tried to combine several of these uses into one statement, and it generates errors. I'm not surprised.

Break and continue

The C shell has special "escape" commands, used to exit from "while" and "foreach" loops. The break command will escape out and terminate the loop. The continue command will go to the end of the loop, but cycle through again. Here is a complete shell script that prints out the numbers 2, 4, 6 and 8, but it's nothing to cheer about.

#!/bin/csh

foreach i ( 1 2 3 4 5 6 7 8 9 10 11 12 )

# if 9, exit
if ( $i == 9 ) break
# if odd, then don't echo
if ( $i % 2 == 1 ) continue
# Even numbers only
echo $i
sleep 1
end
echo 'Who do we appreciate?'
sleep 1; echo $USER
sleep 1; echo $USER
sleep 1; echo $USER
sleep 1; echo 'Yeah!!!!!!'

I think this covers most of the issues with complex expressions. Let me know if you have any questions.

Interactive Features of the C shell

Bill Joy's Legacy

In my discussion of the C shell, I've described the good points and bad points of the C shell. For those who are keeping score, the Bourne shell is ahead 10 to 2. Why is the C shell so popular? To explain this requires a short history lesson. Forgive me.

When I went to college in the early 70s, programming meant going to the keypunch station and carrying around decks of punch cards. The first time I used an interactive terminal, directly connected to a computer, it was a hard-copy device. A popular interactive terminal at that time was the Teletype. It had a keyboard, and printed on an ugly roll of yellow paper. Some models supported a paper tape reader and paper tape punch. It was a bargain, because one machine was a terminal, printer, and backup device. A complete I/O system in one unit, for less than $10,000!

We had a programmer whose job was to edit paper tapes, punching new holes, and splicing paper. The only devices used were a Teletype and a tape splicer. If a mistake was made, the "programmer" would position the paper tape just right, and punch out all of the holes to erase that letter. In case you wondered, this is why the ASCII code for delete is all ones in binary (1111111, or 127). Each "1" corresponds to a hole, and a row with every hole punched corresponded to a deleted character. If you made a mistake, you could "erase" the mistake without starting a brand new tape. Just back up and punch out the error.

We felt fortunate when the boss ordered several video terminals, which cost more than a complete PC system nowadays. Editing was done by the terminal, which had memory, and keys with arrow characters. We were truly excited. With our new program, that only worked with our new terminal, and 64K of RAM, and 5 MB of hard disk, we now had a real computer system. I imagine Berkeley had computers of a similar configuration, and was equally excited when they got their first VAX.

Trouble was, the default editor, ed, was designed for those old-fashioned hard copy terminals. Ed has a consistent user interface. If you typed something right, it said nothing. If you typed something the program didn't understand, it printed out a single question mark. The authors felt that this was sufficient information, and the interpretation of the error message should be obvious. Nowadays people comment on this statement as proof that UNIX was not originally user-friendly. Wrong! You see, Teletypes were incredibly loud. Teletypes were also incredibly slow. I remember pounding on the keys on a model 33 Teletype. Pounding is the right word. I imagine martial-art students practiced on the keyboard, in their effort to develop strong-as-steel fingers. Alas, I lacked the skill, and the instant I made a mistake, the Teletype let loose a 70 decibel machine-gun rat-a-tat-tat for 5.7 seconds, as it typed "Syntax error, please type command again - you stupid person." Needless to say, this irritated me immensely. When I am irritated, I make mistakes. If the operating systems I used only printed a question mark, the years of electro-shock therapy might not have been necessary.

There must be a better way. There was. Bill Joy decided the standard editor sucked, and wrote an editor that did not require special hardware, and allowed you to see what your file looked like while you were editing it. It was a WYSIWYG editor for ASCII characters. He wrote a library called termcap to go along with his vi editor. Most people forget what a major breakthrough this was. Going through my 1980 edition of the Berkeley UNIX manual, I see that Bill Joy wrote the Pascal interpreter and compiler, along with vmstat, apropos, colcrt, mkstr, strings, whereis, whatis, vgrind, soelim, msgs, and the Pascal modifications to ctags and eyacc. He also wrote a significant part of the virtual memory, demand paging, and page replacement algorithm for Berkeley Unix.

Bill Joy also wrote the C shell. Twenty years later it's easier to find fault in the C shell, compared with current shells. But at the time, the C shell had many new ideas, and many still favor it for interactive sessions. Several years elapsed before the Korn shell was written. And several more years elapsed before it or similar shells became commonly available. I'd like to meet someone who feels they could have done a better job than Bill, in the same conditions.

Enough ranting.

C shell File Redirection

All shells support file redirection using "program > file" and appending to a file using "program >> oldfile." This only redirects standard output. If you want to capture both standard output and error output, the C shell has a simple syntax. Just add a "&" to the angle brackets. This also works with pipes. A list of the different combinations follows:

+---------------------------------------------------------------+
|Characters  Meaning                                            |
+---------------------------------------------------------------+
||           Pipe standard output to next program               |
||&          Pipe standard and error output to next program     |
|>           Send standard output to new file                   |
|>&          Send standard and error output to new file         |
|>>          Append standard output to file                     |
|>>&         Append standard and error output to file           |
+---------------------------------------------------------------+

This is very simple, and takes care of 98% of the needs of the typical user. If you want to discard standard output, and keep the error output, you can use

(program >/dev/null) >& file

This takes care of 99% of the cases. It is true the Bourne shell is more flexible when it comes to file redirection, but the C shell is very easy to understand.

The noclobber variable

One of the features the C shell has for new users is a special variable that can prevent a user from "shooting oneself in the foot." Consider the following steps to create and execute a script:

vi script
chmod +x script
script > script .out

One small typo, i.e. the space between "script" and ".out," and the script is destroyed. Here is another example:

program1 >>log
program2 >>log
program3 >>log
program4 >>lag

Because of a typo in the last line, the information is sent to the wrong log file.

Both problems can be prevented very easily. Just set the noclobber variable:

set noclobber

In the first case, you will get an error that the file already exists. The second will generate an error that there is no such file. When the "noclobber" variable is set, ">" must only point to a new file, and ">>" must only point to a file that already exists.

This seems like a great idea, but there are times when this feature gets in the way. You may want to write over an existing file. Or you may want to append to a file, but don't know if the file exists or not. If you want to disable this feature, type

unset noclobber

You may wish to keep this feature enabled, but disable it on a line-by-line basis. Just add a "!" after the angle brackets. This is like saying "I really mean it!" Here are some examples:

# Create new file
program >out
# overwrite the same file
program >!out
# append to a file, even if it doesn't exist.
program >>!log

The noclobber variable also affects the ">&" and ">>&" combinations:

#capture error and standard output
program >&! file
program >>&! log

If you sometimes use the noclobber variable, you have to change your style to use the exclamation point when needed. That is, when you want to append to a file that doesn't exist, or write to a file that may exist. A complete list of all of the variations, and their meaning, is below. Notice how the meaning changes depending on the noclobber variable:

+---------------------------------------------------------------------------------+
|Characters  Noclobber       Meaning                                              |
+---------------------------------------------------------------------------------+
||           Doesn't Matter  Pipe standard output to next program                 |
||&          Doesn't Matter  Pipe standard and error output to next program       |
|>           Not set         Send standard output to old or new file              |
|>           Set             Send standard output to new file                     |
|>&          Not set         Send standard and error output to old or new file    |
|>&          Set             Send standard and error output to new file           |
|>>          Not set         Append standard output to old or new file            |
|>>          Set             Append standard output to old file                   |
|>>&         Not set         Append standard and error output to old or new file  |
|>>&         Set             Append standard and error output to old file         |
|>!          Doesn't Matter  Send standard output to new or old file              |
|>&!         Doesn't Matter  Send standard and error output to new or old file    |
|>>!         Doesn't Matter  Append standard output to old or new file            |
|>>&!        Doesn't Matter  Append standard and error output to old or new file  |
+---------------------------------------------------------------------------------+

Safe Aliases

A second modification that Berkeley made was to add the "-i" options to the cp, mv and rm commands. These options warned the user if a file was going to be destroyed. The C shell supported an alias feature that allows you to define new commands. If you type

alias move mv

then when you execute the command "move," the C shell performs a substitution, and executes the "mv" command. The new options, along with the alias command, allowed new users to specify

alias mv mv -i
alias cp cp -i
alias rm rm -i

and before any file is destroyed, the system would warn you. If you define these aliases, and wish to ignore them, just place a backslash before the command:

\rm *

This turns off the alias mechanism.

C Shell Start-up Files

Since I've started talking about the C shell interactive features, it's time to discuss the start-up files, or files whose name starts with a dot. The C shell uses three dot-files:

+-----------------------------------------------+
|File      Purpose                              |
+-----------------------------------------------+
|.cshrc    Used once per shell                  |
|.login    Used at session start, after .cshrc  |
|.logout   Used at session termination          |
+-----------------------------------------------+
To be more accurate, the .logout file is more like a "finish-up" file, but I'll discuss that shortly.

The .cshrc file

The .cshrc file is scanned, or more accurately sourced, every time a new shell starts. The shell executes the source command on the file. If a C shell script starts with

#!/bin/csh -f

or you explicitly execute a C shell script with

csh -f script

then the start-up file is not executed. Think of the -f flag as the fast option. I recommend that every C shell script start with the "#!/bin/csh -f" line. Without it, the C shell executes the $HOME/.cshrc file. Remember - this is the user's personal file. My file is different from yours. If I executed that script, it might not work the same as when you execute the script. A shell script that depends on the contents of the .cshrc file is likely to break when other users execute it: a bad idea.

It is important to understand when these files are sourced. When you execute a program like cmdtool, shelltool or xterm, the value of the SHELL environment variable is used, and that shell is executed. If the shell is the C shell, then the .cshrc file is sourced. If you execute a remote command using rsh, the shell specified in the /etc/passwd file is used. If this is the C shell, then .cshrc is used at the start of the process.

The .login file

The second startup file is the .login file. It is executed when the user logs onto a system. Such a session is called a login shell. If you have specified the C shell as your default shell, then typing your username at the "login:" prompt on a console, or connecting with the telnet or rlogin command, creates a login shell, and the .login file is sourced. Ever wonder how the shell knows this? The mechanism is simple, but most people don't know about it. When the login program executes a login shell, it tells the shell that the first character of the program name is a hyphen. That is, if you execute "-csh" instead of "csh," then you would be starting a login shell. You can prove this by copying or linking /bin/csh to a filename that starts with a hyphen:

cd /tmp
cp /bin/csh -csh
-csh

Try this and see. If you execute "csh," the .cshrc file is read. If you execute "-csh," both the .cshrc and .login files are read. A shell created in this fashion executes the .cshrc file first, then the .login file. Without the hyphen, just the .cshrc file is executed.
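
The hyphen test itself is easy to sketch. The following Bourne shell fragment is only an illustration, and is not how the C shell is actually written. The function name is_login_name is made up for this example. It checks whether a program name, as it would arrive in argument zero, starts with a hyphen.

```shell
#!/bin/sh
# Illustration only: is_login_name is a hypothetical helper, not part of csh.
# It succeeds when the given program name (argument zero) starts with a hyphen.
is_login_name() {
    case "$1" in
        -*) return 0 ;;
        *)  return 1 ;;
    esac
}

for name in csh -csh /bin/csh
do
    if is_login_name "$name"
    then
        printf '%s would be treated as a login shell\n' "$name"
    else
        printf '%s would be treated as an ordinary shell\n' "$name"
    fi
done
```

Only the middle name, "-csh," is reported as a login shell.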

The .logout file

The last file the C shell executes is the .logout file. This only happens when the shell is a login shell.

What goes where?

Knowing when each file is used is very important if you want to keep your account as efficient as possible. People have a tendency to add commands to any file, or in some cases both files. Finally the system behaves the way the user wants, and the changes are kept where they are without understanding the whys and wherefores. As always, I believe in providing tables, allowing you to look up the exact behavior in each condition. This is a summary of those actions:

+-----------------------------------------------------------------+
|Condition		     Files sourced			  |
+-----------------------------------------------------------------+
|Logging on the console	     .cshrc, then .login, finally .logout |
|rlogin system		     .cshrc, then .login, finally .logout |
|rsh system		     .cshrc, then .login, finally .logout |
|rsh system command	     .cshrc				  |
|csh filename		     .cshrc				  |
|csh -f filename	     -					  |
|C shell Script without -f   .cshrc				  |
|C shell Script with -f	     -					  |
|Starting a new shell	     .cshrc				  |
|Opening a new window	     .cshrc				  |
+-----------------------------------------------------------------+

The tricky part about start-up files

There are a couple of problems people have with their start-up scripts. Let me list them.

  1. Determining which commands are the ones you want to execute. You discover a useful setting, and want to add this feature to all sessions.
  2. Learning when you want to execute these commands. Does this feature need to be set once, or for every shell?
  3. Executing commands at the wrong time. Some people put "biff y" in their .cshrc file. This is wrong. It should be in the .login file.
  4. Learning when the two files are sourced, and in which order. Some people think the .login file is executed before the .cshrc file. This seems logical, because the .login file is executed during login, but it is wrong. The .cshrc file is always sourced before the .login file.
  5. Bloat. Don't add commands without thinking of where they go. Commands and options placed in the wrong file may be meaningless, and may slow down your shell.

All of these problems confound the new C shell user. So how can you distinguish between these different cases? Well, the C shell sets various variables under different conditions. The operating system also has variables, independent of the shell. I'll briefly describe the difference, and provide a template.

The C shell prompt variable

I mentioned earlier that you can check if a variable is defined by using "$?" before the variable name. For example, if variable "abc" is defined, then "$?abc" has the value of 1. The $prompt variable is defined when the C shell is in interactive mode. However, when the shell executes a script, the $prompt variable is undefined. Therefore you can tell the two cases apart with code like this:

if ( $?prompt == 0 ) then
# This is a shell script
else
# This is interactive
endif

If you set the prompt in your .cshrc file without testing that variable, then you will be unable to distinguish between interactive sessions and scripts. Why is this important? I have a large number of aliases. How large? I currently have about 300 aliases. You may or may not think this is large. It is significant, however. I noticed my shell was taking longer and longer to execute scripts. When I started customizing my C shell, I tried

if ( $?prompt ) then
...
...
# 300 lines later
endif

This does make the shell faster. However, I found there are two problems with this. As the number of aliases I had grew, it became harder to remember the association between the if/then/endif commands, because they were six pages apart. Good coding style says we should keep modules short and easy to understand. The other problem was a matter of efficiency. Even though the shell didn't have to execute the 300 lines of aliases, it still had to read each line, looking for the "endif" command. This slowed down the shell. Therefore I currently use something like this:

if ( ! $?prompt ) exit
if ( -f ~/.aliases ) source ~/.aliases

This keeps my .cshrc file short, and allows the shell to skip reading a large file when it doesn't have to.

The current terminal

A second important condition used to customize your shell session is the current terminal. This is learned by executing the program "tty." This will either be a serial device, like "/dev/ttya" or a pseudo terminal, like "/dev/pts/1" or the master console on the workstation, which is "/dev/console." Many users customize their shell, so that they automatically start up a window system. I often see something like the following in a .login file:

if ( "`tty`" =~ "/dev/console" ) then
# start up the window system
/usr/openwin/bin/openwin
endif

I place double quotes around the command. This is good practice, because if the command ever fails, the variable will have an empty string as a value. The double quotes prevent this from becoming a syntax error.

Local variables vs. Environment variables

There are two kinds of C shell variables: local and environmental. Local variables are local to the current shell. If a new shell is created, then it does not have these variables set. You have to set them for each shell. The .cshrc file is used to set local variables.

Environment variables are exported to all sub-shells. That is, if you set an environment variable, and then create a new shell, the new shell inherits the value of this variable. Inherit is the essential concept. The child can inherit the traits of the parent, but anything the child does to itself does not affect the parent. If you specify an environment variable before you start up the window system, then all new shells, i.e. all new windows, will inherit the environment variables from the parent shell. But if you set an environment variable to a different value in each window, this has no effect on the parent process, which is your login shell.
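
Inheritance is easy to see for yourself. Here is a minimal sketch in Bourne shell syntax. In the C shell the first two lines would be a single "setenv GREETING parent" command. The variable name GREETING is made up for this example.

```shell
#!/bin/sh
# GREETING is a made-up variable name used only for this demonstration.
GREETING=parent
export GREETING

# The child shell starts with a copy of the parent's environment,
# reports what it inherited, then changes its own copy.
child_saw=$(sh -c 'echo "$GREETING"; GREETING=child')

echo "child inherited: $child_saw"
echo "parent still has: $GREETING"
```

Both lines report "parent." The child received the value, and the change it made to its own copy never reached the parent.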

You can set environment variables in your .cshrc file. However, this is unnecessary, because any variable set in the .login file will be inherited by all sub-shells. There are two reasons you need to set an environment variable in the .cshrc file. The first is because you need to customize it for each shell. Perhaps different windows have different values. The second reason is that you need to look up something, by executing a program, and want to optimize your shell, so that this only has to be done once. Suppose you wanted to learn what your current computer is called. You could use the following test:

if ( "`hostname`" =~ "grymoire" ) then
...
endif

However, this executes the program "hostname" in every shell. If you want to optimize your shell, then only do this once. The logical place is to do this in your .login file. But you may want to use this information for something that is set in your .cshrc file. One way to solve this problem is to check for a special environment variable, and if it is not set, then execute the command, and set the variable. An example is:

if ( ! $?HOSTNAME ) then
setenv HOSTNAME `hostname`
endif

Other conditions

There are some other special cases. Many people perform different actions based on the current terminal type. If you log onto a system with a device local to the system, the terminal type is known. If you use rlogin or rsh to create an interactive session, the terminal type is communicated to the remote system. If you use the telnet command, the terminal type is unknown, and must be determined somehow. Once the terminal type is known, the user often customizes the keyboard configuration. In particular, some characters, especially the delete key, differ on different terminals. One terminal may have an easily accessible backspace key, and another has a convenient delete key. The command to modify this is the "stty" command. The .login file typically adjusts this parameter, based on the terminal type.

# if the terminal type is an HP terminal,
# change the delete character
if ( $TERM =~ "hp*" ) then
stty erase '^h'
endif

Sample startup files

Here is a sample .login file:

# Sample C shell .login file
# Created by Bruce Barnett
# This file is sourced after .cshrc is sourced
# set up the terminal characteristics
if ( -f ~/.terminal ) source ~/.terminal
# define the environment variables
if ( -f ~/.environment ) source ~/.environment
# set search path
if ( -f ~/.path ) source ~/.path


# Start up window system, but first, learn the terminal type once
if ( ! $?tty ) then
set tty = `tty`
endif


# You may wish to start a window system
# Here is one way:


if ( "$TERM" =~ "sun*" && "$tty" =~ "/dev/console" ) then
# some people like to wait 5 seconds
# echo "starting window system in 5 seconds"
# sleep 5;


# By using 'exec', then when you exit the window system,
# you will be logged out automatically
# without the exec, just return to the shell
/usr/openwin/bin/openwin
# exec /usr/openwin/bin/openwin


# - any more window systems?
# else if ( $TERM =~ "abc*" && "$tty" =~ "/dev/console" ) then
# start up another window system


endif

And here is a sample .cshrc file:

# Sample .cshrc file
# Bruce Barnett
# This part is executed on the following occasions:
# 1. "rsh machine command"
# 2. "csh scriptname"
# 3. All scripts that start with #!/bin/csh (without -f)
# Read the minimum .cshrc file, if there

if ( -f ~/.cshrc.min ) source ~/.cshrc.min

# if run as a script, then $?prompt == 0
# if run as a remote shell, then $?prompt == 0 && $?term == 0
# if $USER is not defined, then "~" doesn't have the proper value
# so bail out in this case

if ( ! ( $?USER && $?prompt && $?TERM )) exit

# This is an interactive shell

#---Local variables
# examples:
# set noclobber
# set myvariable = "value"
# set mylist = ( a b c d e f g)


#----aliases
if ( -f ~/.aliases ) source ~/.aliases

#----Searchpath
if ( -f ~/.path ) source ~/.path

C Shell Searchpath

An essential part of Shell Mastery is understanding what a searchpath is, and how to optimize it. Following the principle of modularity, most of the UNIX commands are separate programs. Only a few are built into the shell. This means you can change your shell, and still use 99% of the commands without change. These external programs may be scattered in dozens of directories. When you ask for a command that the shell doesn't understand, it searches the directories in the order specified by the search-path, and executes the first command it finds with the name you specified. And trust me on this, systems that do not behave consistently are bad for your mental health. I knew a programmer who was writing software for a system that behaved unpredictably. He receives excellent care nowadays, but he always asks me the same question. "Two plus two is ALWAYS four, right?" Poor guy. The world will never be safe for programmers until we eliminate all non-deterministic systems.

Bourne shell and C shell paths

The searchpath is stored in an environment variable "PATH." You set or change the Bourne shell PATH variable using commands like:

PATH=/usr/bin:/usr/ucb:/usr/local/bin
PATH=$PATH:$HOME/bin
export PATH

The C shell has a different syntax for setting environment variables:

setenv PATH /usr/bin:/usr/ucb:/usr/local/bin
setenv PATH ${PATH}:~/bin

Notice that the tilde can be used instead of $HOME. The curly brace is necessary in this case, because the C shell has the ability to perform basename-like actions if a colon follows a variable name. The curly braces turn this feature off. Without the braces, you would get a syntax error. The braces could be added in the Bourne shell example above, but they aren't required.

The C shell has an alternate way to modify the searchpath, using a list. Here is the same example as before:

set path = ( /usr/bin /usr/ucb /usr/local/bin )
set path = ( $path ~/bin )

The variable name is lower case, the syntax is the list form, and a space is used to separate directories. The Bourne shell uses a colon as a separator, and a blank value is used to indicate the current directory. Since the C shell list allows any number of spaces as a separator, a blank value cannot be used, and something else must indicate the current directory. The period is used instead. This is logical, because "." refers to the current directory. The following command:

set path = ( . $path )

specifies that the current directory is searched for commands before all other directories. Important! This is a security problem. See the sidebar to fully understand the danger of this action.

It might seem confusing that there are two path variables, one in upper case, and the other in lower case. These are not two different variables. Just two different ways to set the same variable. Change one, and the other changes.

Because the C shell uses a list, this allows some convenient mechanisms to examine and manipulate the searchpath. For instance, you can add a directory to the third place of a searchpath using

set path = ( $path[1-2] ~/new $path[3-] )

Examining files in the searchpath is easy. Suppose you want to write a simple version of the "which" command. The C shell makes this an easy job:

#!/bin/csh -f
# A simple version of which that prints all
# locations that contain the command
if ( $#argv != 1 ) then
echo "Wrong number of arguments - usage: 'which command'"
exit 1
endif
foreach dir ( $path )
if ( -x $dir/$1 ) echo $dir/$1
end

Here is the same script using the Bourne shell:

#!/bin/sh
# A simple version of which that prints all
# locations that contain the command
file=${1:?"Wrong number of arguments - usage: 'which command'"}
paths=`echo $PATH | sed '
s/^:/.:/g
s/:$/:./g
s/::/:.:/g
s/:/ /g
'`
for dir in $paths
do
[ -x $dir/$file ] && echo $dir/$file
done

As you can see, the Bourne shell is much more complicated. Surprisingly, my measurements show the Bourne shell version is faster. Well, I found it surprising. I expected the C shell version to be faster, because it doesn't use any external programs. The Bourne shell version executes three additional processes because of the back quotes. Therefore four programs compete with one C shell script. And the C shell still loses. Hmmm. Does this tell you something?

The system comes with a C shell script called "which." Not only does it find the first reference, but it reports if the command is an alias. This is fine, but I prefer the above script, because it tells me about all of the commands, and runs much faster. I called it "Which," with a capital "W," so I can use either one. The Korn shell has a built-in command, called "whence."

When to change searchpaths

Most people specify their searchpath in their ".cshrc" file. But this really isn't necessary. Like all environment variables, all newly created shells get their environment from their parent. Some people therefore specify it in their ".login" file. All new shells will have this searchpath. I set my searchpath before I start my window system. This is very flexible, and quite easy to do. Just create a shell script that specifies your new searchpath, and then start up the windowing system. If you use OpenWindows, an example might be

if ( ! $?OPENWINHOME ) then
setenv OPENWINHOME /usr/openwin
endif


set path = ( $path $OPENWINHOME/bin )


$OPENWINHOME/bin/openwin

If you want to add a new directory to your searchpath, change it. If you then create a new window using a command like shelltool, or xterm, that window will inherit the searchpath from its parent. I specify my searchpath in a file called ".path." It contains something like this:

set mybin = ( ~/bin )
set standardbin = ( /usr/ucb /usr/bin /bin )
if ( $?OPENWINHOME ) then
set winpath = ( $OPENWINHOME/bin )
else
set winpath = ( )
endif
# extra lines omitted
set path = ( $mybin $standardbin $winpath )

I start my window system like this:

#!/bin/csh -f
# start OpenWindows
setenv OPENWINHOME /usr/openwin
source ~/.path
$OPENWINHOME/bin/openwin

This way I have one place to change my searchpath for all conditions. Any time I want to, I can define or undefine variables, and source the same file to reset my path. This allows me to use different windowing systems, or different versions, and have one main file to control my search-path.

Another way I change my searchpath is with an alias. You may want to define the following aliases:

alias .path 'set path = ( . $path )'
alias poppath 'set path = ( $path[2-] )'

The ".path" alias adds the current directory, while "poppath" removes the first one from the list.

You can make aliases as simple or as complicated as needed. For instance, you can radically change your environment with the simple

alias newstuff "source ~/.anotherpath"
alias newwin "source ~/.anotherpath;cmdtool&"

You can create multiple personalities, so to speak.

Undoing any changes

One of the simplest ways to reset your path is to type the new path on the command line. If you have problems, you can always change your ".cshrc" file to have a precise path, or have it source the ".path" file you created. After this, all new shell windows you create will have the new path. Another convenient way to undo the change is to execute another shell. That is, before you experiment, type

csh

Then modify your searchpath. When done, type a Control-D. This forces the current shell to terminate, and the environment variables of the earlier shell are restored.

Summary

I've discussed several ways to change your searchpath. As a summary, here are the methods:

  1. Explicitly set the path.
  2. Specify it in your .cshrc file.
  3. Specify it in your .login file, or before you start the windowing system.
  4. Specify it in another file, and source this file.
  5. Execute a shell script that changes the path, and then executes another program or shell.
  6. Use an alias to do any or all of the above.

No one method is right for everyone. Experiment. But don't think you have to set your searchpath in your ".cshrc" file. Other solutions are possible. In fact, next month, I will discuss how to optimize your searchpath.

Sidebar

Some people put their current directory in their searchpath. If you type

echo $path

and you see a dot, then you are vulnerable. If you ever change directories to one owned by someone else, you may be tricked into executing their commands instead of the ones you expect. Don't believe me? Okay. You've forced my hand. Suppose I created a file called "ls" that contains the following command:

/bin/rm $HOME/*

If I placed it in the "/tmp" directory, then as soon as you typed

cd /tmp
ls

all of your files would be deleted!

If you must include the current directory in your searchpath, put it at the end:

set path = ( $path . )

This is a little better, but you can still fall victim to the same trap. Have you ever made a typo when you executed a command? I could call the script "mroe" or something similar, and still delete all of your files. Now - are you willing to risk this? I hope not. I tell people how dangerous this is. Remember, other actions could be taken instead.

Personally, I don't have the current directory in my searchpath. It was painful at first, but I soon learned how to adjust. When I want to execute a command in the current directory, I just type

./command

I do this when I debug the script. When done, I move the command into my private "bin" directory:

cp command ~/bin

Is that so hard? I sleep much better at night.
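
If you would rather have a script do the checking, here is a rough sketch in the Bourne shell. The function name path_has_cwd is my own invention for this example. It looks at the colon-separated PATH form of the searchpath, where either a period or an empty entry means the current directory. It does not catch other relative entries, such as "./bin".

```shell
#!/bin/sh
# path_has_cwd is a hypothetical helper. It succeeds if the colon-separated
# search path in $1 contains "." or an empty entry, both of which mean
# "the current directory". Relative entries such as "./bin" are not detected.
path_has_cwd() {
    case ":$1:" in
        *::*|*:.:*) return 0 ;;
        *)          return 1 ;;
    esac
}

if path_has_cwd "$PATH"
then
    echo "Warning: your searchpath includes the current directory"
else
    echo "Good: the current directory is not in your searchpath"
fi
```

Wrapping the value in extra colons means a leading or trailing colon shows up as "::", so one pattern covers entries at the start, the middle and the end.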

Optimizing the C Shell Searchpath

Last month I discussed various ways to modify your C shell searchpath. This month, I will discuss ways to optimize your path.

What do I mean by optimization? Well, to tell the truth, I have looked at a lot of C shell start-up files, and I shudder at what I see. Some people have dozens of directories in their searchpath. I also see people walking around with a gloomy look, as if a storm cloud is circling over their head.

"What's the matter?" I ask.

"My file server is down. I can't get any work done." they reply.

"Oh." I reply. "I didn't notice."

And it's true. A server went down, and it didn't bother me at all because it wasn't in my searchpath. When some people discover a directory that contains useful programs, they add it to their searchpath. The searchpath grows and grows, until it becomes so convoluted, no one understands it. That's the wrong thing to do.

You see, I created a special directory that contained symbolic links to files on remote systems. Because the directory is on my local workstation, I am not affected if a server goes down. The only time I have a problem is if I use one of those executables. That process will freeze, but only that process. All I have to do is create a new window, and continue working. I call this a cache directory, and I know that isn't the best name for it. A better name ought to be "favored," but what the heck.

I will admit that solving this problem isn't trivial. In fact, I wrote a program to help me figure out what to do. Several programs, as it turns out. Let me describe them.

Programs to Optimize the Searchpath

The first script, and the most important one, creates and maintains the cache directory. If you want to use the directory "/local/cachebin" as a cache directory, and want to eliminate the directory "/public/bin," just type

CreateCacheDir /local/cachebin /public/bin

This will look at all of the executables in "/public/bin" and create a symbolic link to them in the "/local/cachebin" directory. You can then remove "/public/bin" from your searchpath, and if the server providing this directory goes down, you will not be affected. The script CreateCacheDir follows:

#!/bin/sh
# Argument #1 is the directory where we will cache
# a filename. Actually, not a cache, but a link.
# where is the cache directory?
# Usage
# CreateCacheDir Cachedir dir1 [dir2 ...]
# Function:
# - Create a symbolic link in cachedir, pointing to the files in dir1, etc.
#
CACHE=${1:?'Target directory not defined'}
if [ ! -d "$CACHE" ]
then
echo making cache directory "$CACHE"
mkdir $CACHE
fi

shift

# The rest of the arguments are directories to cache

verbose=false # true to see what happens
debug=false # true if you want extra information
doit=true # false if you don't want to change anything

for D in $*
do
$verbose && echo caching directory $D
# list files, but ignore any that end with ~, or # or % (backup copies)
for f in `cd $D;ls|grep -v '[~#%]$'`
do
if [ -f $CACHE/$f ]
then
$debug && echo $CACHE/$f already exists
else
if [ -f $D/$f -a -x $D/$f ]
then
echo $D/$f
$verbose && echo ln -s $D/$f $CACHE/$f
$doit && ln -s $D/$f $CACHE/$f
elif [ -d $D/$f ]
then
$verbose && echo linking directory: ln -s $D/$f $CACHE/$f
$doit && ln -s $D/$f $CACHE/$f
else
$verbose && echo linking other: ln -s $D/$f $CACHE/$f
$doit && ln -s $D/$f $CACHE/$f
fi
fi
done
echo you can now take $D out of your searchpath

done
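
One housekeeping chore comes with a directory full of links: when a program is removed from the original directory, its link is left pointing at nothing. The following sketch is an addition of mine, not part of CreateCacheDir, and the function name find_dangling is hypothetical. It lists the links in a directory whose targets no longer exist.

```shell
#!/bin/sh
# find_dangling is a hypothetical helper: it prints every symbolic link
# in directory $1 whose target no longer exists.
find_dangling() {
    for f in "$1"/*
    do
        # -h is true for a symbolic link; -e follows the link to its target
        if [ -h "$f" ] && [ ! -e "$f" ]
        then
            echo "$f"
        fi
    done
}

# Example: check the cache directory named on the command line, if any
if [ $# -gt 0 ]
then
    find_dangling "$1"
fi
```

Because the test follows each link, run it while the servers are up. A link to a server that is down can make the check wait, just like any other access to that server.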

This does the work, but how do you know which directories to replace? I wrote some programs to measure how good (or bad) my searchpath is. You might find them useful.

Programs to measure your efficiency

One way to simplify the searchpath is to identify directories that are really symbolic links to other directories. For instance, there is no reason to have both /bin and /usr/bin in your searchpath, if one points to the other. This script, called ResolveDir, will identify these cases:

#!/bin/sh
# Find the unique and final location of a directory
# Usage:
# ResolveDir directoryname
#
# If the directory is called ".", return "."
dir=${1:?'Missing argument'}

[ "$dir" = "." ] && { echo "."; exit 0; }
cd $dir >/dev/null 2>&1 && { pwd; exit 0; }
exit 1;

The next script, when given a directory, returns the system name on which the directory resides. If the file is on the local system, it returns the host name. This script is called Dir_to_System.

#!/bin/sh
# Given a directory, return the system it is mounted on
# If it is localhost, then echo `hostname`
dir=${1:?'No directory specified'}
cd $dir

# On SunOS 4.x, use /usr/bin/df before /usr/5bin
# On Solaris 5.x, use /usr/ucb/df before /usr/bin
# Solve the problem by specifying the explicit path
PATH=/usr/ucb:/usr/bin:/usr/5bin:$PATH
export PATH

x=`df . | grep -v Filesystem`
# use expr to extract the system name
server=`expr "$x" : '\(.*\):'`
# with sed, I could do server=`echo $x | sed 's/:.*//'`
if [ "$server" ]
then
echo $server
else
hostname
fi
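
The expr pattern can be tried on its own. In this sketch the two sample lines are invented for illustration, since real df output varies from system to system. The grouping operators \( and \) tell expr which part of the match to print.

```shell
#!/bin/sh
# The sample lines below are invented for illustration; real df output
# varies between systems.
x="server1:/export/home 1000000 500000 500000 50% /home/server1"

# expr prints the part of $x matched inside \( \), here everything
# before the last colon.
server=$(expr "$x" : '\(.*\):')
echo "expr found: $server"

# The sed alternative deletes everything from the first colon onward
server2=$(echo "$x" | sed 's/:.*//')
echo "sed found: $server2"

# A local filesystem has no colon, so expr prints an empty string.
# expr also exits non-zero when nothing matches; "|| true" keeps that
# from stopping a script that runs with "set -e".
local_line="/dev/sd0a 1000000 500000 500000 50% /"
local_server=$(expr "$local_line" : '\(.*\):' || true)
echo "local filesystem gives: '$local_server'"
```

The first two lines both report "server1," and the last one reports an empty string, which is the case where Dir_to_System falls back to the hostname command.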

Using these programs, I wrote some programs that evaluate my current searchpath. Since I am talking about the C shell, I started to write the script in this shell. However, because of limitations of the C shell, I had to split one script into three files. The main script, AnalyzePaths, calls two scripts:

#!/bin/csh -f
# this could be one script instead of three if
# we were using the Bourne Shell
# But the C shell isn't very good at piping
# commands inside of loops.
#
# Therefore we generate the pipe in one script,
# and feed AWK in another script
GetPaths | analyze_paths.awk HOSTNAME=`hostname`

The script GetPaths outputs one line for each directory in the searchpath. Each line contains the directory name, the final name (if a symbolic link) and the system that provides the directory:

#!/bin/csh -f
foreach p ( $path )
if ( -d $p ) then
set newp = (`ResolveDir $p`)
set server = (`Dir_to_System $p`)
echo "${p}:${newp}:${server}"
else
echo "${p}:?:?"
endif
end

The last script is a complex nawk script, but it outputs useful information. For instance, it reports directories that don't exist, redundant directories, and specifies remote directories that can be eliminated. For example, it could report

Directories that don't exist on this system:
/local/bin/SunOS
/usr/etc
You are dependent upon the following systems:
server1, directories: /home/barnett/bin
Directory /usr/X11R6/bin used 2 times (symbolic links: 1)
symbolic links for '/usr/X11R6/bin' are: /usr/X11/bin

The script is:

#!/usr/bin/nawk -f
# Do this before you read any input
BEGIN {
    FS=":";
}

# do this for each line
(NF==3) {
    if ($3 ~ /\?/ ) {
        # then it is a directory that does not exist
        missing[$1]=1;
        number_of_missing++;
    } else if ( $1 == "." ) {
        # ignore it - it is the "." directory
    } else {
        # count how many times each directory is used
        used_count[$2]++;
        # is it a duplicate,
        if ($1 !~ $2) {
            links[$2]++;
            # remember it, by catenating two strings
            used[$2] = used[$2] " " $1;
        }

        # Is it a remote system?
        if ($3 !~ HOSTNAME) {
            systems[$3]++;
            # remember every directory that depends on this system
            system_to_dir[$3] = system_to_dir[$3] " " $1;
            remote_systems++;
        }
    }

    # printf("%s\t%s\t%s\n",$1, $2, $3);
}
# Do this at the end of the file
END {
    # Any directories that do not have to be included?

    if (number_of_missing>0) {
        printf("Directories that don't exist on this system:\n");
        for (i in missing) {
            printf("\t%s\n", i);
        }
    }

    # how many computer systems are needed?

    if (remote_systems) {
        printf("You are dependent upon the following systems:\n");
        for (i in systems ) {
            printf("\tsystem %s, directories: %s\n", i, system_to_dir[i]);
        }

    } else {
        printf("Good! You are not dependent upon any other servers,");
        printf(" except for your current directory\n");
    }


    # What about duplicate directories?

    for (i in used ) {
        if (used_count[i]>1) {
            printf("\nDirectory %s used %d times (symbolic links: %d)\n",
                i, used_count[i], links[i]);
            if (links[i]>0) {
                printf("\tsymbolic links for '%s' are: %s\n", i, used[i]);
            }
        }
    }

}

There you go. Using these scripts will give you a set of tools to improve your searchpath. Do it right, and may you never have your workstation freeze again.
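
The comment in AnalyzePaths says this could be one script in the Bourne shell. Here is a rough sketch of what I mean, not a replacement for the scripts above. It leaves out the server lookup done by Dir_to_System, and the function name list_path_dirs is made up for this example. The point is that the loop's output is piped straight into awk.

```shell
#!/bin/sh
# list_path_dirs is a hypothetical helper. For each entry in the
# colon-separated path $1 it prints "entry:resolved", or "entry:?" if the
# entry is not a directory. An empty entry is treated as ".".
# Note: a trailing empty entry is dropped by the shell's field splitting.
# The body runs in a subshell, so IFS and "set -f" do not leak out.
list_path_dirs() (
    IFS=:
    set -f                      # no filename expansion on path entries
    for dir in $1
    do
        [ -n "$dir" ] || dir=.
        if [ -d "$dir" ]
        then
            resolved=$(cd "$dir" >/dev/null 2>&1 && pwd -P)
            echo "$dir:${resolved:-?}"
        else
            echo "$dir:?"
        fi
    done
)

# The loop's output is piped straight into awk, all in one script
list_path_dirs "$PATH" | awk -F: '
    $2 == "?" { print "missing: " $1; next }
    { count[$2]++ }
    END {
        for (d in count)
            if (count[d] > 1)
                print d " is searched " count[d] " times"
    }'
```

It reports directories that do not exist, and directories that are searched more than once because two names resolve to the same place.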

Specifying system-specific searchpaths

So far, I've explained how to optimize your searchpath. But that is only part of the problem. If you only used one computer, simplifying your searchpath would be easy. But logging onto many different computers makes life "interesting," as the ancient Chinese curse goes. You could do what most people do, and keep adding directories ad nauseam until every possible directory is in your searchpath. Of course, as I explained last month, this makes your account as stable as a one-legged elephant. I have a solution, based on specific rules. Here is the short list:

Understand directory characteristics.
Define a consistent naming convention.
Define a strategy.
Reduce the complexity.
Use local link/caching directories.
Consider a local home directory.

Let me describe each of these rules.

Understand directory characteristics.

The UNIX file system was designed to be the single common structure to perform all operations. All input and output, all devices, and the internals of the operating system are reachable by using special files. With NFS, this was extended to include remote files, making them look like local files, and making all file-based UNIX utilities capable of operating on remote files.

However, not all directories have the same characteristics. Here are some of the variations:

  · A directory that contains standard architecture-specific executables. The directory is read-only, and identical on other systems of the same architecture. Example: /usr
  · System-specific configuration files. Typically writable by super-users only. Example: /etc
  · System-specific directory to hold temporary files. Writable by various processes, programs, and users. Often not backed up. Not visible by other systems. Example: /var
  · Same as the above, but backed up frequently. Example: /private
  · Common files shared across multiple groups. Log onto other systems, and see the same files. Example: /home/server/project
  · Directories which contain locally added executables. This may be private to the system. Example: /local
  · Same as above, but containing executables that are shared among several systems of the same architecture. Examples: /usr/local, /project/bin
  · Same as the above, but containing executables and/or data for multiple architectures. Example: /public

As you can see, there are many different characteristics, and every user has to understand the difference. Make sure you understand how the files at your site are organized. Most new workstations have lots of disk space, and it is rare that the systems don't make it available. Users should know if the files are visible to other systems, and if the files are backed up or not.

Some directories are project-related. Executables are only used when working on the project. Other directories, such as "/projects/bin," are used for certain rarely-used functions. These are what I call "optional" directories. If they are missing, you can still do other work. Other directories are mandatory.

Define a consistent naming convention.

Some companies do not have a consistent naming strategy. This is a serious problem. If the location of a single file changes when different machines are used, then the user has to deal with this extra complexity, which is very difficult, and if not taken care of, can lead to irritability, moroseness, and eventually--insanity. Sun's automounter can help. For instance, you can set up a network so that your home directory always has the same name, even if it is moved to another system. Some sites also use a mechanism such as "/work/projectname" to specify the location of a project. Another popular and practical convention is to specify machine-specific files using a path that contains the name of the machine. If all machines use the same convention, then you can make sure that the "data" directory on computer "server" is called "/home/server/data" and is the same directory on all machines.

Define a strategy.

You may want to write up your strategy. This helps clarify some of the decisions. For instance, here is one similar to what I use, leaving out the standard directories:

/home/systemname - Shared system-specific files. Backed up.
/work/Projectname - Project directories.
/private - Local to system, not backed up, unless I do it myself.
/local - Local to current system. Not shared.
~/SunOS/5.5.1/sun4u/bin - Executables for a specific system type

The important directories are the last two, because these contain the executables for my computer. But deciding on a naming strategy is the first step.

Reduce the complexity.

It doesn't matter what the convention is, as long as it is consistent. But if it's too complicated, errors can occur. When possible, keep things simple. Remove non-essential directories. If you don't need to use certain executables all the time, then add them when you need them using an alias. To repeat the lesson from two months ago, don't set the path in your ".cshrc" or ".login" file. Inherit it from your environment. When you want to add a directory, change your environment for that window.
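As a rough sketch of adding a directory only when you need it, an alias like the following would do. The alias name "useproj" is made up for this example, and "/projects/bin" is the optional directory mentioned earlier; substitute your own names.

```csh
# Hypothetical alias: add the optional project directory to the
# searchpath of the current shell (and so the current window) only.
# The single quotes delay the expansion of $path until the alias is used.
alias useproj 'set path = ( $path /projects/bin )'
```

Typing "useproj" in one window changes the searchpath for that window alone. Other windows keep the searchpath they inherited.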

Use local link/caching directories.

Once you have removed all of the directories you use occasionally, you still might have several directories that are on other systems. What then?

The solution is simple. Create a directory on your local system, that contains symbolic links to the real files. Then remove the original directory from your searchpath, and add the new one. I discussed this in detail last month. But there is one more step.
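Here is a minimal sketch of that idea. The names are assumptions chosen for illustration: "~/cachebin" is the local link directory, and "/project/bin" stands for whatever remote directory you want to stop searching directly.

```csh
# Sketch only: ~/cachebin and /project/bin are example names.
if ( ! -d ~/cachebin ) mkdir ~/cachebin

# Link each executable in the remote directory into the local one,
# skipping names that already exist there.
# (csh reports "No match" if the remote directory is empty.)
foreach f ( /project/bin/* )
    if ( -x $f ) then
        if ( ! -e ~/cachebin/$f:t ) ln -s $f ~/cachebin/$f:t
    endif
end

# Search the local link directory ahead of everything else.
set path = ( ~/cachebin $path )
```

If the remote directory was already in your searchpath, leave it out when you next set the path, so that only the local directory is searched.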

Consider a local home directory

Most large installations share user home directories. No matter which system you log onto, you are in the same spot. This is convenient, but there are two problems. The first concerns efficiency, and the second security. Efficiency is a concern if the server providing your home directory goes down. Security is a concern because anyone who can gain superuser access on any system that mounts that home directory can break into that account if you use rlogin, and have access to your files. To solve both problems, create a directory on the local system that has copies of some of the critical files, like ".cshrc" and ".login." Since a ".rhosts" file kept in this local directory is not available to other systems through NFS, a hacker cannot modify your ".rhosts" file and log onto your workstation. A better solution is to eliminate the "rlogin" and "rsh" services, using a program like "ssh" instead. But that's another topic.

Having a home directory local to your main computer can increase your efficiency, by removing the dependence on other systems. You can have your own directory for executables, for instance.
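The copying step might look something like the sketch below. The directory "/local/home/$user" is purely an assumption for this example; use whatever directory lives on your workstation's own disk. Making that directory your actual home directory is a separate step that depends on how your site manages accounts, and is not shown.

```csh
# Sketch only: /local/home/$user is a hypothetical local directory.
set localhome = /local/home/$user
if ( ! -d $localhome ) mkdir -p $localhome

# Copy the critical start-up files, preserving modes and times.
foreach f ( .cshrc .login )
    if ( -f ~/$f ) cp -p ~/$f $localhome/$f
end
```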

Customizing your environment

I'll go through it a step at a time. Remember, I do not recommend you always set your searchpath in your ".cshrc" or ".login" file. Set it when you need it, like starting up a window system. Assume the searchpath is controlled by a file called ".path." Here is my version of this file:

# example .path file

if ( -f ~/.path.default ) then
source ~/.path.default
else
# this is a default searchpath
set path = ( ~/bin /usr/ucb /usr/bin /usr/local/bin /local/bin )
endif

Click here to get file: Path1.csh
The system searches for a file called ".path.default." If this file is found, then it is used. Otherwise a default searchpath is specified. The file ".path.default" is a bit more complicated, but it is designed to handle special cases in a simple manner:

# .path.default modified for SunOS

if ( ! $?HOSTNAME ) then
setenv HOSTNAME `hostname`
endif

# get host specific path, if there
if ( -f ~/.path.$HOSTNAME ) then
source ~/.path.$HOSTNAME
endif

if ( ! $?SYSTEM ) then
set SYSTEM = "`uname -s`"
endif

# get system-specific searchpath, if there
if ( -f ~/.path.$SYSTEM ) then
source ~/.path.$SYSTEM
endif

# define these if .path.$SYSTEM doesn't

# define the system defined paths
if (! $?SYSPATH ) then
set SYSPATH = ( /usr/ucb /usr/bin /bin )
endif

# define the places a window system's files are located


if (! $?WINPATH ) then
set WINPATH = ( /usr/X11/bin /usr/X11R6/bin /usr/openwin/bin )
endif

# Private executables
if ( ! $?MYBIN ) then
set MYBIN = ( ~/bin ~/bin/$SYSTEM )
endif

# local to machine
if ( ! $?LOCALBIN ) then
set LOCALBIN = ( /local/bin /local/bin/$SYSTEM )
endif

#set CACHEBIN = ( $HOME/cachebin )
set CACHEBIN = ( )

#If TOTALPATH is defined, use it, else build it up from the pieces
if ( $?TOTALPATH ) then
set path = ( $TOTALPATH )
else
set path = ( $CACHEBIN $MYBIN $LOCALBIN $WINPATH $SYSPATH )
# set path = ( $CACHEBIN $MYBIN $LOCALBIN $WINPATH $SYSPATH . )

endif

Click here to get file: Path.default
There are many things to point out in this file. First, let me describe the variables:

HOSTNAME - The system hostname
SYSTEM - The operating system type
SYSPATH - The standard vendor-supplied directories
WINPATH - Directories used for the window system
MYBIN - Directories that contain my personal executables
LOCALBIN - Where extra executables are kept
CACHEBIN - My local cache directory

Suppose you are on a system called "pluto." If so, then the script searches for the file ".path.pluto." If this file is found, it is sourced. This file can be quite simple. For instance, if the file ".path.pluto" contained these two lines only:

set SYSTEM = "FunnyUnix"
set TOTALPATH = ( ~/bin /usr/bin /usr/ucb /usr/local/bin )

Then the value of TOTALPATH will be used to set the searchpath. No other programs will be executed. The system doesn't even need the "uname" executable.

Here is another example, named ".path.neptune," for a computer called neptune:

set SYSTEM = "SunOS"
set MYBIN = ( ~/bin ~/SunOS/bin )

In this case, the file ".path.SunOS" would be sourced, and the default values in that file would specify the searchpath. The exception would be the value of "MYBIN." I have set up these files so that each system can have its own collection of paths that define the searchpath. You can customize the directory used for the window system on a particular machine, yet still add a common NFS directory to all systems, by modifying the value of $path near the end of the file. Still, this doesn't solve all cases. In particular, SunOS 4.X and 5.X systems have different searchpaths. Here is a file that shows one way to automatically change the searchpath based on the version of the operating system:

# .path.SunOS
# copied from .path.default, and modified

if ( ! $?SYSTEM ) then
set SYSTEM = "`uname -s`"
endif

# Having this here would cause an infinite loop
# Make sure it is commented out
#if ( -f ~/.path.$SYSTEM ) then
# source ~/.path.$SYSTEM
#endif

# define these if .path.$SYSTEM doesn't

if ( ! $?MACHINE ) then
set MACHINE = "`uname -m`"
endif

if ( ! $?REV ) then
set REV = "`uname -r`"
endif

# define the system defined paths
if (! $?SYSPATH ) then
if ( "$REV" =~ 4.[012].* ) then
set SYSPATH = ( /usr/ucb /usr/bin /usr/5bin /bin /usr/etc )
else if ( "$REV" =~ 5.[0-6].* ) then
set SYSPATH = ( /opt/SUNWspro/bin /usr/ccs/bin \
/usr/ucb /usr/bin /usr/sbin )
else
# How did I get here?
set SYSPATH = ( /usr/ucb /usr/bin /bin )
endif
endif

# define the places a window system's files are located
# I could look at DISPLAY, and change this depending on
# the value

if (! $?WINPATH ) then
if ( ! $?OPENWINHOME ) then
setenv OPENWINHOME /usr/openwin
endif
set WINPATH = ( $OPENWINHOME/bin )
endif

# define the places where architecture-specific binaries are
#
if ( ! $?MYBIN ) then
set MYBIN = ( ~/bin ~/$SYSTEM/$REV/$MACHINE/bin )
endif

if ( ! $?LOCALBIN ) then
set LOCALBIN = ( /local/bin )
endif

#set CACHEBIN = ( )
set CACHEBIN = ( $HOME/cachebin )

# If I set this,
# then the variables in .path.default will be skipped
#set TOTALPATH = ( $CACHEBIN $MYBIN $LOCALBIN $WINPATH $SYSPATH )

Click here to get file: Path.sunos
As you probably noticed, I used

~/$SYSTEM/$REV/$MACHINE/bin

as a directory for a machine-specific searchpath. This becomes, when evaluated,

~/SunOS/5.5.1/sun4u/bin

You can move an executable there, and make links from other directories to that file if you want to. Other approaches also make sense. You could have a searchpath that includes several directories in a particular order, such as the following:

~/bin
~/SunOS/bin
~/SunOS/5.5.1/bin
~/SunOS/5.5.1/sun4u/bin

The first one could hold all shell scripts. The second one could contain all Sun-specific shell scripts. The third one could contain all executables for a SunOS 5.5.1 system. And the fourth directory can contain machine-dependent executables. Or you could use underscores instead of slashes:

set MYPATH = ( ~/bin ~/SunOS_bin ~/SunOS_5.5.1_bin )
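The layered, slash-separated ordering described above could be built up roughly as follows. This is only a sketch: it reuses the SYSTEM, REV, and MACHINE variables from ".path.SunOS," and the test that drops directories that do not exist is my own addition for this example, not part of the files shown earlier.

```csh
# Sketch: fill in SYSTEM, REV and MACHINE if they are not already set.
if ( ! $?SYSTEM ) then
    set SYSTEM = "`uname -s`"
endif
if ( ! $?REV ) then
    set REV = "`uname -r`"
endif
if ( ! $?MACHINE ) then
    set MACHINE = "`uname -m`"
endif

# Most general directory first, most specific last.
# Keep only the directories that exist on this system.
set MYBIN = ( )
foreach d ( ~/bin ~/$SYSTEM/bin ~/$SYSTEM/$REV/bin ~/$SYSTEM/$REV/$MACHINE/bin )
    if ( -d $d ) then
        set MYBIN = ( $MYBIN $d )
    endif
end
```

Because the searchpath is searched in order, a script in "~/bin" would be found before a same-named executable in one of the more specific directories. Reverse the list if you want the most specific directory to win.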

I hope you find this setup easy to use, and easy to modify. It is a bit complex, but it offers a lot of flexibility. I hope you find it useful.

C Shell History - Forward into the Past

Prehistoric History

In ancient times, before mice existed, large beasts roamed the earth. These included band printers, 9-track tape drives (oddly, 9-track came before 8-track), and disk platters. The most fearsome of all the beasts was the TeleType, able to cause piercing headaches. Quieter machines evolved, like the Silent 700, and the DecWriter. A new beast was seen wandering the dark and desolate laboratories. It was quick. It was silent. It was the VKB, or Video Keyboard, a choice morsel for those higher in the evolutionary plane, i.e., the Hacker (Hackerus Keyboardus).

A common sound was the stampede of fingertips, as the Hacker stalked his or her prey - the perfect program. Faster, faster went the fingers, but it was never fast enough. That perfect program was just around the corner, but rarely was it caught.

One hacker, in an effort to go faster still, decided to enhance his shell in such a way that all he had to do was take a tiny step in a direction, and the shell knew what he wanted to do. That hacker added history. And thus was the history mechanism born.

Enabling History

Nowadays shells (like ksh) support a history mechanism that works visually, allowing the user to edit the command line and call up previous commands by pressing the up arrow. The C shell does not do this. It was designed for any terminal, including the prehistoric hard-copy terminals. The C shell has no problems running within Sun's cmdtool, for instance. To enable history, set the history variable:

set history = 100

This tells the shell to remember the last 100 commands. You can make this number bigg