The UNIX Computer System at ARE (Edition )

2.5.2: Command Pipelines

As mentioned earlier, the power of UNIX lies in its modularity. Each command does one thing well. UNIX allows you to combine several of these single-purpose commands into a command pipeline. Here's a simple, and very useful, command pipeline that allows you to page through a long directory listing:

$ ls -al | less

The `ls -al' command generates a detailed list of file names and attributes, one file per line. The output of that command is piped into the pager `less'. If there are more files in the directory than can be displayed on a single screen, this command lets you page through them at your leisure.

Pipes allow the output of one command to become the input to another. Every command has a standard input and a standard output, and a pipe connects the standard output of the command on its left to the standard input of the command on its right. Commands in the pipeline don't know where the data is coming from or where it's going: they just read the standard input, do their thing with it, and send the results to the standard output. The shell is responsible for making sure the beginning and end of a pipeline are connected properly to a meaningful source and destination.
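To make the point about standard input and standard output concrete, here is a minimal sketch. The function name `shout' and the temporary file are made up for this illustration; the only real commands used are `tr', `printf', `mktemp' and `rm'. The same filter is fed once from a pipe and once from a file, and it behaves identically because it only ever reads its standard input.

```shell
#!/bin/sh
# 'shout' is a hypothetical filter used only for this illustration:
# it reads standard input and writes an upper-cased copy to standard output.
shout() {
    tr '[:lower:]' '[:upper:]'
}

# Case 1: the shell connects shout's standard input to a pipe.
from_pipe=$(printf 'hello\n' | shout)

# Case 2: the shell connects shout's standard input to a file instead.
tmpfile=$(mktemp)
printf 'hello\n' > "$tmpfile"
from_file=$(shout < "$tmpfile")
rm -f "$tmpfile"

# The filter never knew the difference; both results are the same.
printf '%s\n%s\n' "$from_pipe" "$from_file"
```

In both cases it is the shell, not the filter, that decides what the standard input is attached to.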

Here's another, more complicated, example of a pipeline. This one prints out the number of files in the current directory that contain the word `hysteresis':

$ grep 'hysteresis' * | awk -F: '{print $1}' | sort | uniq | wc -l
  -|  2 

The first thing the shell does when it sees this command is expand the `*' into a list of files in the current directory. The `grep' command then searches through this list of files for the word `hysteresis', outputting all lines from any file that contain the word. Because `grep' has been given more than one file to search, it prefixes each line with the file name and a colon. The shell pipes this output into the `awk' command, which splits each line into fields separated by colons and then prints the first such field. The printed field is the file name part of the line printed out by the `grep' command (this assumes the file names themselves contain no colons). So the output from the `awk' part of the pipe is a list of file names. Since the word `hysteresis' may appear multiple times in a file, at this point it's possible that `awk' has output a list that contains duplicate file names. The `sort' command makes sure the list of names is sorted for the `uniq' command, which removes adjacent duplicates (keeping only unique items). Finally, this list of sorted, unique file names is piped into the `wc' command to count how many lines are in the list. Pretty cool.
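One way to see what each stage contributes is to build the pipeline up one command at a time in a scratch directory. The sketch below assumes three made-up files (`a.txt', `b.txt' and `c.txt') whose contents are invented for the illustration; everything runs in a subshell so your current directory is left alone.

```shell
#!/bin/sh
# Hypothetical sample data: a.txt mentions the word twice, b.txt never, c.txt once.
workdir=$(mktemp -d)
printf 'hysteresis loop\nmore hysteresis\n' > "$workdir/a.txt"
printf 'nothing here\n' > "$workdir/b.txt"
printf 'magnetic hysteresis\n' > "$workdir/c.txt"

# Stage 1: grep alone prints matching lines as "filename:line".
stage1=$(cd "$workdir" && grep 'hysteresis' *)

# Stage 2: awk keeps only the text before the first colon (the file name).
# a.txt appears twice here because it has two matching lines.
stage2=$(printf '%s\n' "$stage1" | awk -F: '{print $1}')

# Stages 3 to 5: sort, drop adjacent duplicates, count the remaining lines.
# Some versions of wc pad the number with spaces, so tr strips them.
count=$(cd "$workdir" &&
    grep 'hysteresis' * | awk -F: '{print $1}' | sort | uniq | wc -l | tr -d ' ')

# An alternative sketch: grep -l prints each matching file name once,
# so the awk, sort and uniq stages are not needed.
count_l=$(cd "$workdir" && grep -l 'hysteresis' * | wc -l | tr -d ' ')

printf 'stage 1:\n%s\n' "$stage1"
printf 'stage 2:\n%s\n' "$stage2"
printf 'pipeline count: %s, grep -l count: %s\n' "$count" "$count_l"

rm -rf "$workdir"
```

With the sample files above, both counts come out as 2. The longer pipeline is still the better teaching example, since each stage shows a different single-purpose command at work.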