Next: Questions

The Code

Having seen the basic data structures and their use let us now take a look through the code. The tiny shell has been designed in a basically top-down fashion and that is the order in which we'll work through it.

main Function

The top level design of any interpreter is basically a loop which just gets something to do and then goes away to do it. The tiny shell is no exception so that the main() function is particularly simple:

	#include "def.h"

	main (void)
	{
		int i;

		initcold();

		for (;;)
		{
			initwarm();

			if (getline()) 
				if (i = parse())
					execute(i);
		}
	}

The first thing that happens is that main() calls a one-off initialization function called initcold(). This is followed by an endless loop which repeatedly prints a prompt, gets lines of input, parses them to extract the simple commands and their parameters, and then executes each of the simple commands, dealing with I/O redirection and pipes as appropriate.

In detail, the body of the loop first calls initwarm() to initialize things for each input command line and display the tsh: prompt. This is followed by a call to getline(), which copies the user's command line into the line[] array and returns a non-zero value if the length of the entered line was less than MAXLINE.

If a non-zero value is returned by getline(), then the parse() function is called next. This function has the task of parsing the contents of the line[] array and extracting the relevant information from it. This includes the simple command names and their associated parameters, the input and output file names for any I/O redirection, and the pipeline background (&) operator, if specified.

Assuming there are no errors, the parse() function returns a positive value which is the number of simple commands in the input command pipeline. On error parse() returns the value zero. If there have been no errors so far, then, finally, the execute() function is called with the count returned by parse() passed in as a parameter. The execute() function uses fork() and exec() to execute each of the simple commands in turn, creating pipes and opening I/O redirection files as required.

Initialization Functions

The contents of the initcold() initialization function are currently commented out so that the function actually does nothing. This is because the tiny shell has no built-in commands like the real shell. What this means is that shell commands like exit, which will terminate the real shell, cannot be used with the tiny shell in the same way. The easiest way to terminate the tiny shell is therefore to send it an interrupt (Ctrl-c) or quit (Ctrl-\) signal from the keyboard. Obviously, if the contents of initcold() were uncommented, then this would not be possible and the tiny shell would then be more difficult to terminate.

	initcold (void)
	{
	/*
		signal(SIGINT, SIG_IGN); 
		signal (SIGOUIT, SIG_IGN);
	*/
	}



	initwarm(void)
	{
	int i;

		backgnd = FALSE; 
		lineptr = line; 
		avptr = avline; 
		infile[0] = '\0';
		outfile[O] = '\0';
		append = FALSE;

		for (i = 0; i<PIPELINE; ++i)
		{
			cmdlin[i].infd = 0; 
			cmdlin[i].outfd = 1;
		}

		for (i = 3; i<OPEN_MAX; ++i) 
			close(i);

		printf("tsh: "); 
		fflush(stdout);
	}

The initwarm() function is executed at the start of each iteration of the 'get a command, execute a command' loop. Essentially, it just initializes the global variables, sets the infd and outfd file descriptors for each of the simple command structures in the cmdlin[] array to their default values, close()s all file descriptors other than the standard input, output and error output devices, and finally displays the tsh: prompt.

getline Function

	getline (void)
	{
		int i;

		for (i = 0; (line[i] = getchar())!='\n' && i<MAXLINE; ++i);

		if (i==MAXLINE)
		{
			fprintf(stderr, "Command line too long\n"); 
			return(ERROR);
		}
		line[i+1] = '\0';
		return (OKAY);
	}

This function just takes characters from the standard input device and copies them into the line[] array, until either a newline (\n) character is encountered or MAXLINE number of characters have been entered, in which case the line[] array is full.

If MAXLINE characters have been entered then the function returns ERROR; otherwise a string terminator (\0) is added after the last character entered, and the OKAY value is returned instead.

parse Function

The syntax of a command pipeline looks as follows:

	cmd [ < filename ] [ | cmd ]...  [ or filename ] [ & ]

Here cmd means any simple command including its parameters, and or is either of the output redirection operators (> or >>). Square brackets are used to enclose optional items and the ellipses (...) mean that the preceding bracket contents may be repeated zero or more times.

Piecing it all together, what this says is that a tiny shell command pipeline starts with a simple command. This is then optionaUy followed by an input redirection operator and its associated filename. This in turn can optionally and repeatedly be followed by a pipe operator and then another simple command. After the last simple command, which may also be the first if no pipes are used, an optional output redirection operator and its associated filename may be specified. Finally, the whole pipeline may optionally be set to run in the background by specifying the & operator.

The parse() function needs to make sure that the syntax of its input conforms to this specification, whilst at the same time extracting the tokens belonging to each simple command and the file names for any input or output redirection that might be specified.

	parse (void)
	{
		int i;

		/* 1 */
		command(0);

		/* 2 */
		if (check("<"))
			getname(infile);

		/* 3 */
		for (i = 1; i<PIPELINE; ++i)
			if (check("|")) 
				command(i);
			else
				break;

		/* 4 */
		if (check(">"))
		{
			if (check(">")) 
				append = TRUE;

			getname(outfile);
		}

		/* 5 */ 
		if (check("&"))
			backgnd = TRUE;

		/* 6 */ 
		if (check("\n"))
			return(i); 
		else
		{
			fprintf(stderr, "Command line syntax error\n"); 
			return (ERROR);
		}
	}

The numbered comments in this listing are described as follows:

The first thing that happens is that a call is made to the command() function, which, as we shall see later, knows how to extract the words from the command line associated with a single simple command. These words will be copied into the avline[] array as separate strings and the av[] pointers in the appropriate struct cmd will be set up to point to these strings. The parameter passed into the command() function is used to tell it which struct cmd in the cmdlin[] array to use.
After the first command has been parsed, the next section looks for the optional input redirection operator using the check() function. If check() finds a match for its parameter string at the current position in the line[] array it will return a true value and the getname() function will then be called to extract the following file name into the infile[] array.
Simple command zero has already been parsed so the next code section looks for the optional pipe operators followed by simple commands one, two, and so on. Inside the for loop, if a pipe operator is found then the command() function is called again to parse the next simple command. The parameter to command() is supplied this time by the for loop control variable (i). The for loop repeats until no more pipe operators are found or until the 'maximum number of simple commands in a pipeline' limit has been reached.
This section checks for the output redirection operator, and then checks whether or not this is append output redirection by looking for the second > symbol. If it is append output redirection then the append flag is set to TRUE (from its FALSE default set up in initwarm()). If either output redirection operator is present, then the getname() function is used to copy the associated file name into the outfile[] array.
The specification now allows for an optional & operator, and this is checked next. If the operator is present then the backgnd flag will be set to TRUE (the FALSE value was set in initwarm()).
Whatever path was taken through the preceding code, the input line should now have been parsed to completion and all that should remain is the newline (\n) character on the end. If this is so, then parse returns the value of the variable i which gives the number of simple commands found in this command line. If a newline is not the next character, then an error message is displayed and the value ERROR is returned from parse() instead.

command Function

Basically, this function copies the names of commands and their parameters one word at a time from the line[] array to the avline[] array. Each time a word is copied, a pointer is set up to point to the start of the word in one of the av[] arrays in the cmdlin[] array of structures:

	command(int i)
	{
		int j, flag, inword;

		for (j = 0; j<MAXARG-l; ++j)
		{
			while (*lineptr==' ' || *lineptr=='\t') 
				++lineptr;

			cmdlin[i].av[j] = avptr;
			cmdlin[i].av[j+l] = NULL;

			for (flag = 0; flag==0;)
			{
				switch (*lineptr)
				{
					case '>':
					case '<':
					case '|':
					case '&':
					case '\n':
						if (inword==FALSE) 
							cmdlin[i].av[j] = NULL;

						*avptr++ = '\0';
						return;
					case ' ':
					case '\t':
						inword = FALSE;
						*avptr++ = '\0';
						flag = 1; 
						break;
					default:
						inword = TRUE;
						*avptr++ = *lineptr++; 
						break;
				}
			}
		}
	}

Within the tiny shell in general, and this function in particular, the variable lineptr points to the current place of interest within the line[] array and the variable avptr points to the next character position to use in the avline[] array.

The command() function consists of a pair of nested for loops. The inner loop copies the characters of a word from the line[] array to the avline[] array and terminates the word in the avline[] array with a \0 byte. The outer loop is responsible for setting up pointers to the words in the appropriate av[] array elements.

execute Function

In each complete pipeline there can only be one input redirection file specified, attached to the first simple command, and only one output redirection file can he specified, attached to the last simple command. Obviously, if there is only one simple command in the pipeline then it qualifies as both the first and the last command on the line and, consequently, it can have both its input and output redirected to files.

If there is more than one simple command in a pipeline, say j of them, then j-1 pipes will need to be created to connect the simple commands together. Bearing these points in mind we can now look at the code for the execute() function:

	execute(int j)
	{
		int i, fd, fds[2];

		/* 1 */
		if (infile[0] !='\0')
			cmdlin[0].infd = open(infile, O_RDONLY);

		/* 2 */
		if (outfile[0] !='\0')
			if (append==FALSE)
				cmdlin[j-1].outfd = open(outfile, O_WRONLY | O_CREAT | O_TRUNC, 0666);
			else
				cmdlin[j-1].outfd = open(outfile, O_WRONLY | O|CREAT | O_APPEND, 0666);

		/* 3 */
		if (backgnd==TRUE) 
			signal (SIGCHLD, SIG_IGN);
		else
			signal (SIGCHLD, SIG_DFL);

		/* 4 */
		for (i = 0; i <j; ++i)
		{
			/* 5 */
			if (i<j-1)
			{
				pipe (fds); 
				cmdlin[i].outfd = fds[1]; 
				cmdlin[i+1].infd = fds[0];
			}

			/* 6 */
			forkexec(&cmdlin[i]);

			/* 7 */
			if ((fd = cmdlin[i].infd)!=0) 
				close(fd);

			if ((fd = cmdlin[i].outfd)!=1) 
				close(fd);
		}

		/* 8 */
		if (backgmd==FALSE) 
			while (wait (NULL) != lastpid);
	}

Remember that the execute() function takes a single integer parameter, which is the number of simple commands in the command pipeline. This value is assigned to the variable j. The numbered comments in the listing are explained as follows:

If an input redirection file has been specified then openO the file and assign the resulting file descriptor to the infd field in the struct cmd associated with simple command zero (i.e. the first simple command on the command line). Notice that the tiny shell code commits one of the biggest deadly sins of Linux programming - it doesn't check the return values of most of the system calls it makes, to look for error returns. Adding the check code is a very easy task (left as an exercise for the reader) which has been omitted simply to make the code easier to read.
Similarly, if an output redirection file has been specified then it will be open()ed and the resulting file descriptor assigned to the outfd structure member associated with the last (j-1 th) simple command on the command line. One extra twist here is that there are two sorts of output redirection (overwrite and append) which need the file to be opened in two different ways. The selection is made in this case just by testing the value of the append flag.
The next section is involved with zombie child processes. If the command line is to he run in the background (backgnd==TRUE) then SIGCHLD is set so that zombie child processes will not be created when the child processes terminate. This is the correct action here because the shell won't be wait()ing for them. If the command line will be run in the foreground then SIGCHLD is set to its default action, which also causes the system to generate zombies from child processes when they die, until the shell wait()s for them.
The next section is a loop, the body of which will be executed once for each simple command in the command line, with the variable i specifying which simple command to work on next.
The value j-1 gives the command number of the last simple command in the command line. Where this value is greater than zero, it means that there is more than one simple command in the pipeline. For each simple command whose number is less than j-1 a pipe needs to be created to pipe the output (outfd) of command number i to the input (infd) of command number i+1.
All the file descriptors and command line parameters needed by command number i are now available and ready so that simple command i can now be executed. The forkexec() function does just what its name implies it creates a child process to run a simple command. Therefore, only in the parent process does the forkexec() function return.
If any I/O redirection or pipes were involved in the simple command that has just been started then the parent's copies of the file descriptors are closed here.
Finally, when all the simple commands have been started, if this command line is running in the foreground then the parent (tiny shell) process enters a loop to wait() for the termination of the last process in the pipeline, whose process ID will have been set in lastpid by the forkexec() function.

forkexec Function

The forkexec() function has the task of creating a child process with the fork() system call Then within the child process it performs any required I/O redirection before using execvp() to execute the required command.

	forkexec(struct cmd *ptr)
	{
		int i_pid;

		/* 1 */
		if (pid = fork())
		{
			/* 2 */ 
			if (backgnd==TRUE)
				printf("%d\n", pid);
			lastpid = pid;
		}
		else
		{
			/* 3 */
			if (ptr->infd==0 && backgnd==TRUE) 
				ptr->infd = open("/dev/null", O_RDONLY);

			/* 4 */ 
			if (ptr->infd!=0)
			{
				close(0);
				dup(ptr->infd);
			}

			if (ptr->outfd!=1)
			{
				close(1);
				dup(ptr->outfd);
			}
			
			/* 5 */
			if (backgnd==FALSE)
			{
				signal (SIGINT, SIG_DFL); 
				signal (SIGQUIT, SIG_DFL);
			}

			/* 6 */
			for (i = 3; i<OPEN_MAX; ++i) 
				close(i);

			/* 7 */
			execvp(ptr->av[0]  ptr->av); 
			exit(1);
		}
	}

The forkexec() function takes as its parameter a pointer to the struct cmd associated with the simple command it is about to execute. Remember that this structure contains the file descriptors that the command is to use for its standard input and standard output (though the redirection has not taken place yet) and it also contains an array of pointers to command line parameters (including the command name itself) that will become the command's argv parameter.

The numbered comments in this function are described as follows:

The first task is to fork() a child process and store the process ID that is returned.
In the parent process, if this command line is to be run in the background then the process ID of each child process will be displayed as it is created. The process ID of each child is also stored in the variable lastpid. As the simple commands in the command line are executed from left to right, the final value of lastpid will be the process ID of the last simple command to be created (i.e. the right-most simple command). This is the process ID that wait() will be looking for if this is a command line to execute in the foreground.
In the child process the first thing to happen is a rather subtle test. If this is to be a background command line then it must be prevented from taking input from the keyboard. In fact, only the first simple command in a pipeline can take its input from the keyboard as all the rest of the commands will take their standard input from pipes, by redirection. So, if this is to be a background command line, and the file descriptor which is to be the standard input for the current simple command (infd) is still set up as the keyboard (infd==0), then this must be the first simple command in the pipeline and it needs to have its standard input diverted away from the keyboard so there is no chance that it will try to read from it. The simplest and safest way to do this is just to set up an automatic input redirection in this case. The obvious file to use for this redirection is just the standard file /dev/null, which will immediately return EOF if the process tries to read() from it. Using this simple mechanism saves the tiny shell from being involved in any kind of job control functions.
The next stage is actually to perform the input and output redirection by manipulation of the file descriptors if they have changed from their default values.
If this is a foreground pipeline then the current simple command is set up so that it can be terminated by an interrupt or quit signal from the keyboard, unless it makes a subsequent and specific provision to the contrary.
The next step is that all the file descriptors except the standard input, output and error output are close()'d so that this process cannot interfere with any of the pipes or I/O redirection files that the shell has set up, except those it is supposed to see via the standard file descriptors 0,1 and 2.
Finally, the simple command is executed by the execvp() system call, using the values in its av[] array for the command name and the argv pointer.

check Function

	check(char *ptr)
	{
		char *tptr;
		
		while (*lineptr== ' ' ) 
			lineptr++;

		tptr = lineptr;

		while (*ptr!='\0' && *ptr==*tptr)
		{
			ptr++;
			tptr++;
		}
		if (*ptr!='\0')
			return(FALSE); 
		else
		{
			lineptr = tptr; 
			return(TRUE);
		}
	}

The check() function compares the string passed to it as a parameter with the characters in the line[] array starting at the position of lineptr. If a match is found then lineptr is moved on beyond the characters matched and the value TRUE is returned. If check() doesn't find a match then a value of FALSE is returned and lineptr is left unchanged so that checks can be performed for other strings starting in the same place in the line[] array.

getname Function

	getname(char *name)
	{
		int i;

		for (i = 0; i<MAXNAME; ++i)
		{
			switch (*lineptr)
			{
				case '>':
				case '<':
				case '|':
				case '&':
				case ' ':
				case '\n':
				case '\t':
					*name = '\0';
					return;

				default:
					*name++ = *lineptr++;
					break;
			}
		}
		*name = '\0';
	}

This function just copies a file name up to MAXNAME characters long into the char array pointed to by its name parameter.

Next: Questions