When you use the shell to run a command (ls, say), at some point the shell will execute a fork() call to get a new process running. Having done that, how does the shell then get ls to run in the child process instead of the duplicate copy of the shell that will be running immediately after the fork() call?
The solution in this case is to use an exec() system call. In fact, there are several different flavors of the exec() call, but they all perform essentially the same task. Use of the exec() system call is the only way to get a program executed under Linux. It works by replacing the text and user data segments of the process that calls exec() with the text and user data contained in the program file whose name is passed as a parameter to exec(). This is probably best illustrated with a simple example.
Before we do that, however, I need to digress slightly to explain a little more about exec(). When you execute a program from the shell, you have seen that it is possible to specify parameters and switches to the program on the command line. From your knowledge of C you also know that these command line values are made available to a program via the argc and argv parameters to main(). Somehow the shell needs to be able to take your command line values and pass them on to the programs it runs on your behalf as their argc and argv. This is done by passing your command line values in a suitable form to exec() which will arrange for them to appear as argc and argv to the new program about to be run.
The simplest version of exec() with which to demonstrate these things is called execl(). The prototype for this is:
```c
#include <unistd.h>

int execl(const char *pathname, const char *arg0, ... /* (char *) NULL */);
```
The pathname is the full path to the command to execute. This is followed by a variable-length list of pointers to character strings, which will become the contents of the array pointed to by argv in the new program. The list of pointers in execl() must be terminated by a null pointer, cast to (char *) so that it is passed correctly through the variable argument list. The following example program uses execl() to execute the simple command ls -l:
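```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Replace this process with /bin/ls; ls sees argv[0] = "ls", argv[1] = "-l". */
    execl("/bin/ls", "ls", "-l", (char *) NULL);

    /* execl() returns only if it failed. */
    printf("Can only get here on error\n");
    return 1;
}
```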
The first parameter to execl() in this example is the full pathname to the ls command. This is the file whose contents will be run, provided the process has execute permission on the file. The rest of the execl() parameters provide the strings to which the argv array elements in the new program will point. In this example, it means that the ls program will see the string ls pointed to by its argv[0], and the string -l pointed to by its argv[1].
Normally the exec() calls don't return. In general they can't, because their job is to replace the text and user data segments of the calling process with those of some other program, so there is nothing left to return to!
However, if an exec() call fails for any reason (for example, because the file doesn't exist or you don't have execute permission for it), it returns -1 and sets errno, giving you the opportunity to do something about the error.
In addition to making all these parameters available to the new program, the exec() calls also pass a value for the variable:
```c
extern char **environ;
```
This variable has the same format as the argv variable except that the items passed via environ are the values in the environment of the process (like any exported shell variables), rather than the command line parameters. In the case of execl(), the value of the environ variable in the new program will be a copy of the value of this variable in the calling process.
The execl() version of exec() is fine in the circumstances where you can explicitly list all of the parameters, as in the previous example. Now suppose you want to write a program that doesn't just run ls, but will run any program you wish, and pass it any number of appropriate command line parameters. Obviously, execl() won't do the job.
The following example program meets this requirement by using the execv() system call instead:
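```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc == 1) {
        printf("Usage: run <command> [<parameters>]\n");
        exit(1);
    }
    /* argv[1] is the command's pathname; &argv[1] becomes the new
     * program's argv (it is NULL-terminated because argv[argc] is NULL). */
    execv(argv[1], &argv[1]);
    printf("Sorry... couldn't run that!\n");
    return 1;
}
```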
The execv() call takes only two parameters: the first is the full pathname of the command to execute, and the second is the argv value you want to pass to the new program. In the previous example this value is derived from the argv value passed into the run command, so run can take the command line parameters you give it and simply pass them on.
The following is a typical command sequence using the run command:
```
$ run ls -l mtos
Sorry... couldn't run that!
$ run /bin/ls -l mtos
total 2
drwxr-xr-x 2 pc book 1024 Apr 2 20:11 tdd
drwxr-xr-x 2 pc book 1024 Apr 2 20:11 tsh
```
Notice the failure on the first attempt. This is because you need to specify the full pathname of the command you want to run when you use execv() (or execl()), as the successful second attempt shows.
The problem here is that the two versions of exec() covered so far do not use the value of your PATH environment variable when looking for the command you specify. As you may have guessed, however, there are versions of exec() that do. They are called execlp() and execvp(), and they behave exactly like execl() and execv() except that they also use the value of PATH to find the required command.
The final two versions of exec() are the same as the first two (i.e. they don't use PATH), but they also let you specify explicitly the value that environ will have in the new program, rather than accepting the default copy of the caller's environment. They are called execle() and execve(). The following is a list of all six exec() variants and the number and types of the parameters they take:
```
int execl(pathname, arg0, ..., argn, (char *) 0);
int execv(pathname, argv);
int execlp(cmdname, arg0, ..., argn, (char *) 0);
int execvp(cmdname, argv);
int execle(pathname, arg0, ..., argn, (char *) 0, envp);
int execve(pathname, argv, envp);

char *pathname, *cmdname;
char *arg0, ..., *argn;
char **argv, **envp;
```
Just as with fork(), most process attributes are preserved across an exec() call, because exec() changes only the text and user data segments and leaves the system data segment intact. Most important, file descriptors that referred to open file descriptions before the call are normally still available after it. The exception is the close-on-exec flag: each file descriptor (not the open file description it refers to) carries this flag, and, as its name suggests, a descriptor with the flag set is closed automatically when exec() succeeds.
Setting the close on exec flag is another one of the miscellaneous facilities provided by a call to fcntl(). The form of the call for setting the close on exec flag on a particular file descriptor (fd) is:
```c
fcntl(fd, F_SETFD, FD_CLOEXEC);
```
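Just to recap for a moment, when you want a running process to organize the execution of another program as part of its operation, the sequence of steps is as follows:

1. The process calls fork() to create a child process that is an exact duplicate of itself.
2. The child process calls one of the exec() functions to replace that duplicate with the program you want to run.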
Two obvious questions arise on looking at this scenario. First, wasn't it rather a waste of effort creating the child process as an exact duplicate of the parent only to scrap that immediately and replace it with a new program? And, second, what does the parent process do while the child process is executing?
Taking the questions in that order: Linux implements its fork() system call very efficiently. In effect, Linux cheats. It doesn't really make a full copy of the process at all; it just gives each process its own set of pointers (page tables), and both sets point to the same physical memory holding the text and data segments. For the text segment this doesn't matter, because that segment is read-only anyway. For the data segments, however, it decidedly does matter, because the two processes are supposed to be independent. But, thinking about it for a moment, as long as the two processes only read from the data segments and never write new values, neither process can tell that the cheat is taking place. To make the scheme work completely, all Linux has to do is spot when one of the two processes tries to write to a shared data page and, at that moment, make a private copy of just that page. This technique is called copy on write.
In essence, any area that gets written to by either process gets copied so that both processes then have their own copies of those areas. If the child process immediately performs an exec() call, then very little of the shared data segment space will have been copied before the exec() takes place - a huge saving in time and resources.