In performing its task, the scheduler has various objectives that it must try to achieve and, as we shall see, several of these objectives pull it in opposite directions. Among the more important objectives are:
An obvious way in which these objectives conflict is with regard to user requests: it seems as though user requests should be favored in some way, though this clearly conflicts with the idea of giving all processes a fair share of the CPU.
One problem the scheduler faces immediately is that, in general, very little is known about a process when it starts up. In particular, it is not known how long, on average, the process will use the CPU before it has to stop and wait for some I/O, nor how long, on average, it must then wait for that I/O event to take place once it has been requested.
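Such knowledge can only be built up by observing the process as it runs. One common textbook heuristic, offered here purely as an illustration rather than as anything Linux itself mandates, is to predict the next CPU burst as an exponentially weighted average of the bursts observed so far:

    #include <stdio.h>

    /* A minimal sketch of exponential averaging of CPU burst lengths.
     * The weight alpha (between 0 and 1) controls how strongly recent
     * behaviour outweighs older history. All names are illustrative. */
    double next_burst_estimate(double prev_estimate, double last_burst,
                               double alpha)
    {
        return alpha * last_burst + (1.0 - alpha) * prev_estimate;
    }

    int main(void)
    {
        double estimate = 10.0;  /* initial guess, in milliseconds */
        double observed[] = { 6.0, 4.0, 13.0, 13.0 };

        for (int i = 0; i < 4; i++) {
            estimate = next_burst_estimate(estimate, observed[i], 0.5);
            printf("after burst %d: estimate %.2f ms\n", i + 1, estimate);
        }
        return 0;
    }

With alpha set to 0.5, each new observation counts as much as the whole of the previous history, so the estimate tracks changes in a process's behaviour reasonably quickly.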
Another question is what to do with a process that uses a very large amount of CPU time before it does any I/O; can it be allowed just to go on hogging the processor? Well, obviously it can't; otherwise such a process could virtually stop all the other processes from making any progress.
This implies that there needs to be some mechanism for switching the CPU to another process when the current process has had long enough. But where, and how, is the decision to do this taken?
In general, there are two possibilities here: the first is to allow the processes themselves to release control of the CPU voluntarily when they have had a reasonable amount of time on it, and the second is to provide some mechanism that forces a process to release its control of the CPU. The first of these is called non-preemptive scheduling and the second is called preemptive scheduling.
As we shall see when we look at threads, it is perfectly feasible to run a system of co-operating pieces of code that use non-preemptive scheduling to switch between them. However, for process scheduling in a multi-user environment, it is usually safer to assume that the processes may be, at best, unaware of each other's existence or, at worst, positively antagonistic towards each other in terms of their CPU use.
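To make the non-preemptive style concrete, the fragment below sketches a cooperative piece of code that must remember to give up the CPU itself. It uses the POSIX sched_yield() call; the workload function is an invented stand-in:

    #include <sched.h>

    static volatile long counter;

    /* Invented stand-in for one small piece of useful work. */
    static void do_unit_of_work(void)
    {
        counter++;
    }

    /* Cooperative (non-preemptive) style: nothing forces this code off
     * the CPU, so it must volunteer at convenient points. A buggy or
     * antagonistic process could simply omit the yield and hog the
     * processor. */
    void cooperative_work(long units)
    {
        for (long i = 0; i < units; i++) {
            do_unit_of_work();
            if (i % 1000 == 0)
                sched_yield();  /* voluntarily offer the CPU to others */
        }
    }

The weakness is plain: the scheme works only as long as every process plays by the rules, which is precisely the assumption that cannot be made in a multi-user environment.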
Consequently, in a multi-user environment, preemptive scheduling is used almost universally, and Linux is no exception.
The way this is achieved is that each process is given a maximum length of time, its time slice, that it is allowed on the CPU without a break. In addition, the system generates fairly rapid, periodic clock timer interrupts, which are counted and used to determine when the process has had its allocated time slice.
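A minimal sketch of this tick-counting mechanism might look as follows; the names current, ticks_left and need_resched are illustrative, loosely echoing kernel conventions rather than reproducing actual Linux code:

    /* Illustrative per-process bookkeeping for time slices. */
    struct process {
        int ticks_left;    /* clock ticks remaining in this time slice */
        int need_resched;  /* set once the slice has expired */
    };

    struct process *current;  /* the process now holding the CPU */

    /* Called from the handler of every periodic clock interrupt. */
    void timer_tick(void)
    {
        if (current->ticks_left > 0 && --current->ticks_left == 0)
            current->need_resched = 1;  /* slice used up: reschedule soon */
    }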
When the time slice of the current process has elapsed, the scheduler will be forced to run to determine whether there is another ready process more eligible to run than the one whose time slice has just expired. If there is, it will be run in place of the current process; if not, the current process can be allowed to continue.
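That decision can be pictured as a scan over the ready processes, as in the toy sketch below. The goodness() measure here is a purely illustrative stub (its name echoes a function in older Linux schedulers, but the real eligibility calculation is more involved and is discussed later):

    #include <stddef.h>

    /* Toy process descriptor; real descriptors carry far more state. */
    struct process {
        int counter;  /* ticks left in this slice, used here as priority */
    };

    /* Illustrative eligibility measure; purely a stub. */
    static int goodness(const struct process *p)
    {
        return p->counter;
    }

    /* At slice expiry: run another process only if one is more eligible
     * than the current one; otherwise let the current process continue. */
    struct process *pick_next(struct process *ready[], size_t n,
                              struct process *current)
    {
        struct process *best = current;
        for (size_t i = 0; i < n; i++)
            if (goodness(ready[i]) > goodness(best))
                best = ready[i];
        return best;
    }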
Obviously, from the previous discussion, there are several states in which a process may find itself, and several transitions between states that can occur in response to various events. Figure 1 shows a greatly simplified state transition diagram for processes under the control of the CPU scheduler and labels six transitions between the states, which occur in the following circumstances:
Perhaps surprisingly, this state transition diagram is generally applicable regardless of the actual algorithm the CPU scheduler uses to determine which process should run next.
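Any scheduler, whatever its algorithm, needs some representation of these states. The sketch below assumes the classic running/ready/blocked trio that simplified diagrams of this kind usually show; the names are invented and may not match those in Figure 1 exactly:

    /* Illustrative process states for a simplified diagram. */
    enum proc_state {
        PROC_RUNNING,  /* currently holding the CPU */
        PROC_READY,    /* runnable, waiting for a turn on the CPU */
        PROC_BLOCKED   /* waiting for some event, e.g. I/O completion */
    };

    /* Some typical transitions, independent of the algorithm used:
     *   RUNNING -> READY    time slice expired or preempted
     *   RUNNING -> BLOCKED  requested I/O and must wait for it
     *   BLOCKED -> READY    the awaited event (e.g. I/O) completed
     *   READY   -> RUNNING  chosen by the scheduler to run next
     */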
There are many possible algorithms that can be used for CPU scheduling, and a great deal of theory about which is best in various sets of circumstances. However, rather than engage in a general discussion of scheduling algorithms here, we shall concentrate on the details of the actual Linux scheduling algorithm itself.