Associating an interrupt service routine with a particular interrupt request line is done in the same way for block device drivers as you have already seen for character device drivers: with the request_irq() function.
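As a reminder, a minimal registration might look like the following sketch, which assumes the kernel 2.0-era request_irq() signature; the device name, the IRQ number and the handler are hypothetical:

    #include <linux/sched.h>    /* request_irq(), free_irq() */
    #include <linux/errno.h>

    #define MYDEV_IRQ 5         /* hypothetical IRQ line for our device */

    /* The interrupt service routine: runs each time the device raises its IRQ. */
    static void mydev_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
        /* service the device: read its status, move data, and so on */
    }

    /* Typically called from the driver's init or open function. */
    static int mydev_claim_irq(void)
    {
        /* a non-zero return means the line could not be allocated */
        if (request_irq(MYDEV_IRQ, mydev_interrupt, 0, "mydev", NULL))
            return -EBUSY;
        return 0;
    }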
The general sequence of events for device drivers containing an interrupt service routine is as follows:

1. The driver's interrupt service routine is associated with the device's IRQ line by a call to request_irq(), typically when the driver is initialized or the device is opened.
2. A device driver function, running on behalf of a user process, starts an I/O operation on the hardware and then puts the process to sleep.
3. When the hardware has completed the operation, it generates an interrupt and the interrupt service routine is called.
4. The interrupt service routine services the device, copies any data from the hardware into a kernel memory buffer, and wakes the sleeping process.
5. The woken driver function completes the request, for example by copying the data on to user space, and returns control to the user process.
This picture is a bit simplistic and needs to be tightened up in several places, but it does give a reasonable starting point. The main problems occur in three areas: protecting critical sections of kernel code from interruption; making a process give up the CPU while it waits for its I/O to complete; and the fact that the interrupt service routine does not run in the context of the process that requested the data. Each is dealt with in turn below.
When two sections of kernel code both need to access and modify particular kernel data structures, their efforts must be carefully coordinated if disastrous consequences are to be avoided. Nowhere is this more true than when an interrupt service routine is involved as one of the sections of kernel code which can make the changes.
Consider the following scenario:
Some part of the kernel is making modifications to the structure of a linked list. A new node is being added to the list by breaking a link at the appropriate place in the list and adding the new element. At some point in this process the data structure could well be in an unstable state, where only part of the job has been completed and one or more pointers could be pointing to incorrect places. Normally, this doesn't matter because two or three statements later, when the addition is complete, everything has been restored to a stable state and it is safe for other code to use the list again. If, however, an interrupt were to occur while the list was unstable, and if the interrupt service routine then needed to access or modify the same list, then I'm sure you can see the potential for problems.
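To make the window of instability concrete, here is a sketch of such an insertion into a doubly linked list; the node type is hypothetical:

    struct node {
        struct node *next;
        struct node *prev;
        int data;
    };

    /* Insert new_node between pos and pos->next. */
    static void insert_after(struct node *pos, struct node *new_node)
    {
        new_node->next = pos->next;
        new_node->prev = pos;
        pos->next->prev = new_node;
        /* Unstable: walking forwards from pos still skips new_node,
           while walking backwards from the old pos->next now passes
           through it. An interrupt arriving here, whose service
           routine touches this list, sees a half-updated structure. */
        pos->next = new_node;
    }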
Problems of this nature come under the general heading of critical sections. In this context, critical section refers not to the data structure, but to the sections of code that make critical access to it. Obviously, the idea here is to guarantee the accuracy and integrity of the data.
Any code in the kernel that contains a critical section with respect to some kernel data structure must have a mechanism available to allow it to protect itself from interruption while its critical sections are executing.
When a critical section of code is identified, the standard technique for dealing with the problem is to disable interrupts for the duration of the critical section. Obviously, critical sections should be kept as short as possible so that interrupts are not delayed any longer than necessary. Interrupt requests can be enabled and disabled whenever necessary from within kernel code by the use of two special calls:
    #include <asm/system.h>

    sti();
    cli();
Both sti() and cli() are macros defined in <asm/system.h>. The sti() call is used to enable interrupt requests, while cli() disables requests.
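Applied to the list insertion sketched earlier, the protection is straightforward; this assumes the uniprocessor kernels of this vintage, where disabling interrupts is sufficient:

    #include <asm/system.h>    /* cli(), sti() */

    static void insert_after_safe(struct node *pos, struct node *new_node)
    {
        cli();                 /* no interrupts from here on */
        new_node->next = pos->next;
        new_node->prev = pos;
        pos->next->prev = new_node;
        pos->next = new_node;
        sti();                 /* re-enable as soon as possible */
    }

Note that sti() unconditionally re-enables interrupts; if a function like this might itself be called with interrupts already disabled, the save_flags() and restore_flags() macros from the same header preserve the caller's state instead.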
When an I/O request is made by a device driver function on a piece of hardware that can generate interrupts, the function must wait until the hardware completes the operation, generates an interrupt, and the interrupt service routine has copied the data into memory. Only then can the original device driver function continue. It would be quite possible for the function simply to sit in a while loop, polling some flag until the interrupt service routine had completed its task. This, however, would be ridiculously inefficient in terms of processor use, especially since any amount of time might elapse before the requested data was finally made available.
The real problem here is that once a piece of code within the kernel gets control of the CPU, it stays in control until it gives up the processor voluntarily.
Don't forget that functions within a device driver get called as a result of a user process issuing a system call, so these functions are actually executing in the kernel, but on behalf of the user process. This means that nothing further will happen with the user process until its I/O request has been satisfied and, therefore, that the process does not need to hang on to its control of the CPU. If the process were running in user code, control of the CPU would be taken from it automatically. As the process is running in kernel code, it must give up control itself, to allow other processes to continue while it waits for its I/O to complete.
In order to release the CPU, the device driver function will call the sleep_on() function:
    #include <linux/sched.h>

    void sleep_on(struct wait_queue **ptr);
    void interruptible_sleep_on(struct wait_queue **ptr);
As far as the function which calls it is concerned, a call to sleep_on() is just like a call to any other function, except that this call will not return until the event for which the function is waiting has occurred.
It is possible that more than one process may wish to wait for the same event to occur. This is handled in Linux by putting all the processes waiting for a particular event onto a single wait queue. A wait queue is actually a circular linked list of struct wait_queue nodes which is automatically built up by the sleep_on() function when it is called.
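In the kernels this text describes, each node simply records the waiting task and a link to the next node; roughly, from <linux/wait.h>:

    struct wait_queue {
        struct task_struct *task;   /* the sleeping process */
        struct wait_queue *next;    /* next node in the circular list */
    };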
In order to start a new wait queue you need to declare a pointer to a wait queue node and initialize it to zero. The address of this pointer can then be passed to sleep_on(), which will add the current process to the wait queue, flag the process as sleeping so that the Linux scheduler will not choose this process to run, and then call the scheduler to pick the next process to run:
    #include <linux/sched.h>

    static struct wait_queue *p = 0;

    sleep_on(&p);
If you use the sleep_on() call, then only a wake_up() call on the same wait queue will wake the process up. The interruptible_sleep_on() function is the same as the sleep_on() function, except that a signal or a timeout sent to the sleeping process can also wake it up.
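Put together, the waiting side of a driver typically looks something like this sketch; mydev_start_read() and the flag are hypothetical names, and the while loop re-tests the condition in case the process is woken before its data is really ready:

    #include <linux/sched.h>

    static struct wait_queue *mydev_waitq = 0;  /* one queue per device */
    static int mydev_data_ready = 0;            /* set by the ISR */

    static void mydev_start_read(void)
    {
        /* write to the device's registers to start the transfer */
    }

    static void mydev_wait_for_data(void)
    {
        mydev_start_read();
        while (!mydev_data_ready)               /* re-test after every wake-up */
            interruptible_sleep_on(&mydev_waitq);
        mydev_data_ready = 0;
        /* the ISR has now filled the driver's kernel buffer */
    }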
When an event occurs (such as I/O becoming ready) for which one or more processes are waiting, a call to the wake_up() function with the relevant wait queue as a parameter will cause all the processes on that queue to be flagged as runnable again, so that the scheduler will consider them eligible to run the next time it is called:
    #include <linux/sched.h>

    void wake_up(struct wait_queue **ptr);
    void wake_up_interruptible(struct wait_queue **ptr);
In the case of a device driver the wake_up() call is usually issued by the interrupt service routine, after it has copied the required data into kernel memory.
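Continuing the earlier sketch, the interrupt service routine's side might look like this; the port address is hypothetical, and mydev_waitq and mydev_data_ready are the variables declared with the read path above:

    #include <linux/sched.h>
    #include <asm/io.h>            /* inb() */

    #define MYDEV_PORT    0x300    /* hypothetical data port */
    #define MYDEV_BUFSIZE 512

    static char mydev_buffer[MYDEV_BUFSIZE];    /* the driver's kernel buffer */

    static void mydev_interrupt(int irq, void *dev_id, struct pt_regs *regs)
    {
        int i;

        /* copy the data out of the hardware into kernel memory */
        for (i = 0; i < MYDEV_BUFSIZE; i++)
            mydev_buffer[i] = inb(MYDEV_PORT);

        mydev_data_ready = 1;
        wake_up_interruptible(&mydev_waitq);    /* rouse the sleeping reader */
    }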
Remember that a wake_up() call will resume all processes sleeping on the specified wait queue. This can have unfortunate consequences if, for example, the event for which the processes are waiting is for some piece of hardware to become free so that they can write to it. In this case, the first of the newly woken processes to run will take the hardware device out of service again. So, even though the device was free when they were woken, by the time the other processes get to run the device is busy once more. When this type of scenario is a possibility, the process must re-test the device before using it, and go back to sleep if it is busy, even if it has only just been woken from a previous sleep, as the sketch below shows.
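A minimal sketch of the pattern, with hypothetical device_busy and device_free_queue variables:

    static struct wait_queue *device_free_queue = 0;
    static int device_busy = 0;

    static void claim_device(void)
    {
        /* wake_up() rouses every sleeper, so another process may have
           claimed the device between our wake-up and our next run:
           always re-test, and sleep again if we lost the race */
        while (device_busy)
            sleep_on(&device_free_queue);
        device_busy = 1;    /* the device is now ours */
    }

On the non-preemptive uniprocessor kernels being described here, no other process can run between the test and the assignment, which is why this simple flag suffices.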
The wake_up_interruptible() function is the same as the wake_up() function, except that it will only wake up the interruptible processes on the specified wait queue, whereas the wake_up() function will wake them all.
As you have seen, a process is made to sleep (the device driver calls sleep_on()) while it waits for physical I/O to take place, thus allowing other processes to use the CPU at this time. It follows, then, that when the interrupt occurs and the interrupt service routine runs, the service routine will not be running in the context of the process for which the data is eventually intended. Indeed, that process is sleeping, waiting to be woken at the end of the interrupt service routine.
Because the currently running process is definitely not the one for which the data is intended, under NO circumstances should the interrupt service routine copy the data from the hardware straight into user space. What the service routine should do is copy the data into a kernel memory buffer associated with the device driver and then wake up the sleeping process. This will cause the original sleep_on() call to return the next time that process runs. As this code is running in the correct context, it is then safe to copy the data to user space, but not before.
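As a final sketch, the tail of the read function after sleep_on() returns might look like this; it uses the mydev_buffer from the interrupt service routine above, together with the verify_area() check and memcpy_tofs() copy that kernels of this era provide:

    #include <linux/mm.h>       /* verify_area() */
    #include <linux/errno.h>
    #include <asm/segment.h>    /* memcpy_tofs() */

    /* Called after sleep_on() has returned: we are back in the context
       of the requesting process, so copying to user space is now safe. */
    static int mydev_copy_out(char *buf, int count)
    {
        if (count > MYDEV_BUFSIZE)
            count = MYDEV_BUFSIZE;
        if (verify_area(VERIFY_WRITE, buf, count))
            return -EFAULT;
        memcpy_tofs(buf, mydev_buffer, count);  /* kernel buffer -> user space */
        return count;
    }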