Next: Terminal I/O

Random Files

All the file access to date has been sequential access by default. This is because all the reads and writes take place starting from the current file offset position. The file offset value is then automatically incremented to just beyond the position where the read or write finishes, which makes it ready for the next access to take place.

Given this scenario, random access under Linux couldn't be simpler. All you need to do is alter the current file offset value to the position of interest, which forces the next read() or write() to take place at that position (unless the file was opened with O_APPEND, of course, in which case any write() calls will still take place at the end-of-file position).
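The O_APPEND exception can be checked with a short sketch. It uses the lseek() call described next, and the helper name append_ignores_seek() and the scratch file path are made up for illustration only:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes "AAAA" to a scratch file, reopens it with
 * O_APPEND, seeks back to offset 0 and writes one more byte.
 * Returns the file offset after that write, or -1 on error.
 * An appended byte leaves the offset at 5; an overwrite would leave it at 1.
 */
off_t append_ignores_seek(void)
{
    char path[] = "/tmp/lseek_append_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    if (write(fd, "AAAA", 4) != 4) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    fd = open(path, O_WRONLY | O_APPEND);
    if (fd == -1) {
        unlink(path);
        return -1;
    }

    off_t result = -1;
    /* Ask for the start of the file, then write. */
    if (lseek(fd, (off_t) 0, SEEK_SET) == 0 && write(fd, "B", 1) == 1)
        result = lseek(fd, (off_t) 0, SEEK_CUR);  /* where did the write end? */

    close(fd);
    unlink(path);
    return result;
}
```

If the write had honoured the seek, the returned offset would be 1; with O_APPEND it should be 5, just past the appended byte.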

The system call required to perform this task is lseek:

	#include <sys/types.h>
	#include <unistd.h>

	off_t lseek(int fd, off_t offset, int base);

The fd parameter is the file descriptor whose associated open file description will be modified by the call.

When specifying a new position for the file offset value, you may just want to give a number which should be taken as the new value. This is effectively providing a position relative to the start of the file. Two other possibilities are that you may want to give a number relative to the current file offset value, or that you wish to give a number relative to the end of the file. Each of these three possibilities is shown in Figure 1, along with the symbolic constants used to select them.

The offset parameter is a relative value which is added to the selected base position to give the new file offset value. The base parameter can have one of three values, defined in <unistd.h>:

	SEEK_SET	count offset from start of file;
	SEEK_CUR	count offset from current file offset value;
	SEEK_END	count offset from end-of-file.
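As a rough illustration of the three base values, the sketch below applies each one in turn to a 10-byte scratch file and records the offset that lseek() returns. The helper name show_bases() and the file path are assumptions made for this example:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates a 10-byte scratch file and stores in out[]
 * the offsets returned by one lseek() call per base value.
 * Returns 0 on success, -1 on failure.
 */
int show_bases(off_t out[3])
{
    char path[] = "/tmp/lseek_bases_XXXXXX";    /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* the file stays usable until it is closed */

    int rc = -1;
    if (write(fd, "0123456789", 10) == 10) {
        out[0] = lseek(fd, (off_t) 3, SEEK_SET);    /* start (0) + 3 */
        out[1] = lseek(fd, (off_t) 4, SEEK_CUR);    /* current (3) + 4 */
        out[2] = lseek(fd, (off_t) -2, SEEK_END);   /* end-of-file (10) - 2 */
        rc = 0;
    }
    close(fd);
    return rc;
}
```

With a 10-byte file the three calls should yield offsets 3, 7 and 8 respectively.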

The data type of the offset parameter and the return value of lseek() itself are given as off_t. This doesn't really give any indication of what sort of values would be valid here, though the typedef for off_t in <sys/types.h> gives it as:

	typedef long off_t;

A few simple examples should help to clarify the use of lseek():

	lseek(fd, (off_t) 0, SEEK_SET);
	lseek(fd, (off_t) -50, SEEK_CUR); 
	lseek(fd, (off_t) 5000, SEEK_END);

The first thing to notice is that in order to be pedantically correct, it is necessary to cast the offset parameters to type off_t. This allows for the possibility that the offset parameter may have different underlying data types on different ports of Linux, now or in the future.

The first example lseek() call sets the file offset value in the open file description associated with file descriptor fd to the sum of the offset parameter value (0) and the file offset value of the start of the file, as specified by SEEK_SET (also 0). So this call sets the file offset value to 0 - the start of the file.

The second example moves the file offset value backwards through the file by 50 bytes from its current position (SEEK_CUR). Don't forget that writing to the file will automatically advance the file offset value ready for the next read or write operation. Therefore, if you want to work backwards through a file one byte at a time, writing new byte values as you go, you will need an lseek() offset value of -2 and not -1, as you might at first expect.
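The -2 step can be seen in a minimal sketch like the following. Both function names (overwrite_backwards() and backwards_demo()) and the scratch file path are invented for this illustration:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrites the last n bytes of the file with byte c,
 * working from the last byte towards the start. Returns 0 or -1.
 */
int overwrite_backwards(int fd, off_t n, char c)
{
    if (n <= 0)
        return 0;
    /* Position on the last byte of the file. */
    if (lseek(fd, (off_t) -1, SEEK_END) == (off_t) -1)
        return -1;
    for (off_t i = 0; i < n; i++) {
        if (write(fd, &c, 1) != 1)
            return -1;
        /* write() moved the offset forward by 1, so step back by 2
           to land on the byte before the one just written. */
        if (i + 1 < n && lseek(fd, (off_t) -2, SEEK_CUR) == (off_t) -1)
            return -1;
    }
    return 0;
}

/* Demo: start with "abcdef" and overwrite the last 4 bytes with 'X'. */
int backwards_demo(char result[7])
{
    char path[] = "/tmp/lseek_back_XXXXXX";     /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    int rc = -1;
    if (write(fd, "abcdef", 6) == 6 &&
        overwrite_backwards(fd, 4, 'X') == 0 &&
        lseek(fd, (off_t) 0, SEEK_SET) == 0 &&
        read(fd, result, 6) == 6) {
        result[6] = '\0';
        rc = 0;
    }
    close(fd);
    return rc;
}
```

Using -1 in place of -2 would simply rewrite the same byte on every pass through the loop.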

The third example moves the file offset value forwards by 5000 bytes from the end-of-file position (SEEK_END). To move the file offset value beyond the end-of-file like this may seem very strange at first, but this is precisely what is happening here. If you were now to write data at this new position then the file size would become the file offset value of the new end-of-file. What happens to the bytes in the gap between the old end-of-file position and the new one? I'm glad you asked that! In fact, nothing happens to them - they are not even allocated any data blocks on the disk (not even when the data really gets written to the disk, eventually). What you then end up with is a file with gaps in it, whose logical size (reported by ls -l) is the file offset value of the file's end-of-file position, while the physical size (the number of data blocks it occupies on disk) may be significantly less.

It is easy to convince yourself that this is possible; you just write a program to create a new file and then use lseek() to move the file offset value a long way into the file. To be really convincing try a value which is bigger than the size of the filesystem in which the file will be stored. Then write() a byte and close() the file. Now use the ls -l command to see that the size of the file you have just created is larger than could possibly have been written in the filesystem where it appears.
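A program along these lines might look like the sketch below. The helper name make_sparse() and the scratch file path are assumptions, and the result is read back with fstat() rather than ls -l so that the logical and physical sizes can be compared in the same program:

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: creates a scratch file, moves the file offset
 * `gap` bytes into it, writes a single byte and reports the result
 * through *st. Returns 0 on success, -1 on failure.
 */
int make_sparse(off_t gap, struct stat *st)
{
    char path[] = "/tmp/lseek_sparse_XXXXXX";   /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);               /* the file stays usable until it is closed */

    int rc = -1;
    if (lseek(fd, gap, SEEK_SET) == gap &&      /* jump far past end-of-file */
        write(fd, "x", 1) == 1 &&               /* one byte of real data */
        fstat(fd, st) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

After a successful call, st_size holds the logical size (gap + 1), while st_blocks gives the number of blocks actually allocated. On Linux st_blocks is counted in 512-byte units, and on a filesystem that supports gaps it should come to far less than st_size.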

If at some later time you open the file again and seek and write to a position somewhere in the gap, then data blocks will be allocated to the file as required to hold the new data. This will increase the physical size of the file but not its logical size. If at any time you attempt to read data from a gap in the file then Linux will not allocate data blocks for the positions read but will just return zero-byte values for those byte positions instead.
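Both behaviours can be checked with a small sketch. The helper name probe_gap(), the offsets used and the scratch file path are chosen only for illustration:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: builds a file with a gap, reads one byte from inside
 * the gap, then writes one byte into the gap. Reports the byte read and the
 * logical file size before and after the write. Returns 0 or -1.
 */
int probe_gap(unsigned char *gap_byte, off_t *size_before, off_t *size_after)
{
    char path[] = "/tmp/lseek_gap_XXXXXX";      /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    int rc = -1;
    *gap_byte = 0xFF;   /* sentinel, so a zero result really came from read() */

    /* Leave a gap: the only data written so far is one byte at offset 1000000. */
    if (lseek(fd, (off_t) 1000000, SEEK_SET) == (off_t) -1 ||
        write(fd, "x", 1) != 1)
        goto done;

    /* Read one byte from inside the gap. */
    if (lseek(fd, (off_t) 5000, SEEK_SET) == (off_t) -1 ||
        read(fd, gap_byte, 1) != 1)
        goto done;
    *size_before = lseek(fd, (off_t) 0, SEEK_END);

    /* Write one byte into the gap, then measure the logical size again. */
    if (lseek(fd, (off_t) 5000, SEEK_SET) == (off_t) -1 ||
        write(fd, "y", 1) != 1)
        goto done;
    *size_after = lseek(fd, (off_t) 0, SEEK_END);
    rc = 0;

done:
    close(fd);
    return rc;
}
```

The byte read from the gap should be zero, and the logical size should be 1000001 both before and after the write into the gap.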

The only restriction placed on lseek() when it moves the file offset value is that the offset cannot be moved to before the beginning of the file (i.e. the final file offset value cannot be negative).
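A quick way to see this restriction in action is sketched below. The helper name seek_before_start_errno() and the scratch file path are assumptions; the EINVAL error number is what lseek() is documented to report for a negative resulting offset on a regular file:

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: tries to move the file offset to one byte before the
 * start of an empty scratch file. Returns the errno value set by the failed
 * lseek(), 0 if the call unexpectedly succeeded, or -1 if setup failed.
 */
int seek_before_start_errno(void)
{
    char path[] = "/tmp/lseek_neg_XXXXXX";      /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    errno = 0;
    off_t pos = lseek(fd, (off_t) -1, SEEK_SET);    /* 0 + (-1) is negative */
    int err = (pos == (off_t) -1) ? errno : 0;

    close(fd);
    return err;
}
```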

On success, lseek() returns the new value of the file offset position (given in bytes from the start of the file). On error it returns (off_t) -1, with an error number in errno. For example:

	pos = lseek(fd, (off_t) 0, SEEK_CUR);

would set pos to the current file offset value, and:

	pos = lseek(fd, (off_t) 0, SEEK_END);

would set pos to the length of the file (though there are other ways to do this which don't change the current file offset value).
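As a sketch of the alternatives, the first function below finds the length with lseek() and then puts the file offset back where it was, while the second uses fstat(), which is one way to get the length without touching the offset at all. All three function names and the scratch file path are made up for this example:

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Length via lseek(): remember the offset, seek to the end, seek back. */
off_t size_by_seek(int fd)
{
    off_t here = lseek(fd, (off_t) 0, SEEK_CUR);
    if (here == (off_t) -1)
        return -1;
    off_t end = lseek(fd, (off_t) 0, SEEK_END);
    if (end == (off_t) -1)
        return -1;
    if (lseek(fd, here, SEEK_SET) == (off_t) -1)
        return -1;
    return end;
}

/* Length via fstat(): the file offset is never changed. */
off_t size_by_fstat(int fd)
{
    struct stat st;
    return fstat(fd, &st) == 0 ? st.st_size : (off_t) -1;
}

/*
 * Demo on a 10-byte scratch file with the offset parked at 4:
 * out[0] = size_by_seek, out[1] = size_by_fstat, out[2] = offset afterwards.
 */
int size_demo(off_t out[3])
{
    char path[] = "/tmp/lseek_size_XXXXXX";     /* assumed scratch location */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    int rc = -1;
    if (write(fd, "0123456789", 10) == 10 &&
        lseek(fd, (off_t) 4, SEEK_SET) == 4) {
        out[0] = size_by_seek(fd);
        out[1] = size_by_fstat(fd);
        out[2] = lseek(fd, (off_t) 0, SEEK_CUR);
        rc = 0;
    }
    close(fd);
    return rc;
}
```

Both approaches should report a length of 10, and the file offset should still be 4 afterwards.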

