Next: Minix

System V

The layout of a disk partition configured as a standard UNIX System V file-system is very simple and is illustrated in Figure 1.

As you can see, each disk partition is split into just four parts:

boot block: Contains the code to boot the machine if this is the boot partition; otherwise this block is not used (though it is still allocated).
super block: This block holds administrative information about the filesystem contained on this partition. We will look at this in more detail later.
mode blocks: A set of disk blocks allocated to hold all the modes for this filesystem. Once the filesystem has been created on the partition the number of disk blocks allocated to modes (and hence the number of modes available on this filesystem) is fixed and cannot be changed.
file data blocks: The rest of the blocks on the partition are used to hold the actual data stored in the files on this filesystem. The blocks are allocated to files dynamically as they are required.

In the case of modes and data blocks, some method is required to show when they are free to be allocated, or when they are already allocated and in use. For the System V filesystem a list of free modes and data blocks is kept in the super block to make allocation faster when they are required, though the way in which they are held is different for the two things.

File Data Block Free List

It is generally not possible, by a simple inspection of a data block's contents, to determine whether or not the block is in use. This is because all of a data block's bytes are used to hold file contents, and a file's contents can be any sequence of bytes at all. This means that it is not possible to store a particular sequence of bytes in a data block to indicate that the block is free for allocation. The necessary solution to this problem is to have some data structure which lists those data blocks that are free.

This is actually implemented as a form of linked list, the first part of which is stored in the filesystem's super block. Data blocks on the disk partition are numbered sequentially and are thus each given a unique block number. When the filesystem is first created, all of the data blocks are obviously free and should, therefore, all appear on the free list. However, the space in the super block for this list is limited and only a small part of the list will fit. The next part of the list is stored in one of the free data blocks, the number of which is arranged to be the last block number in the super block list. This idea is repeated, using free blocks to store lists of free block numbers, and making sure that the last free block number in each list is a pointer to the next list of free block numbers. This idea is shown in Figure 2.

Using this arrangement, when a request is made for a new data block, the next free one on the super block list will be allocated (blocks will be allocated right to left in the diagram). As more data blocks are requested, this will be repeated until the super block list has only one entry left in it. Don't forget that this last block in the super block list contains the block numbers of the next part of the free block list. Therefore, these block numbers need to be copied into the super block list before the block which contained them is allocated, which it can be once the copy has taken place.

Figure 3 shows the situation after all the initial list of free block numbers in the super block have been allocated and the contents of block number 200 have been copied into the super block, just before block 200 was allocated as free.

When files are deleted and data blocks become free, their block numbers are just added to the free list in the super block. When this list becomes full and another data block becomes free (say, 188), then there is no space left to store the number. In this situation what happens is that the portion of the data block free list which is in the super block is copied into the newly freed block (188) and then the super block list is emptied and the newly freed block (188) is made the leftmost (and only) entry in the super block list.

Inode Free List

The case of inode allocation is different to the allocation of disk data blocks for several reasons and, consequently, the free inode information is stored in a different format to that of free data blocks. The main reasons for the difference are:

An inode is not very large and the structure contains no sensible place to store a large list of free inode numbers, unlike free data blocks, where the whole block could be used if necessary.
With a data block, there is no way to determine from just a simple inspection of its contents whether or not it is free. This means that all the free data block numbers need to be stored; otherwise they are lost. In the case of an inode, when it is free it is flagged internally as such, which means that free inodes can always be found just by a simple linear search through the inode array.
In general, there are many data blocks in a file so that data blocks tend to be allocated at a much greater rate than inodes. This means that occasional inefficiencies in finding inodes are more tolerable than inefficiencies in finding data blocks.

Taking these points into account, the kernel could just do a linear search through the inode array to find a free inode every time a request for a new inode was made. While this strategy could be made to work, it is made much more efficient by having a set of free inode numbers available in the super block. Unlike the case of free data blocks, however, the free inode list in the super block is just a stand alone list, so that there can be many inodes that are free and yet do not appear in the list. The use of the super block inode free list is illustrated in Figure 4, parts (A) to (F) as follows:

(A): This is meant to show the initial state of the super block inode free list. The index shows the inode number that will be used when the next allocation request is made. The inode numbers which appear in the list were discovered on a linear search of the inode array. During the search, the super block list is filled from right to left (in the diagram) so that the leftmost inode number in the list is the largest free inode so far discovered.
(B): Here two inodes have been allocated (I-40 and I-41) and the index has been moved accordingly.
(C): After several more inode numbers have been allocated, the super block free list eventually gets to its last entry. When this inode is allocated, a new search is made through the inode array to find more free inode numbers with which to refill the super block list. However, there is no point in searching the inode array from the beginning because we already know that all the inodes up to 100 (in our example) are already in use. This suggests that the leftmost inode number in the original free inode list should be used as the next start point in the search for more free inodes.
(D): The inode free list has been refilled again from right to left, so that the leftmost inode number will be the last one to allocate and will once again mark the next start point when more free inodes are required.
(E): This scheme works just fine for as long as inodes are being allocated and none are being released. Suppose, now, that inode 72 is released when the file to which it belongs is deleted. With the free inode list in the super block full, there is nowhere to store the newly freed inode number. This is a problem, as we can't just let inode 72 go free. If we do, it will not be found when we search from inode 182 the next time the super block list needs refilling. In order to make sure that we don't lose freed inodes in this way, the next inode number in the super block list is replaced by the newly freed lower inode number (inode 72 in the example). This technique ensures that the leftmost inode number in the super block free list is always the correct place to start the next inode array search when looking for more free inode numbers.
(F): No special arrangements need to be made to remember the numbers of inodes that become free when the super block list is already full, if their inode numbers are greater than the next inode number, because they will be automatically found by a linear search starting at next. If an inode is released when there is free space in the super block it can just be added to the super block list.

Inode Data Block Pointers

Now that a mechanism exists for allocating a new inode to a file and for finding disk data blocks to add to the file, all that is now required is some method for accessing the data blocks in a file given its inode. The most obvious mechanism for this is to have the data block numbers listed in the inode. Sadly, for any reasonably large file, this solution is impracticable due to the amount of space it would require. within the iode.

Looking at the average sizes of files created by typical users shows that the majority are 1024 bytes or less in size and that, as the size of files increases beyond this, the number of files of each size gets less. This suggests that an arrangement which makes accesses to small files quick and efficient, even at the expense of some speed for larger files, would be an acceptable compromise. It is this type of arrangement which System V implements.

Inside a System V mode there is an array of 13 elements, each of which can hold the number of a disk data block (see Figure 5). The first ten elements of the array are used to hold the data block numbers of the first ten data blocks assigned to storing the file's contents.

Assuming, for the purposes of illustration, that each disk data block is 1024 bytes in size, then these ten data block pointers will allow files to be created that are up to 10 Kb in size. As you can see, for the large majority of files it should be possible to access the data with nothing more than a direct lookup required to find the data block that contains any particular data byte.

With this scheme, once a file has grown to 10 Kb, there are only three block pointers in the inode left to use, whatever the eventual size of the file. Obviously, some new arrangement must be found so that the three remaining block pointers will suffice for any realistic file size, while at the same time not degrading the data access time too much.

This goal is achieved by using the idea of indirect block pointers. Specifically, when an 11th data block needs to be allocated to the file, the 11th inode block pointer is used, but instead of pointing to the block which will contain the data, the 11th pointer is a single indirect pointer which points to a data block filled with a list of direct block pointers. In our example, if we assume that a data block number is a 32-bit value, then a list of 256 of them will fit into the single indirect block. This list will point directly to the data blocks for the next 256 Kb of our file. This means that with 11 block pointers in the inode, files of up to 266 Kb (10 + 256) can be created. True, it takes a little longer to access the data beyond the first 10 Kb in the file, but it takes only one extra disk block read to find the position on the disk of the required data.

For files bigger than 266 Kb the double indirect (12th) inode block pointer is used. This is the same idea as the previous inode pointer except that the double indirect pointer points to a list of pointers in a data block, each of which is itself a single indirect block pointer which points to a list of 256 direct block pointers. This means that the 12th inode block pointer gives access to the next 65536 Kb (256x256) of data in our file.

By now, you should be able to spot the pattern and see that when the file grows bigger than 64 Mb (actually 65802 Kb), the inode's 13th data block pointer will be used, but this time as a triple indirect pointer, which will give access to a staggering 16 Gb (256x256x256 Kb) of extra file space. A single file bigger than 16Gb sounds huge. However, even though the calculation we have just done suggests that this file size is possible with the inode layout as given, in fact there are other factors which limit the maximum size of a file to a smaller value than this. For example, the size of a file, in bytes, is stored separately in its inode in a field of type unsigned long. This is a 32-bit number which limits the size of a file to 4 Gb, so that 13 data block pointers in an inode really are enough.

Next: Minix