Typically the addresses of the first few data blocks of a file (usually about 4 to 16) are stored in the file's i-node. In addition, the i-node will point to an indirect block (containing further block addresses for the file), a doubly-indirect block (containing addresses of further indirect blocks), and perhaps a triply-indirect block (containing addresses of further doubly-indirect blocks).
The direct block pointers are distinct and separate from the indirect, doubly-indirect, and triply-indirect block pointers. Each pointer holds a single value, a block address. Each indirect block contains direct block addresses only, each doubly-indirect block contains indirect block addresses, and each triply-indirect block contains doubly-indirect block addresses.
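The sketch below works out how many bytes each level of pointers can map. Its figures are assumptions chosen only for illustration: 12 direct pointers, 1024-byte blocks and 4-byte block addresses, so that each indirect block holds 256 addresses.

```c
#include <stdint.h>

/* Hypothetical parameters, chosen only for illustration. */
#define BLKSIZE   1024u                 /* bytes per block */
#define ADDRSIZE  4u                    /* bytes per block address */
#define NDIRECT   12u                   /* direct pointers in the i-node */
#define PER_BLOCK (BLKSIZE / ADDRSIZE)  /* addresses per indirect block: 256 */

/* Bytes mapped by the direct pointers in the i-node. */
uint64_t direct_bytes(void) { return (uint64_t)NDIRECT * BLKSIZE; }

/* Bytes mapped through the single indirect block. */
uint64_t indirect_bytes(void) { return (uint64_t)PER_BLOCK * BLKSIZE; }

/* Bytes mapped through the doubly-indirect block. */
uint64_t dbl_indirect_bytes(void)
{
    return (uint64_t)PER_BLOCK * PER_BLOCK * BLKSIZE;
}

/* Bytes mapped through the triply-indirect block. */
uint64_t tpl_indirect_bytes(void)
{
    return (uint64_t)PER_BLOCK * PER_BLOCK * PER_BLOCK * BLKSIZE;
}

/* Largest file this pointer structure can describe. */
uint64_t max_file_bytes(void)
{
    return direct_bytes() + indirect_bytes() + dbl_indirect_bytes() + tpl_indirect_bytes();
}
```

With these assumed numbers the maximum file size is 17,247,252,480 bytes, a little over 16 GiB. Almost all of it is reached through the triply-indirect block.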
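Giampaolo gives code along the following lines for computing the address of data that lies within the doubly-indirect blocks of a file system. It is not general code for finding the address of an arbitrary byte in a file. It only shows how a file position in the doubly-indirect range can be resolved into three values: the index of an indirect block within the doubly-indirect block, the index of a data block within that indirect block, and the offset within that data block. It relies on the following definitions: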
blksize = size of the file system blocks
dsize = amount of file data mapped by direct blocks
indsize = amount of file data mapped by an indirect block
if (filepos >= dsize + indsize) {               /* double-indirect blocks */
    filepos -= (dsize + indsize);               /* offset into the double-indirect range */
    dbl_indirect_index = filepos / indsize;     /* which indirect block */
    filepos -= (dbl_indirect_index * indsize);  /* offset within that indirect block's range */
    indirect_index = filepos / blksize;         /* which data block */
    filepos -= (indirect_index * blksize);      /* offset in data block */
    block_offset = filepos;
}
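A self-contained version of the same calculation may make it easier to experiment with. The function name `resolve_dbl_indirect`, the result struct and the example sizes are hypothetical. The arithmetic follows the fragment above, computing the indirect-block index unconditionally.

```c
#include <stdint.h>

/* Where a byte offset lands inside the doubly-indirect range. */
struct dbl_location {
    uint64_t dbl_indirect_index; /* entry in the doubly-indirect block */
    uint64_t indirect_index;     /* entry in that indirect block */
    uint64_t block_offset;       /* byte offset inside the data block */
};

/*
 * Resolve filepos (which must satisfy filepos >= dsize + indsize)
 * into doubly-indirect coordinates.
 *   blksize - file system block size in bytes
 *   dsize   - bytes mapped by the direct blocks
 *   indsize - bytes mapped by one indirect block
 */
struct dbl_location resolve_dbl_indirect(uint64_t filepos, uint64_t blksize,
                                         uint64_t dsize, uint64_t indsize)
{
    struct dbl_location loc;

    filepos -= dsize + indsize;                   /* offset into the doubly-indirect range */
    loc.dbl_indirect_index = filepos / indsize;   /* which indirect block */
    filepos -= loc.dbl_indirect_index * indsize;  /* offset within that indirect block's range */
    loc.indirect_index = filepos / blksize;       /* which data block */
    loc.block_offset = filepos - loc.indirect_index * blksize; /* byte within that block */
    return loc;
}
```

For example, take 1024-byte blocks, dsize = 12288 and indsize = 262144 (assumed values). Byte 538629 then resolves to entry 1 of the doubly-indirect block, entry 2 of that indirect block, and offset 5 in the data block.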
An extent is a sequence of contiguous blocks on a disk. A single number can identify a block, but a pair of numbers (starting block and run length) is required to represent an extent. In a system with 64-bit block addresses, one can represent an extent by using some number of bits (say 48) for the starting block address and the rest (16) for the number of blocks in the extent. With 1024-byte (2^10) blocks, such an arrangement would allow 2^48 × 2^10 = 2^58 = 288,230,376,151,711,744 bytes to be addressed.
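The 48/16 split can be sketched as bit packing into one 64-bit word. The layout (address in the high 48 bits, length in the low 16) and the function names are assumptions for illustration. They are not a format taken from any particular file system.

```c
#include <stdint.h>

#define EXTENT_LEN_BITS  16
#define EXTENT_LEN_MASK  ((UINT64_C(1) << EXTENT_LEN_BITS) - 1)  /* low 16 bits */
#define EXTENT_ADDR_MASK ((UINT64_C(1) << 48) - 1)               /* 48-bit address */

/* Pack a 48-bit starting block address and a 16-bit run length. */
uint64_t extent_pack(uint64_t start, uint16_t length)
{
    return ((start & EXTENT_ADDR_MASK) << EXTENT_LEN_BITS) | length;
}

/* Starting block of the extent (the high 48 bits). */
uint64_t extent_start(uint64_t e) { return e >> EXTENT_LEN_BITS; }

/* Number of blocks in the extent (the low 16 bits). */
uint16_t extent_length(uint64_t e) { return (uint16_t)(e & EXTENT_LEN_MASK); }

/* Bytes addressable with 48-bit block addresses and the given block size. */
uint64_t addressable_bytes(uint64_t blksize) { return (UINT64_C(1) << 48) * blksize; }
```

With only 16 bits of length, one extent can describe at most 65,535 blocks (just under 64 MiB with 1024-byte blocks). A large file may therefore need many extents.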
Even file systems that use extents will require indirect and possibly doubly-indirect blocks for files made up of many extents. Extents can also hurt performance, because they make it more difficult to locate a given block in the indirect and doubly-indirect maps. The run lengths must be added together to find the block holding any particular position in the file. This can be overcome by fixing the size of the extents.
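The cost can be seen in a small sketch. With variable-length extents, finding the extent that holds logical block n means summing run lengths from the start of the list. If every extent had the same (assumed) fixed length, a single division would do. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct extent {
    uint64_t start;   /* first disk block of the run */
    uint64_t length;  /* number of blocks in the run */
};

/*
 * Variable-length extents: walk the list, subtracting run lengths until the
 * run containing logical block n is found. Returns the disk block number,
 * or UINT64_MAX if n lies beyond the end of the file.
 */
uint64_t lookup_variable(const struct extent *ext, size_t count, uint64_t n)
{
    for (size_t i = 0; i < count; i++) {
        if (n < ext[i].length)
            return ext[i].start + n;
        n -= ext[i].length;   /* skip past this run */
    }
    return UINT64_MAX;
}

/*
 * Fixed-length extents: the run index is a single division, so no summing
 * is needed. run_len is an assumed constant shared by every extent.
 */
uint64_t lookup_fixed(const struct extent *ext, size_t count,
                      uint64_t run_len, uint64_t n)
{
    uint64_t i = n / run_len;
    if (i >= count)
        return UINT64_MAX;
    return ext[i].start + n % run_len;
}
```

The variable-length lookup is linear in the number of extents before the target. The fixed-length lookup takes constant time, at the cost of less flexible allocation.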
Directories (or folders) support the rational organization of the files contained within file systems by allowing them to be grouped together. A directory is a map from file names to i-nodes. The name is used as the key for searches for a file.
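One simple illustration stores a directory as a flat array of fixed-size entries, each pairing a name with an i-node number, and searches it linearly. The entry layout, the 28-byte name field and the use of 0 for "free" or "not found" are assumptions for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_FIELD 28                /* assumed size of the name field */

struct dir_entry {
    uint32_t inode;                  /* i-node number; 0 marks a free slot */
    char     name[NAME_FIELD];       /* NUL-terminated file name */
};

/* Return the i-node number stored under name, or 0 if there is no such entry. */
uint32_t dir_lookup(const struct dir_entry *dir, size_t nentries, const char *name)
{
    for (size_t i = 0; i < nentries; i++) {
        if (dir[i].inode != 0 && strncmp(dir[i].name, name, NAME_FIELD) == 0)
            return dir[i].inode;
    }
    return 0;
}
```

A linear list is easy to maintain but costs time proportional to the directory size on every lookup. Large directories benefit from structures that search faster.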
The directory's representation is the data associated with a directory's i-node. Any data structure supporting a name/i-node mapping can be used to store directory information. Choices might include:
One must also consider the character set to be used for naming files.
Inherent in the idea of allowing directories to refer to i-nodes, while also using i-nodes to represent directories, is the idea of a completely hierarchical file system, in which directories can contain other directories to any depth.
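Because a directory is itself represented by an i-node, a path name can be resolved one component at a time, starting from the root directory's i-node. In the sketch below a small in-memory table stands in for directory data read from disk. The table, the choice of i-node 2 for the root and the name `resolve_path` are hypothetical. The sketch does not check that each intermediate i-node really is a directory.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROOT_INODE 2u                    /* assumed i-node number of "/" */

/* One directory entry: name appears in the directory whose i-node is parent. */
struct link {
    uint32_t    parent;                  /* i-node of the containing directory */
    const char *name;                    /* path component */
    uint32_t    inode;                   /* i-node the name maps to */
};

/* Hypothetical directory contents describing /home/ann/notes.txt. */
static const struct link links[] = {
    { ROOT_INODE, "home",      10 },
    { 10,         "ann",       11 },
    { 11,         "notes.txt", 12 },
};

/* Look up the first len characters of name in directory dir; 0 if absent. */
static uint32_t lookup(uint32_t dir, const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof links / sizeof links[0]; i++)
        if (links[i].parent == dir && strlen(links[i].name) == len &&
            strncmp(links[i].name, name, len) == 0)
            return links[i].inode;
    return 0;
}

/* Resolve an absolute path component by component; returns 0 on failure. */
uint32_t resolve_path(const char *path)
{
    uint32_t ino = ROOT_INODE;
    while (*path != '\0') {
        while (*path == '/')             /* skip separators */
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/"); /* length of this component */
        ino = lookup(ino, path, len);
        if (ino == 0)
            return 0;                    /* component not found */
        path += len;
    }
    return ino;
}
```

Under this table, `resolve_path("/home/ann/notes.txt")` yields i-node 12 and `resolve_path("/")` yields the root, i-node 2.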
The directory structure evident in the file system is just one way to view a file system. Other hierarchies might be imposed upon the files contained therein.
By considering the fundamental operations we can carry out using file systems, we can see the inherent issues associated with their implementation.
We must be able to turn a newly allocated region of a disk into a file system or part of a file system. Every file system has at least one structure at a fixed location on the disk, and FFS has many. All of these must be laid out before any files can be manipulated. We must also create an empty root (top-level) directory within the file system.
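A toy version of this step writes a superblock at a fixed location (block 0 here), an i-node for the root directory, and the root directory's data. The in-memory disk, the magic number, the field layouts and the choice of i-node 2 for the root are all assumptions made for the sketch. None of them is the layout of FFS or any other real file system.

```c
#include <stdint.h>
#include <string.h>

#define BLKSIZE     1024
#define NBLOCKS     64
#define FS_MAGIC    0x5346534Bu  /* arbitrary tag marking a formatted volume */
#define SB_BLOCK    0            /* fixed location of the superblock */
#define ROOTDIR_BLK 1            /* data block holding the root directory */
#define ITABLE_BLK  2            /* block holding the i-node table */
#define ROOT_INODE  2u           /* i-node number reserved for "/" */

struct superblock {
    uint32_t magic;       /* FS_MAGIC once the volume has been formatted */
    uint32_t nblocks;     /* size of the file system in blocks */
    uint32_t root_inode;  /* i-node number of the root directory */
    uint32_t clean;       /* 1 if the file system was shut down properly */
};

struct inode {
    uint16_t mode;        /* 0 = free, 2 = directory (assumed encoding) */
    uint16_t nlinks;      /* number of directory entries naming this i-node */
    uint32_t size;        /* bytes of data */
    uint32_t direct[12];  /* direct block addresses */
};

struct dir_entry {
    uint32_t inode;       /* i-node number the name maps to */
    char     name[28];    /* NUL-terminated name */
};

static uint8_t disk[NBLOCKS][BLKSIZE];  /* in-memory stand-in for the device */

/* Write the fixed-location structures and an empty root directory. */
int format_fs(void)
{
    memset(disk, 0, sizeof disk);

    /* Root directory contents: "." and ".." both refer to the root itself. */
    struct dir_entry entries[2] = { { ROOT_INODE, "." }, { ROOT_INODE, ".." } };
    memcpy(disk[ROOTDIR_BLK], entries, sizeof entries);

    /* Root i-node: a directory whose only data block is ROOTDIR_BLK. */
    struct inode root = { .mode = 2, .nlinks = 2, .size = sizeof entries,
                          .direct = { ROOTDIR_BLK } };
    memcpy(disk[ITABLE_BLK] + ROOT_INODE * sizeof root, &root, sizeof root);

    /* Superblock last, so an interrupted format never shows a valid magic. */
    struct superblock sb = { FS_MAGIC, NBLOCKS, ROOT_INODE, 1 };
    memcpy(disk[SB_BLOCK], &sb, sizeof sb);
    return 0;
}

/* Read the superblock back from its fixed location. */
struct superblock read_superblock(void)
{
    struct superblock sb;
    memcpy(&sb, disk[SB_BLOCK], sizeof sb);
    return sb;
}

/* Read i-node number ino back from the i-node table. */
struct inode read_inode(uint32_t ino)
{
    struct inode in;
    memcpy(&in, disk[ITABLE_BLK] + ino * sizeof in, sizeof in);
    return in;
}
```

A real formatting tool would also initialise free-space maps and the rest of the i-node table. This sketch only shows the ordering and the fixed location of the superblock.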
Mounting is the act of making a physical file system representation on some medium available for use by the operating system and programs it may run. To do this, we must read the file system metadata from the medium.
Sometimes the file system to be mounted is inconsistent, because of physical damage or because file system activity was cut short (for example, by a crash or power failure) when it was last mounted. Most file systems nowadays record whether or not they are clean. Being clean means that a proper shutdown occurred and the state of the file system can be considered consistent.
If a file system is not marked as clean, it must be checked at least, and repaired if necessary and possible. A file system not marked as consistent may in fact be consistent, but this is not the norm.
Journalling can allow us to more easily repair any inconsistencies that may occur.
Even file systems marked as consistent may not be. The amount of work needed to completely verify the consistency of a file system, however, can be significant. The trade-off a file system makes between how scrupulously to verify consistency and how quickly to start up is an important engineering decision.
After the consistency of the file system has been verified, its metadata is read and kept in main memory. These in-memory tables cache the file system metadata so that it is more easily accessible.
Unmounting has two primary tasks. It flushes the cached metadata so that the representation on disk is consistent, and it marks the file system as clean. Once the file system is marked clean, its on-disk representation should not be accessed again until it is next mounted.
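The clean-flag protocol described in the last few paragraphs can be sketched as follows. On mount, the flag is read and a check is run if it is not set. The flag is then cleared while the file system is in use. On unmount, cached metadata is flushed before the flag is set again. The two-field superblock, `FS_MAGIC`, `check_and_repair` and `flush_metadata` are hypothetical placeholders, and a global variable stands in for the medium.

```c
#include <stdbool.h>
#include <stdint.h>

#define FS_MAGIC 0x5346534Bu   /* assumed tag identifying the format */

struct superblock {
    uint32_t magic;            /* FS_MAGIC for a recognised file system */
    uint32_t clean;            /* 1 if the last unmount completed */
};

/* Stand-in for the superblock on the medium; starts out not clean. */
static struct superblock on_disk = { FS_MAGIC, 0 };
static struct superblock cached;   /* in-memory copy while mounted */
static int checks_run;             /* consistency checks performed so far */
static int flushes;                /* metadata flushes performed so far */

/* Placeholder for a real checker such as fsck; here it always succeeds. */
static bool check_and_repair(void) { checks_run++; return true; }

/* Placeholder for writing every dirty cached metadata block back to disk. */
static void flush_metadata(void) { flushes++; }

/* Mount: read the metadata, check it if the last shutdown was not clean,
 * then mark the on-disk copy as in use so a crash leaves it "not clean". */
bool mount_fs(void)
{
    cached = on_disk;
    if (cached.magic != FS_MAGIC)
        return false;                         /* unrecognised format */
    if (!cached.clean && !check_and_repair())
        return false;                         /* inconsistent and unrepairable */
    cached.clean = 0;
    on_disk = cached;                         /* record "in use" on the medium */
    return true;
}

/* Unmount: flush cached metadata first, then write the clean flag. */
int unmount_fs(void)
{
    flush_metadata();
    cached.clean = 1;
    on_disk = cached;
    return 0;
}
```

The ordering is the point of the design. The clean flag is written only after the flush, so a crash part-way through unmounting still leaves the file system marked as needing a check.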
To create a file on a Unix system, you need the following information:
The data blocks of the file can be created at any time once the i-node has been allocated.