Lecture 17

Address Space and Superblock Operations

Three Programming Models
Standalone vs. Library vs. Framework
Linux VFS vs. BeOS VFS
The BeOS VFS follows what can be viewed as a library model. The vnode layer of the kernel calls library functions, namely the virtual functions that each file system implementation provides at the lower layer.
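As a rough illustration of the library model, here is a userspace sketch. It is not BeOS code, and `struct fs_ops`, `struct vnode`, `vnode_read` and `toy_read` are hypothetical names. The vnode layer has no file-system policy of its own: it only dispatches through a table that the file system must fill in completely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical operation table: in the library model the file system
 * supplies every entry and the vnode layer only dispatches. */
struct fs_ops {
    long (*read)(void *node, char *buf, size_t len, long pos);
};

struct vnode {
    const struct fs_ops *ops;  /* filled in by the file system */
    void *fs_private;          /* data owned by the file system */
};

/* The "kernel" side: no file-system policy, just a call into the library. */
long vnode_read(struct vnode *vn, char *buf, size_t len, long pos)
{
    assert(vn->ops != NULL && vn->ops->read != NULL);  /* nothing is optional */
    return vn->ops->read(vn->fs_private, buf, len, pos);
}

/* A toy file system whose only file holds a fixed string. */
static const char toy_data[] = "hello, vnode";

long toy_read(void *node, char *buf, size_t len, long pos)
{
    size_t size = sizeof toy_data - 1;  /* exclude the terminating NUL */

    (void)node;
    if (pos < 0 || (size_t)pos >= size)
        return 0;                       /* at or past end of file */
    if (len > size - (size_t)pos)
        len = size - (size_t)pos;
    memcpy(buf, toy_data + pos, len);
    return (long)len;
}
```

Usage: build `struct fs_ops ops = { toy_read };` and `struct vnode vn = { &ops, NULL };`, then `vnode_read(&vn, buf, 5, 0)` copies `"hello"` into `buf`.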

Linux takes more of a framework approach to implementing the VFS layer. Many of the file operations are generic; that is, the file system does not implement them itself but plugs in functions that the VFS provides. Consider the file_operations structure created by Ext3:

struct file_operations ext3_file_operations = {
        llseek:         generic_file_llseek,    /* BKL held */
        read:           generic_file_read,      /* BKL not held.  Don't need */
        write:          ext3_file_write,        /* BKL not held.  Don't need */
        ioctl:          ext3_ioctl,             /* BKL held */
        mmap:           generic_file_mmap,
        open:           ext3_open_file,         /* BKL not held.  Don't need */
        release:        ext3_release_file,      /* BKL not held.  Don't need */
        fsync:          ext3_sync_file,         /* BKL held */
};
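To make that mix concrete, here is a userspace sketch (not kernel code; `struct file`, `struct file_ops`, `toy_generic_llseek` and `myfs_write` are hypothetical stand-ins). Like ext3_file_operations, its table combines a framework-supplied helper, which knows nothing about any particular file system, with a file-system-specific function.

```c
#include <assert.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

struct file {
    long pos;   /* current file position */
    long size;  /* file size (in the kernel this comes from the inode) */
};

struct file_ops {
    long (*llseek)(struct file *f, long off, int whence);
    long (*write)(struct file *f, const char *buf, long len);
};

/* Framework-supplied helper: usable unchanged by any file system. */
long toy_generic_llseek(struct file *f, long off, int whence)
{
    long newpos;

    switch (whence) {
    case SEEK_SET: newpos = off;           break;
    case SEEK_CUR: newpos = f->pos + off;  break;
    case SEEK_END: newpos = f->size + off; break;
    default:       return -1;
    }
    if (newpos < 0)
        return -1;   /* the kernel would return -EINVAL here */
    f->pos = newpos;
    return newpos;
}

/* File-system-specific write: this toy only advances the position and size. */
long myfs_write(struct file *f, const char *buf, long len)
{
    (void)buf;
    f->pos += len;
    if (f->pos > f->size)
        f->size = f->pos;
    return len;
}

/* Generic and specific entries side by side, as in the Ext3 table. */
const struct file_ops myfs_file_ops = {
    .llseek = toy_generic_llseek,
    .write  = myfs_write,
};
```

The sketch uses C99 designated initializers (`.llseek = ...`); the `llseek: ...` form in the Ext3 listing is the older GCC extension with the same meaning.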

You may wonder how it is possible for the read operation on a specific file system to be a generic operation.

That is because read and write are part of the VFS framework that Linux provides. These functions work through the page cache and end up calling the file system's address space operations, which move data between pages and the underlying device one page at a time.
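A minimal userspace sketch of that division of labor, assuming tiny 8-byte pages and a toy in-memory page cache. `generic_read_sketch`, `toy_readpage` and `struct mapping` are hypothetical names, and this is not the kernel's generic_file_read, which also handles readahead, locking and I/O completion. The generic code splits a request into per-page pieces using `index = pos / PAGE_SIZE` and `offset = pos % PAGE_SIZE`; only the readpage callback knows how to fill a page from the device.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 8    /* tiny pages so reads cross page boundaries */
#define MAX_PAGES 16

struct page {
    int uptodate;           /* page already holds valid file data */
    char data[PAGE_SIZE];
};

struct mapping {
    struct page cache[MAX_PAGES];  /* toy page cache, indexed by page number */
    int (*readpage)(struct mapping *m, unsigned long index, struct page *pg);
    const char *backing;           /* toy "device" contents */
    long size;                     /* file size in bytes */
};

/* Generic part: page arithmetic and caching, no knowledge of the device. */
long generic_read_sketch(struct mapping *m, char *buf, long len, long pos)
{
    long done = 0;

    if (pos < 0)
        return -1;
    if (pos >= m->size)
        return 0;
    if (len > m->size - pos)
        len = m->size - pos;
    while (done < len) {
        unsigned long index = (unsigned long)(pos / PAGE_SIZE);
        long offset = pos % PAGE_SIZE;
        long chunk = PAGE_SIZE - offset;
        if (chunk > len - done)
            chunk = len - done;
        assert(index < MAX_PAGES);
        struct page *pg = &m->cache[index];
        if (!pg->uptodate) {                 /* cache miss: ask the file system */
            if (m->readpage(m, index, pg) != 0)
                return -1;
            pg->uptodate = 1;
        }
        memcpy(buf + done, pg->data + offset, (size_t)chunk);
        done += chunk;
        pos += chunk;
    }
    return done;
}

int readpage_calls;  /* counts cache misses, for illustration */

/* File-system part: fill one whole page from the backing store. */
int toy_readpage(struct mapping *m, unsigned long index, struct page *pg)
{
    long start = (long)index * PAGE_SIZE;
    long n = m->size - start;

    if (n > PAGE_SIZE)
        n = PAGE_SIZE;
    memset(pg->data, 0, PAGE_SIZE);          /* bytes past EOF read as zero */
    if (n > 0)
        memcpy(pg->data, m->backing + start, (size_t)n);
    readpage_calls++;
    return 0;
}
```

A second read that touches an already cached page does not call readpage again, which is the point of routing reads through the page cache.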

address space operations
The address space operations provide the page mapping functionality of the kernel: they connect pages in the page cache to the blocks that back them on a device.

struct address_space_operations {
        int (*writepage)(struct page *);
        int (*readpage)(struct file *, struct page *);
        int (*sync_page)(struct page *);
        /*
         * ext3 requires that a successful prepare_write() call be followed
         * by a commit_write() call - they must be balanced
         */
        int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
        int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
        /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
        int (*bmap)(struct address_space *, long);
        int (*flushpage) (struct page *, unsigned long);
        int (*releasepage) (struct page *, int);
#define KERNEL_HAS_O_DIRECT /* this is for modules out of the kernel */
        int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int);
};
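The comment about balanced prepare_write() and commit_write() calls describes a protocol followed by the generic write path. A userspace sketch, assuming 8-byte pages and hypothetical names (`struct aops`, `generic_write_sketch`, `toy_prepare_write`, `toy_commit_write`): for each page touched, the caller computes the byte range [from, to) within that page, calls prepare_write, copies the data, and then calls commit_write with the same range. The toy callbacks assert that every successful prepare is matched by exactly one commit.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 8    /* tiny pages for illustration */
#define MAX_PAGES 16

struct page { char data[PAGE_SIZE]; };

/* Hypothetical subset of address_space_operations; from/to are byte
 * offsets within one page, as in prepare_write(file, page, from, to). */
struct aops {
    int (*prepare_write)(struct page *pg, unsigned from, unsigned to);
    int (*commit_write)(struct page *pg, unsigned from, unsigned to);
};

/* Generic write loop: the caller does the page arithmetic; the file system
 * sees one [from, to) range per page. Returns bytes written or an error. */
long generic_write_sketch(struct page *pages, const struct aops *a,
                          const char *buf, long len, long pos)
{
    long done = 0;

    while (done < len) {
        long index = pos / PAGE_SIZE;
        long offset = pos % PAGE_SIZE;
        long bytes = PAGE_SIZE - offset;
        if (bytes > len - done)
            bytes = len - done;

        unsigned from = (unsigned)offset, to = (unsigned)(offset + bytes);
        assert(index >= 0 && index < MAX_PAGES);
        struct page *pg = &pages[index];

        int err = a->prepare_write(pg, from, to);
        if (err)
            return done ? done : err;   /* a failed prepare is not committed */
        memcpy(pg->data + from, buf + done, (size_t)bytes);
        err = a->commit_write(pg, from, to);  /* balances the prepare above */
        if (err)
            return done ? done : err;
        done += bytes;
        pos += bytes;
    }
    return done;
}

/* Toy callbacks that check the two calls stay balanced. */
int pending;  /* prepare_write calls not yet committed */
int commits;

int toy_prepare_write(struct page *pg, unsigned from, unsigned to)
{
    (void)pg;
    assert(from < to && to <= PAGE_SIZE);
    assert(pending == 0);  /* the previous prepare was committed */
    pending++;
    return 0;
}

int toy_commit_write(struct page *pg, unsigned from, unsigned to)
{
    (void)pg; (void)from; (void)to;
    assert(pending == 1);
    pending--;
    commits++;
    return 0;
}
```

Writing 10 bytes at offset 6 touches two pages: bytes [6, 8) of page 0 and bytes [0, 8) of page 1, so prepare/commit run twice.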
Ext3 provides these functions for page mapping:
struct address_space_operations ext3_aops = {
        readpage:       ext3_readpage,          /* BKL not held.  Don't need */
        writepage:      ext3_writepage,         /* BKL not held.  We take it */
        sync_page:      block_sync_page,
        prepare_write:  ext3_prepare_write,     /* BKL not held.  We take it */
        commit_write:   ext3_commit_write,      /* BKL not held.  We take it */
        bmap:           ext3_bmap,              /* BKL held */
        flushpage:      ext3_flushpage,         /* BKL not held.  Don't need */
        releasepage:    ext3_releasepage,       /* BKL not held.  Don't need */
};
You might think that the complexity ends here, but it does not. The implementation of ext3_readpage uses one more level of framework. The function block_read_full_page is written generically: its second argument is a file-system-supplied function (here ext3_get_block) that it calls for each block in the page, passing the block's logical position within the file and getting back a buffer_head that identifies the corresponding block on the device.
static int ext3_readpage(struct file *file, struct page *page)
{
        return block_read_full_page(page,ext3_get_block);
}
static int ext3_get_block(struct inode *inode, long iblock,
                        struct buffer_head *bh_result, int create);
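The callback split can be sketched in userspace. Here `struct inode` and `struct buffer_head` are simplified stand-ins rather than the kernel types, `read_full_page_sketch` and `toy_get_block` are hypothetical, and the sizes (4-byte blocks, 16-byte pages) are chosen only to keep the example small. The callback has the same shape as ext3_get_block: the generic code computes each block's logical number within the file, and the file system only translates it to a disk block.

```c
#include <assert.h>
#include <string.h>

#define BLOCK_SIZE 4                        /* tiny blocks for illustration */
#define PAGE_SIZE 16
#define BLOCKS_PER_PAGE (PAGE_SIZE / BLOCK_SIZE)
#define NDISK 8                             /* blocks on the toy device */

/* Simplified stand-ins for the kernel's inode and buffer_head. */
struct inode {
    long block_map[8];  /* logical block -> disk block, -1 for a hole */
    long nblocks;       /* number of logical blocks in the file (<= 8) */
};

struct buffer_head {
    long b_blocknr;     /* disk block chosen by get_block */
    int mapped;         /* nonzero if get_block found a disk block */
};

/* Same shape as get_block_t: (inode, logical block, result, create). */
typedef int (get_block_t)(struct inode *, long, struct buffer_head *, int);

char disk[NDISK][BLOCK_SIZE];  /* toy device contents */

/* Generic part: knows the page/block geometry but not the on-disk layout. */
int read_full_page_sketch(struct inode *inode, unsigned long index,
                          char *page, get_block_t *get_block)
{
    for (int i = 0; i < BLOCKS_PER_PAGE; i++) {
        long iblock = (long)index * BLOCKS_PER_PAGE + i;  /* position in file */
        struct buffer_head bh = { 0, 0 };
        int err = get_block(inode, iblock, &bh, 0);        /* create == 0: read */

        if (err)
            return err;
        if (bh.mapped) {
            assert(bh.b_blocknr >= 0 && bh.b_blocknr < NDISK);
            memcpy(page + i * BLOCK_SIZE, disk[bh.b_blocknr], BLOCK_SIZE);
        } else {
            memset(page + i * BLOCK_SIZE, 0, BLOCK_SIZE);   /* hole reads as zeros */
        }
    }
    return 0;
}

/* File-system part: only translates logical to disk block numbers. */
int toy_get_block(struct inode *inode, long iblock,
                  struct buffer_head *bh_result, int create)
{
    (void)create;  /* a real file system would allocate a block when create != 0 */
    if (iblock < inode->nblocks && inode->block_map[iblock] >= 0) {
        bh_result->b_blocknr = inode->block_map[iblock];
        bh_result->mapped = 1;
    }
    return 0;
}
```

Swapping in a different file system means supplying a different get_block; read_full_page_sketch itself does not change.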
super_operations
Ext2 defines the following operations in the super_operations object:

        read_inode:     ext2_read_inode,
        write_inode:    ext2_write_inode,
        put_inode:      ext2_put_inode,
        delete_inode:   ext2_delete_inode,
        put_super:      ext2_put_super,
        write_super:    ext2_write_super,
        statfs:         ext2_statfs,
        remount_fs:     ext2_remount,

Ext2 leaves the following super_operations undefined:
        void (*dirty_inode) (struct inode *inode);
        // called when the VFS marks the inode dirty

        void (*write_super_lockfs) (struct super_block *super);
        // locks journal updates allowing any pending updates
        // to finish, and flushes journal

        void (*unlockfs) (struct super_block *super);
        // commits the super_block and unlocks journal updates

        void (*clear_inode) (struct inode *inode);
        // The generic clear_inode() is called by the file system to notify
        // the VFS that the inode is no longer useful. This file-system-
        // specific function is called from it, if defined.

        void (*umount_begin) (struct super_block *super);
        // Called at the start of a forced unmount (umount -f) so the
        // file system can abort pending operations; NFS uses it.

        struct dentry * (*fh_to_dentry)(struct super_block *sb, __u32 *fh, int len, int fhtype, int parent);
        int (*dentry_to_fh)(struct dentry *, __u32 *fh, int *lenp, int need_parent);
        // used by FAT and ReiserFS to associate filehandles and dentry objects

        int (*show_options)(struct seq_file *, struct vfsmount *);
        // causes mount options to be printed to the specified file

This means that when a file system leaves one of these fields NULL in its super_operations structure, the VFS must either skip the call or carry out the operation in a generic way. These fields are defined in ReiserFS and in Ext3, so they appear to be relatively new additions to the kernel.
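The NULL-check pattern can be sketched in userspace. The names (`struct super_ops`, `mark_inode_dirty_sketch`, `journaled_dirty_inode`) are hypothetical; the sketch is only loosely modeled on how the 2.4 VFS treats dirty_inode: call the hook if the file system supplied one, and do the generic bookkeeping either way.

```c
#include <assert.h>
#include <stddef.h>

struct inode;

/* Hypothetical subset of super_operations with one optional hook. */
struct super_ops {
    void (*dirty_inode)(struct inode *inode);  /* may be NULL */
};

struct super_block {
    const struct super_ops *s_op;
};

struct inode {
    struct super_block *i_sb;
    int dirty;        /* generic state kept by the VFS itself */
    int fs_notified;  /* set only by a file system that defines the hook */
};

/* VFS side: call the hook only if present, then do the generic work. */
void mark_inode_dirty_sketch(struct inode *inode)
{
    const struct super_ops *op;

    assert(inode != NULL && inode->i_sb != NULL);
    op = inode->i_sb->s_op;
    if (op != NULL && op->dirty_inode != NULL)
        op->dirty_inode(inode);
    inode->dirty = 1;
}

/* A journaling file system might use the hook to log the inode change. */
void journaled_dirty_inode(struct inode *inode)
{
    inode->fs_notified = 1;
}
```

An Ext2-like table would leave `dirty_inode` NULL and get only the generic behavior; an Ext3-like table would fill it in.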

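While reading the kernel code, I ran across numerous occurrences of the lock identifier BKL. Its meaning is not explained in the code: BKL is the Big Kernel Lock, a single global kernel lock acquired with lock_kernel() and released with unlock_kernel(). The comments in the operation tables above record whether each operation is called with it held.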

inode_operations
struct inode_operations ext2_dir_inode_operations = {
        create:         ext2_create,
        lookup:         ext2_lookup,
        link:           ext2_link,
        unlink:         ext2_unlink,
        symlink:        ext2_symlink,
        mkdir:          ext2_mkdir,
        rmdir:          ext2_rmdir,
        mknod:          ext2_mknod,
        rename:         ext2_rename,
};

The remaining inode_operations fields are not set for Ext2 directories:

        int (*readlink) (struct dentry *, char *,int);
        int (*follow_link) (struct dentry *, struct nameidata *);
        void (*truncate) (struct inode *);
        int (*permission) (struct inode *, int);
        int (*revalidate) (struct dentry *);
        int (*setattr) (struct dentry *, struct iattr *);
        int (*getattr) (struct dentry *, struct iattr *);

The callback type taken by block_read_full_page, and matched by ext3_get_block, is:

typedef int (get_block_t)(struct inode*,long,struct buffer_head*,int);
e