Linux takes more of a framework approach to implementing the VFS layer. Most of the file operations are generic; that is, the file system does not implement them itself. Consider the file_operations structure defined by Ext3:
struct file_operations ext3_file_operations = {
    llseek: generic_file_llseek, /* BKL held */
    read: generic_file_read, /* BKL not held. Don't need */
    write: ext3_file_write, /* BKL not held. Don't need */
    ioctl: ext3_ioctl, /* BKL held */
    mmap: generic_file_mmap,
    open: ext3_open_file, /* BKL not held. Don't need */
    release: ext3_release_file, /* BKL not held. Don't need */
    fsync: ext3_sync_file, /* BKL held */
};
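To make the pattern concrete, here is a minimal user-space sketch of an operations table that mixes shared "generic" handlers with one file-system-specific handler. Everything in it (toy_file, toy_file_operations, generic_llseek, generic_read, toyfs_write) is a hypothetical stand-in invented for illustration, not a kernel type or function. The Ext3 table above uses the older GCC `field:` initializer syntax; the sketch uses the standard C99 `.field =` form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct toy_file {
    long pos;        /* current file offset */
    long size;       /* number of valid bytes in data */
    char data[64];   /* in-memory "disk" contents */
};

struct toy_file_operations {
    long (*llseek)(struct toy_file *f, long offset);
    long (*read)(struct toy_file *f, char *buf, long count);
    long (*write)(struct toy_file *f, const char *buf, long count);
};

/* Generic helpers that any toy file system can reuse unchanged. */
static long generic_llseek(struct toy_file *f, long offset)
{
    if (offset < 0 || offset > f->size)
        return -1;
    f->pos = offset;
    return f->pos;
}

static long generic_read(struct toy_file *f, char *buf, long count)
{
    long left = f->size - f->pos;

    assert(count >= 0);
    if (count > left)
        count = left;
    memcpy(buf, f->data + f->pos, (size_t)count);
    f->pos += count;
    return count;
}

/* File-system-specific write: a made-up policy that truncates at the buffer end. */
static long toyfs_write(struct toy_file *f, const char *buf, long count)
{
    long room = (long)sizeof(f->data) - f->pos;

    assert(count >= 0);
    if (count > room)
        count = room;
    memcpy(f->data + f->pos, buf, (size_t)count);
    f->pos += count;
    if (f->pos > f->size)
        f->size = f->pos;
    return count;
}

/* Only write is specific to this toy file system; the rest is shared code. */
static const struct toy_file_operations toyfs_file_operations = {
    .llseek = generic_llseek,
    .read   = generic_read,
    .write  = toyfs_write,
};
```

A caller only ever goes through the table, for example toyfs_file_operations.read(&f, buf, n), so it cannot tell which entries are shared and which are specific.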
You may wonder how it is possible for the read operation
on a specific file system to be a generic operation.
That is because read and write are part of
the VFS framework that Linux provides. These functions end up
using address space operations that read and write data one
page at a time to a real device.
struct address_space_operations {
    int (*writepage)(struct page *);
    int (*readpage)(struct file *, struct page *);
    int (*sync_page)(struct page *);
    /*
     * ext3 requires that a successful prepare_write() call be followed
     * by a commit_write() call - they must be balanced
     */
    int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
    int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    int (*bmap)(struct address_space *, long);
    int (*flushpage) (struct page *, unsigned long);
    int (*releasepage) (struct page *, int);
#define KERNEL_HAS_O_DIRECT /* this is for modules out of the kernel */
    int (*direct_IO)(int, struct inode *, struct kiobuf *, unsigned long, int);
};
Ext3 provides these functions for page mapping:
struct address_space_operations ext3_aops = {
    readpage: ext3_readpage, /* BKL not held. Don't need */
    writepage: ext3_writepage, /* BKL not held. We take it */
    sync_page: block_sync_page,
    prepare_write: ext3_prepare_write, /* BKL not held. We take it */
    commit_write: ext3_commit_write, /* BKL not held. We take it */
    bmap: ext3_bmap, /* BKL held */
    flushpage: ext3_flushpage, /* BKL not held. Don't need */
    releasepage: ext3_releasepage, /* BKL not held. Don't need */
};
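The way a generic read can sit on top of a per-file-system readpage hook can be sketched in user space. This is an illustration under stated assumptions, not the kernel's generic_file_read: the types toy_aops and toy_mapping, the functions toy_generic_read and ramimg_readpage, and the 8-byte page size are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8   /* tiny page size so the loop is easy to trace */

/* Hypothetical stand-in for address_space_operations: one page-sized read hook. */
struct toy_aops {
    int (*readpage)(const void *backing, unsigned long pgno, char *page);
};

struct toy_mapping {
    const struct toy_aops *a_ops;
    const void *backing;   /* whatever the file system needs to find its data */
    size_t size;           /* file size in bytes */
};

/* Generic read: knows nothing about the file system except a_ops->readpage. */
static long toy_generic_read(const struct toy_mapping *m, size_t pos,
                             char *buf, size_t count)
{
    size_t done = 0;
    char page[TOY_PAGE_SIZE];

    assert(m && m->a_ops && m->a_ops->readpage);
    if (pos >= m->size)
        return 0;                       /* reading at or past end of file */
    if (count > m->size - pos)
        count = m->size - pos;          /* clip the request to the file size */

    while (done < count) {
        unsigned long pgno = (unsigned long)((pos + done) / TOY_PAGE_SIZE);
        size_t offset = (pos + done) % TOY_PAGE_SIZE;
        size_t chunk = TOY_PAGE_SIZE - offset;

        if (chunk > count - done)
            chunk = count - done;
        if (m->a_ops->readpage(m->backing, pgno, page) != 0)
            return -1;
        memcpy(buf + done, page + offset, chunk);
        done += chunk;
    }
    return (long)done;
}

/*
 * "File system" side: fills one page from a flat in-memory image.
 * Assumes the image is at least a whole number of pages long.
 */
static int ramimg_readpage(const void *backing, unsigned long pgno, char *page)
{
    const char *img = backing;

    memcpy(page, img + pgno * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
    return 0;
}

static const struct toy_aops ramimg_aops = { .readpage = ramimg_readpage };
```

The design point is that toy_generic_read handles offsets, clipping, and page boundaries once, while each file system supplies only the code that fills a single page.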
You might think that the complexity ends at this point, but it does not.
If we look at the implementation of ext3_readpage,
we see one more level of framework being used. The function
block_read_full_page is written in a generic way.
Its second argument is a function, which it calls
to get a block handle for the block being mapped (by position).
static int ext3_readpage(struct file *file, struct page *page)
{
    return block_read_full_page(page, ext3_get_block);
}
static int ext3_get_block(struct inode *inode, long iblock,
    struct buffer_head *bh_result, int create);
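The callback arrangement can be modeled in user space as follows. This is a sketch under stated assumptions rather than the kernel's block_read_full_page: all toy_ and toyfs_ names, the 4-byte block size, the flat toy_disk image, and the choice to zero-fill unmapped blocks are hypothetical details chosen to keep the example small.

```c
#include <assert.h>
#include <string.h>

#define TOY_BLOCK_SIZE 4
#define TOY_BLOCKS_PER_PAGE 2
#define TOY_PAGE_SIZE (TOY_BLOCK_SIZE * TOY_BLOCKS_PER_PAGE)

/* Hypothetical stand-ins, loosely named after the kernel structures. */
struct toy_inode {
    const long *block_map;   /* file block -> disk block, -1 for a hole */
    long nblocks;            /* number of entries in block_map */
};

struct toy_buffer_head {
    long b_blocknr;          /* disk block chosen by get_block */
    int mapped;              /* nonzero once b_blocknr is valid */
};

struct toy_page {
    struct toy_inode *inode;
    unsigned long index;     /* page number within the file */
    char data[TOY_PAGE_SIZE];
};

typedef int toy_get_block_t(struct toy_inode *inode, long iblock,
                            struct toy_buffer_head *bh_result, int create);

static const char *toy_disk;   /* flat disk image read by the generic code */

/* Generic: walks the blocks of one page; only get_block knows the layout. */
static int toy_block_read_full_page(struct toy_page *page,
                                    toy_get_block_t *get_block)
{
    long first;
    int i;

    assert(page && get_block && toy_disk);
    first = (long)page->index * TOY_BLOCKS_PER_PAGE;
    for (i = 0; i < TOY_BLOCKS_PER_PAGE; i++) {
        struct toy_buffer_head bh = { 0, 0 };
        char *dst = page->data + i * TOY_BLOCK_SIZE;

        if (get_block(page->inode, first + i, &bh, 0) != 0)
            return -1;
        if (bh.mapped)
            memcpy(dst, toy_disk + bh.b_blocknr * TOY_BLOCK_SIZE,
                   TOY_BLOCK_SIZE);
        else
            memset(dst, 0, TOY_BLOCK_SIZE);   /* unmapped block reads as zeros */
    }
    return 0;
}

/* File-system specific: translate a file-relative block to a disk block. */
static int toyfs_get_block(struct toy_inode *inode, long iblock,
                           struct toy_buffer_head *bh_result, int create)
{
    (void)create;            /* this read-only sketch never allocates */
    if (iblock < 0)
        return -1;
    if (iblock < inode->nblocks && inode->block_map[iblock] >= 0) {
        bh_result->b_blocknr = inode->block_map[iblock];
        bh_result->mapped = 1;
    }
    return 0;
}

/* Same shape as ext3_readpage: delegate, passing the mapping callback. */
static int toyfs_readpage(struct toy_page *page)
{
    return toy_block_read_full_page(page, toyfs_get_block);
}
```

With this split, a second toy file system would reuse toy_block_read_full_page unchanged and supply only its own get_block function.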
The super_operations structure defines the following operations.
void (*read_inode) (struct inode *inode);
Reads the inode data from the disk and stores it in the inode structure argument. The unique identifying number for this inode must be present in inode->i_ino before the call is made.
Fields not set by read_inode are the following: i_hash, i_list, i_dentry, i_dirty_buffers, i_dirty_data_buffers ?
void (*read_inode2) (struct inode *inode, void *p) ;
read_inode2 is a kludge that supports ReiserFS.
ReiserFS requires more information than can be placed in the
inode->i_ino field.
void (*write_inode) (struct inode *inode, int do_sync);
Requests that the on-disk inode be updated to match the contents of the inode object passed as its argument. If the second argument is nonzero, the write is synchronous; that is, the call waits until the disk has been updated.
void (*put_inode) (struct inode *inode);
Notifies the file system that the inode structure can be freed. If no other processes use the object, associated resources can be released.
void (*delete_inode) (struct inode *inode);
Deletes all data blocks associated with the specified inode, the inode itself, and any associated file system data.
void (*put_super) (struct super_block *super);
Notifies the file system that the super_block argument can be released because the logical file system has been unmounted.
void (*write_super) (struct super_block *super);
Updates the on-disk file system to reflect the contents of the associated super_block object.
int (*statfs) (struct super_block *super, struct statfs *buf);
Fills the statfs structure with current state information from the file system identified by the super_block.
int (*remount_fs) (struct super_block *super, int *flags, char *data);
Remounts the file system specified by super_block with the flags specified by the integer argument and the options specified in the character data buffer.
void (*dirty_inode) (struct inode *inode);
// mark the inode dirty
void (*write_super_lockfs) (struct super_block *super);
// locks journal updates allowing any pending updates
// to finish, and flushes journal
void (*unlockfs) (struct super_block *super);
// commits the super_block and unlocks journal updates
void (*clear_inode) (struct inode *inode);
// Generic clear_inode is called by file system to notify the
// VFS that the inode is no longer useful.
// This file system specific function is called if defined.
void (*umount_begin) (struct super_block *super);
// Used to kill those processes that may depend on a mounted
// file system.
struct dentry * (*fh_to_dentry)(struct super_block *sb, __u32 *fh, int len, int fhtype, int parent);
int (*dentry_to_fh)(struct dentry *, __u32 *fh, int *lenp, int need_parent);
// used by FAT and ReiserFS to associate filehandles and dentry objects
int (*show_options)(struct seq_file *, struct vfsmount *);
// causes mount options to be printed to the specified file
This means that when a file system does not define one of these
operations (so the field is NULL in its super_operations structure),
any use of that operation must be carried out in a generic way.
These fields are defined in ReiserFS and in Ext3, so they appear
to be relatively new additions to the kernel.
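The usual way to handle an optional operation is to test the function pointer before calling it. The sketch below shows that pattern in user space; the toy_ types, toy_mark_inode_dirty, and the two example file systems (journalfs and plainfs) are hypothetical names, and the fields dirty and journaled exist only so the effect can be observed.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode;

/* Hypothetical, trimmed-down operations table; every hook is optional. */
struct toy_super_operations {
    void (*dirty_inode)(struct toy_inode *inode);
    void (*clear_inode)(struct toy_inode *inode);
};

struct toy_super_block {
    const struct toy_super_operations *s_op;
};

struct toy_inode {
    struct toy_super_block *i_sb;
    int dirty;         /* set by the generic code */
    int journaled;     /* set only by the file-system hook below */
};

/* Generic path: does the common work, then calls the hook only if present. */
static void toy_mark_inode_dirty(struct toy_inode *inode)
{
    const struct toy_super_operations *op;

    assert(inode && inode->i_sb);
    op = inode->i_sb->s_op;
    inode->dirty = 1;
    if (op && op->dirty_inode)
        op->dirty_inode(inode);
}

/* A file system that wants to know about dirty inodes. */
static void journalfs_dirty_inode(struct toy_inode *inode)
{
    inode->journaled = 1;
}

static const struct toy_super_operations journalfs_sops = {
    .dirty_inode = journalfs_dirty_inode,
    /* .clear_inode is left NULL: the generic behavior is enough */
};

/* A file system that defines no hooks at all. */
static const struct toy_super_operations plainfs_sops = { NULL, NULL };
```

Because the generic code checks for NULL, adding a new field to the table does not force every existing file system to implement it.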
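In looking at the kernel code, I ran across numerous occurrences of the lock identifier BKL, whose meaning was not explained there. BKL stands for Big Kernel Lock, a single kernel-wide lock; the comments in the operation tables above record whether it is held when each operation is called.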
int (*create) (struct inode *inode,struct dentry *dentry,int mode);
struct dentry * (*lookup) (struct inode *dir,struct dentry *dentry);
int (*link) (struct dentry *old_dentry,struct inode *dir,struct dentry *dentry);
int (*unlink) (struct inode *dir,struct dentry *dentry);
int (*symlink) (struct inode *dir,struct dentry *dentry,const char *symname);
int (*mkdir) (struct inode *dir,struct dentry *dentry,int mode);
int (*rmdir) (struct inode *dir,struct dentry *dentry);
int (*mknod) (struct inode *dir,struct dentry *dentry,int mode,int rdev);
int (*rename) (struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry);
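A few of the directory operations listed above (create, lookup, unlink) can be illustrated with a toy directory held in memory. This is only a sketch: toy_dir, toy_inode_operations, and the toyfs_ functions are hypothetical, and the signatures are simplified to take a name string in place of the kernel's dentry argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_ENTRIES 4
#define TOY_NAME_LEN 16

/* Hypothetical in-memory directory; an empty string marks a free slot. */
struct toy_dir {
    char names[TOY_MAX_ENTRIES][TOY_NAME_LEN];
};

struct toy_inode_operations {
    int (*create)(struct toy_dir *dir, const char *name);
    int (*lookup)(struct toy_dir *dir, const char *name);
    int (*unlink)(struct toy_dir *dir, const char *name);
};

/* Returns the slot holding name, or -1 if the name is not present. */
static int toyfs_lookup(struct toy_dir *dir, const char *name)
{
    int i;

    assert(dir && name);
    for (i = 0; i < TOY_MAX_ENTRIES; i++)
        if (dir->names[i][0] != '\0' && strcmp(dir->names[i], name) == 0)
            return i;
    return -1;
}

/* Returns 0 on success, -1 if the name is invalid, taken, or the dir is full. */
static int toyfs_create(struct toy_dir *dir, const char *name)
{
    int i;

    assert(dir && name);
    if (name[0] == '\0' || strlen(name) >= TOY_NAME_LEN)
        return -1;
    if (toyfs_lookup(dir, name) >= 0)
        return -1;
    for (i = 0; i < TOY_MAX_ENTRIES; i++) {
        if (dir->names[i][0] == '\0') {
            strcpy(dir->names[i], name);
            return 0;
        }
    }
    return -1;
}

/* Returns 0 on success, -1 if the name does not exist. */
static int toyfs_unlink(struct toy_dir *dir, const char *name)
{
    int slot = toyfs_lookup(dir, name);

    if (slot < 0)
        return -1;
    dir->names[slot][0] = '\0';
    return 0;
}

static const struct toy_inode_operations toyfs_dir_inode_operations = {
    .create = toyfs_create,
    .lookup = toyfs_lookup,
    .unlink = toyfs_unlink,
};
```

Note that create relies on lookup to reject duplicate names, which mirrors the way directory operations tend to build on one another.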