Lecture 2

Further Introduction to Files and File Systems

Here's my program to address the specifications of question 4 of the pretest:
     1	#include <sys/types.h>
     2	#include <sys/stat.h>
     3	#include <fcntl.h>
     4	#include <unistd.h>
     5	
     6	#define BUFSIZ 4096
     7	
     8	int main()
     9	{
    10	  char buffer[BUFSIZ];
    11	  int numread;
    12	  int infd, outfd;
    13	
    14	  if ((infd = open ("/tmp/original", O_RDONLY)) < 0) {
    15	    perror ("Couldn't open input file");
    16	    exit (-1);
    17	  }
    18	
    19	  if ((outfd = open("/tmp/copy", O_CREAT|O_TRUNC|O_RDWR, 0777)) < 0) {
    20	    perror ("Couldn't create output file");
    21	    exit (-1);
    22	  }
    23	
    24	  while((numread = read (infd, buffer, BUFSIZ)) > 0) {
    25	    if (write (outfd, buffer, numread) != numread) {
    26	      perror("Write failed");
    27	      exit(-1);
    28	    }
    29	  }
    30	
    31	  if (numread != 0) {
    32	    perror("Read failed");
    33	    exit(-1);
    34	  }
    35	
    36	  // The close function wasn't mentioned in class
    37	  if (close(infd) < 0) {
    38	    perror("Couldn't successfully close input file");
    39	    exit(-1);
    40	  }
    41	
    42	  if (close(outfd) < 0) {
    43	    perror("Couldn't successfully close output file");
    44	    exit(-1);
    45	  }
    46	
    47	  return 0;
    48	}



Several observations about the program are in order:
  1. The line numbers are not part of the program. They're just there so you can more easily see what's going on.

  2. The definition of BUFSIZ on line 6 is a C preprocessor macro definition. In these modern days of C++, it would be more proper to introduce a const int declaration, such as
    const int BUFSIZ=4096;
    but I included the macro because you will see lots of macro definitions in old C programs.

    Consult your C book for some of the issues surrounding the use of macros.

  3. On line 8, the main function is defined. The distinction between defining and declaring a function or other entity is an important one that does not exist in Java. When you declare an entity, you state what must be known in order to use it (for a function, its name, its parameter types, and its return type). When you define it, you supply its implementation.

  4. The definition of main declares that the return type of main is int. The value that main returns becomes the program's exit status. A Unix program is normally considered to have terminated successfully if it returns an exit status of 0.

  5. In line 10, buffer is defined to be an array of bytes containing BUFSIZ elements. If we are to read the contents of a file, we will need to store them somewhere. This buffer will provide that storage.

    It is clear that files larger than BUFSIZ bytes can exist. Thus, if we allocate a fixed-size buffer, we'll need to read part of the file into that buffer multiple times, which is exactly what we'll do.

  6. On line 11, numread will be used to keep track of how many bytes we've read into the buffer at any one time.

  7. In line 12, infd and outfd are used to hold the integer file descriptors of the input and output file, respectively. In Unix, as in many other systems, the open files associated with any process are identified by numbering them.

    Certain conventional numbers are used: 0 for the standard input file (stdin, usually the terminal keyboard), 1 for the standard output file (stdout, usually the terminal screen), and 2 for the standard error file (stderr, usually mapped to the terminal screen). Using these standard numbers makes it easy to implement interprocess communication with pipes. If your program reads from stdin, then data can be piped into that file from other processes. Likewise, data from stdout or stderr can be piped into another process's stdin.

  8. In line 14, we open the original file. The argument O_RDONLY is a constant whose definition is included from the file /usr/include/fcntl.h which was included on line 3. On the system I typically use, the file /usr/include/fcntl.h included the file /usr/include/sys/fcntl.h which actually contained the following code:
    #define O_RDONLY        0
    The constant O_RDONLY is a flag that tells open that we will only be reading from this file. If we were to attempt a write to the file after it was opened in this way, that write would fail.

  9. We assign the result of the open function call to infd and test to see if its value is less than 0. Valid file descriptors are non-negative integers (0 is a valid descriptor). If open returns a value less than 0 (it returns -1 on failure), an error has occurred.

  10. The perror function (line 15) consults the current error status variable, errno (a single-threaded process has only one), and writes an appropriate error message to stderr. The string argument given to perror is written first, followed by a colon and a situation-specific error description.

  11. The call to exit in line 16 indicates that the program is to terminate immediately (after closing open files and doing various other clean-up operations) and tells what status value the program shall return.

  12. The call to open in line 19 tells the file to open (/tmp/copy), what flags to open it using (O_CREAT|O_TRUNC|O_RDWR), and what permission bits to request for the file if it must be created (0777), subject to the process's file mode creation mask (umask).

    Three flags are presented to this open call. Each represents at most a single bit. They are O_CREAT (0100), O_TRUNC (01000), and O_RDWR (02). When we OR these together, we get the single flag argument value 01102 which open uses to determine its operation.

    Note that in C, any literal constant number that starts with a 0 is presumed to be specified in octal, or base-8, notation. Thus, the value 01102 represents 578 in decimal, or 1001000010 in binary. The binary representation is more appropriate in this case, since each different flag corresponds to a single specific bit position. A one in that position signals open that the corresponding flag is ON. A 0 signals that the corresponding flag is OFF.

    The O_CREAT flag tells open that it needs to create this file if it doesn't already exist.

    The O_TRUNC flag tells open to truncate the file to zero length. This ensures that if the file already exists and we are copying a shorter file into it, we don't end up with old contents of the file tacked onto the end of its new contents.

    The O_RDWR flag tells open that we may be reading, writing or both. We could have used the O_WRONLY flag here instead.

    The creation mode argument (0777) specifies the access permissions to be associated with the created file. This is specified as a 3-digit octal number. The first octal digit has 3 bit positions corresponding to the owner's permissions. The most significant bit is 1 if the owner has read permission and 0 otherwise. The middle bit is 1 if the owner has write permission and 0 otherwise. The least significant bit is 1 if the owner has permission to execute this file as a program and 0 otherwise. Since the first octal digit in this value is 7 (111 in binary), the owner is permitted to read, write, and execute this file.

    The second octal digit specifies read, write, and execute permissions for users belonging to the group that owns the file. The group ownership of a new file is normally taken from the effective group of the process that created the file, although some systems take it from the directory in which the file is created.

    The third octal digit specifies read, write, and execute permissions for all other users.

    The file permissions associated with the created file are the permission argument bits with every bit that is set in the process's file mode creation mask (umask) cleared, that is, the argument AND (NOT umask). This ensures that a process does not create any file with more liberal permissions than the process would normally want.

  13. On line 24, interesting work finally starts to happen. The call to read specifies the file descriptor from which to read (infd), the address of a buffer in which to store what is read (buffer), and the maximum number of bytes to read (BUFSIZ).

    It is critical that the number of bytes to be read does not exceed the amount of space in the buffer. Otherwise, a buffer overflow can occur. Buffer overflows can be exploited in a variety of ways to compromise the security of an operating system.

    If read succeeds in reading any bytes, it will appropriately update the buffer and return the number of bytes read.

    If read returns the value 0, then it has reached the end of file.

    If read returns -1, an error has occurred.

  14. The call to write on line 25 specifies the output file descriptor, the address of the buffer containing the bytes to be written, and the number of bytes to write.

    The result of write is the number of bytes that have been written. In this program, we assume that write must always be able to write all the bytes we specify or the process fails. (This assumption is not always valid.)

  15. In line 31, we check to make sure that the last read returned a value of 0, rather than an error value.

  16. In line 37 we close the input file and in line 42 we close the output file.

  17. If we succeeded in all the above tasks, line 47 returns value 0 for the program's execution.
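
The flag and permission arithmetic in observation 12 can be checked with a small sketch. The helper names copy_open_flags and effective_mode are hypothetical (they do not appear in the program above), and the mask values mentioned afterwards are only common examples; the real mask is whatever the calling process has set with umask.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: the flag word the program passes to open()
   when it creates /tmp/copy. The numeric value is system-specific. */
int copy_open_flags(void)
{
  return O_CREAT | O_TRUNC | O_RDWR;
}

/* Hypothetical helper: the permission bits a newly created file ends up
   with, given the mode passed to open() and the process's file mode
   creation mask. Bits set in the mask are cleared from the request. */
mode_t effective_mode(mode_t requested, mode_t mask)
{
  return requested & ~mask & 0777;
}
```

With a mask of 022, effective_mode(0777, 022) is 0755: the owner keeps read, write, and execute, while group and others lose write permission. With a mask of 077, the result is 0700.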

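Observation 14 notes that the assumption that write always writes every byte we ask for is not always valid. One common way to cope, sketched here under the assumption that a short write simply means "call write again for the rest", is to loop until the whole buffer has been written. The helper name write_all is hypothetical, and the retry on EINTR is a design choice of this sketch rather than something the program above does.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes of buf
   have been written to fd. Returns 0 on success, or -1 if write() fails
   (errno is left as write() set it). */
int write_all(int fd, const char *buf, size_t count)
{
  size_t done = 0;              /* bytes written so far */

  while (done < count) {
    ssize_t n = write(fd, buf + done, count - done);
    if (n < 0) {
      if (errno == EINTR)
        continue;               /* interrupted by a signal; try again */
      return -1;
    }
    done += (size_t) n;         /* a short write just advances the offset */
  }
  return 0;
}
```

In the copy loop, the test on line 25 could then be written as a call such as write_all(outfd, buffer, numread) < 0, leaving the rest of the program unchanged.
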
Terms

(I'll have to fill in more later.)