Chapter 5. File I/O: Further Details

In this chapter, we extend the discussion of file I/O that we started in the previous chapter.

In continuing the discussion of the open() system call, we explain the concept of atomicity—the notion that the actions performed by a system call are executed as a single uninterruptible step. This is a necessary requirement for the correct operation of many system calls.

We introduce another file-related system call, the multipurpose fcntl(), and show one of its uses: fetching and setting open file status flags.

Next, we look at the kernel data structures used to represent file descriptors and open files. Understanding the relationship between these structures clarifies some of the subtleties of file I/O discussed in subsequent chapters. Building on this model, we then explain how to duplicate file descriptors.

We then consider some system calls that provide extended read and write functionality. These system calls allow us to perform I/O at a specific location in a file without changing the file offset, and to transfer data to and from multiple buffers in a program.

We briefly introduce the concept of nonblocking I/O, and describe some extensions provided to support I/O on very large files.

Since temporary files are used by many system programs, we’ll also look at some library functions that allow us to create and use temporary files with randomly generated unique names.

Atomicity is a concept that we’ll encounter repeatedly when discussing the operation of system calls. All system calls are executed atomically. By this, we mean that the kernel guarantees that all of the steps in a system call are completed as a single operation, without being interrupted by another process or thread.

Atomicity is essential to the successful completion of some operations. In particular, it allows us to avoid race conditions (sometimes known as race hazards). A race condition is a situation where the result produced by two processes (or threads) operating on shared resources depends in an unexpected way on the relative order in which the processes gain access to the CPU(s).

In the next few pages, we look at two situations involving file I/O where race conditions occur, and show how these conditions are eliminated through the use of open() flags that guarantee the atomicity of the relevant file operations.

We revisit the topic of race conditions when we describe sigsuspend() in Waiting for a Signal Using a Mask: sigsuspend() and fork() in Section 24.4.

In , we noted that specifying O_EXCL in conjunction with O_CREAT causes open() to return an error if the file already exists. This provides a way for a process to ensure that it is the creator of a file. The check on the prior existence of the file and the creation of the file are performed atomically. To see why this is important, consider the code shown in Example 5-1, which we might resort to in the absence of the O_EXCL flag. (In this code, we display the process ID returned by the getpid() system call, which enables us to distinguish the output of two different runs of this program.)

Aside from the long-winded use of two calls to open(), the code in Example 5-1 also contains a bug. Suppose that when our process first called open(), the file did not exist, but by the time of the second open(), some other process had created the file. This could happen if the kernel scheduler decided that the process’s time slice had expired and gave control to another process, as shown in Figure 5-1, or if the two processes were running at the same time on a multiprocessor system. Figure 5-1 portrays the case where two processes are both executing the code shown in Example 5-1. In this scenario, process A would wrongly conclude that it had created the file, since the second open() succeeds whether or not the file exists.

While the chance of the process wrongly believing it was the creator of the file is relatively small, the possibility that it may occur nevertheless renders this code unreliable. The fact that the outcome of these operations depends on the order of scheduling of the two processes means that this is a race condition.

To demonstrate that there is indeed a problem with this code, we could replace the commented line WINDOW FOR FAILURE in Example 5-1 with a piece of code that creates an artificially long delay between the check for file existence and the creation of the file:

printf("[PID %ld] File \"%s\" doesn't exist yet\n", (long) getpid(), argv[1]);
if (argc > 2) {                 /* Delay between check and create */
    sleep(5);                   /* Suspend execution for 5 seconds */
    printf("[PID %ld] Done sleeping\n", (long) getpid());
}

Note

The sleep() library function suspends the execution of a process for a specified number of seconds. We discuss this function in Section 23.4.

If we run two simultaneous instances of the program in Example 5-1, we see that they both claim to have exclusively created the file:

$ ./bad_exclusive_open tfile sleep &
[PID 3317] File "tfile" doesn't exist yet
[1] 3317
$ ./bad_exclusive_open tfile
[PID 3318] File "tfile" doesn't exist yet
[PID 3318] Created file "tfile" exclusively
$ [PID 3317] Done sleeping
[PID 3317] Created file "tfile" exclusively                 Not true

Note

In the penultimate line of the above output, we see the shell prompt mixed with output from the first instance of the test program.

Both processes claim to have created the file because the code of the first process was interrupted between the existence check and the creation of the file. Using a single open() call that specifies the O_CREAT and O_EXCL flags prevents this possibility by guaranteeing that the check and creation steps are carried out as a single atomic (i.e., uninterruptible) operation.