A solution is provided in the file fileio/atomic_append.c in the source code distribution for this book. Here is an example of the results that we see when we run this program as suggested:
$ ls -l f1 f2
-rw------- 1 mtk users 2000000 Jan 9 11:14 f1
-rw------- 1 mtk users 1999962 Jan 9 11:14 f2
Because the combination of lseek() plus write() is not atomic, one instance of the program sometimes overwrote bytes written by the other instance. As a result, the file f2 contains fewer than 2 million bytes.
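The difference between the two approaches can be sketched as follows. This is a minimal illustration and not the distributed atomic_append.c; the function name append_bytes() and its useAppend parameter are hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: append 'numBytes' bytes to 'path', one at a time.
   If 'useAppend' is nonzero, rely on O_APPEND, so that the seek to the end
   of the file and the write happen as one atomic step. Otherwise, perform a
   separate lseek() before each write(), which leaves a window in which
   another process can write at the same offset.
   Returns 0 on success, -1 on error. */
int append_bytes(const char *path, long numBytes, int useAppend)
{
    int flags = O_WRONLY | O_CREAT | (useAppend ? O_APPEND : 0);
    int fd = open(path, flags, S_IRUSR | S_IWUSR);
    if (fd == -1)
        return -1;

    for (long j = 0; j < numBytes; j++) {
        if (!useAppend && lseek(fd, 0, SEEK_END) == -1) {
            close(fd);
            return -1;
        }
        if (write(fd, "x", 1) != 1) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}
```

When a single process calls this helper, both modes produce the same file size. Bytes can be lost only when two processes run the lseek() variant against the same file concurrently.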
A call to dup() can be rewritten as:
fd = fcntl(oldfd, F_DUPFD, 0);
A call to dup2() can be rewritten as:
if (oldfd == newfd) {               /* oldfd == newfd is a special case */
    if (fcntl(oldfd, F_GETFL) == -1) {      /* Is oldfd valid? */
        errno = EBADF;
        fd = -1;
    } else {
        fd = oldfd;
    }
} else {
    close(newfd);
    fd = fcntl(oldfd, F_DUPFD, newfd);
}
The first point to realize is that, since fd2 is a duplicate of fd1, they both share a single open file description, and thus a single file offset. However, because fd3 is created via a separate call to open(), it has a separate file offset.
After the first write(), the file contents are “Hello,”.
Since fd2 shares a file offset with fd1, the second write() call appends to the existing text, yielding “Hello, world”.
The lseek() call adjusts the single file offset shared by fd1 and fd2 to point to the start of the file, and thus the third write() call overwrites part of the existing text to yield “HELLO, world”.
The file offset for fd3 has not so far been modified, and so points to the start of the file. Therefore, the final write() call changes the file contents to “Gidday world”.
Run the program fileio/multi_descriptors.c in the source code distribution for this book to see these results.
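For readers without the source distribution at hand, the sequence can be replayed with a sketch like the following. It is reconstructed from the description above and is not the distributed multi_descriptors.c; the function name offset_demo() is hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Replays the four writes described above on 'path' and copies the final
   file contents into 'out' as a NUL-terminated string.
   Returns 0 on success, -1 on error. */
int offset_demo(const char *path, char *out, size_t outSize)
{
    int fd1, fd2 = -1, fd3 = -1, result = -1;
    ssize_t numRead;

    fd1 = open(path, O_RDWR | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
    if (fd1 == -1)
        return -1;
    fd2 = dup(fd1);                 /* Shares fd1's open file description */
    fd3 = open(path, O_RDWR);       /* Separate description, own offset */
    if (fd2 == -1 || fd3 == -1)
        goto done;

    if (write(fd1, "Hello,", 6) != 6) goto done;    /* Hello, */
    if (write(fd2, " world", 6) != 6) goto done;    /* Hello, world */
    if (lseek(fd2, 0, SEEK_SET) == -1) goto done;   /* Resets shared offset */
    if (write(fd1, "HELLO,", 6) != 6) goto done;    /* HELLO, world */
    if (write(fd3, "Gidday", 6) != 6) goto done;    /* Gidday world */

    /* Read the whole file back through fd3 */
    if (lseek(fd3, 0, SEEK_SET) == -1) goto done;
    numRead = read(fd3, out, outSize - 1);
    if (numRead == -1) goto done;
    out[numRead] = '\0';
    result = 0;

done:
    if (fd3 != -1) close(fd3);
    if (fd2 != -1) close(fd2);
    close(fd1);
    return result;
}
```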
Since the array mbuf is not initialized, it is part of the uninitialized data segment. Therefore, no disk space is required to hold this variable. Instead, it is allocated (and initialized to 0) when the program is loaded.
A demonstration of the incorrect usage of longjmp() is provided in the file proc/bad_longjmp.c in the source code distribution for this book.
Sample implementations of setenv() and unsetenv() are provided in the file proc/setenv.c in the source code distribution for this book.
The two getpwnam() calls are executed before the printf() output string is constructed, and—since getpwnam() returns its result in a statically allocated buffer—the second call overwrites the result returned by the first call.
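One way to avoid the problem is to copy the required field out of the static buffer before making the next call. The following is a minimal sketch of that idea; the helper name copy_login_name() is hypothetical.

```c
#include <pwd.h>
#include <string.h>

/* Hypothetical helper: look up 'name' and copy the login name out of the
   statically allocated result before any later getpwnam() call can
   overwrite it. Returns 0 on success, or -1 if the user is not found or
   'out' is too small to hold the name. */
int copy_login_name(const char *name, char *out, size_t outSize)
{
    struct passwd *pwd = getpwnam(name);
    if (pwd == NULL)
        return -1;
    if (strlen(pwd->pw_name) >= outSize)
        return -1;
    strcpy(out, pwd->pw_name);
    return 0;
}
```

Calling the helper once per user, each time with a different output buffer, gives two independent strings that can safely be passed to a single printf() call.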
In considering the following, remember that changes to the effective user ID always also change the file-system user ID.
real=2000, effective=2000, saved=2000, file-system=2000
real=1000, effective=2000, saved=2000, file-system=2000
real=1000, effective=2000, saved=0, file-system=2000
real=1000, effective=0, saved=0, file-system=2000
real=1000, effective=2000, saved=3000, file-system=2000
Strictly speaking, such a process is unprivileged, since its effective user ID is nonzero. However, an unprivileged process can use the setuid(), setreuid(), seteuid(), or setresuid() calls to set its effective user ID to the same value as its real user ID or saved set-user-ID. Thus, this process could use one of these calls to regain privilege.
The following code shows the steps for each system call.
e = geteuid();              /* Save initial value of effective user ID */

setuid(getuid());           /* Suspend privileges */
setuid(e);                  /* Resume privileges */
/* Can't permanently drop the set-user-ID identity with setuid() */

seteuid(getuid());          /* Suspend privileges */
seteuid(e);                 /* Resume privileges */
/* Can't permanently drop the set-user-ID identity with seteuid() */

setreuid(-1, getuid());     /* Temporarily drop privileges */
setreuid(-1, e);            /* Resume privileges */
setreuid(getuid(), getuid());       /* Permanently drop privileges */

setresuid(-1, getuid(), -1);        /* Temporarily drop privileges */
setresuid(-1, e, -1);               /* Resume privileges */
setresuid(getuid(), getuid(), getuid());    /* Permanently drop privileges */
With the exception of setuid(), the answers are the same as for the previous exercise, except that we can substitute the value 0 for the variable e. For setuid(), the following holds:
/* (a) Can't suspend and resume privileges with setuid() */
setuid(getuid());           /* (b) Permanently drop privileges */
The maximum unsigned 32-bit integer value is 4,294,967,295. Divided by 100 clock ticks per second, this corresponds to slightly more than 497 days. Divided by 1 million (CLOCKS_PER_SEC), this corresponds to 71 minutes and 35 seconds.
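The arithmetic can be checked with a few lines of C. This is only a sketch; the function name seconds_until_wrap() is hypothetical.

```c
#include <stdint.h>

/* Number of seconds that an unsigned 32-bit counter can represent before
   reaching its maximum value, given the number of ticks per second. */
double seconds_until_wrap(double ticksPerSecond)
{
    return (double) UINT32_MAX / ticksPerSecond;
}
```

Dividing seconds_until_wrap(100.0) by 86,400 seconds per day gives a little over 497 days, while seconds_until_wrap(1000000.0) is about 4295 seconds, that is, 71 minutes and roughly 35 seconds.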
A solution is provided in the file sysinfo/procfs_user_exe.c in the source code distribution for this book.
This sequence of statements ensures that the data written to a stdio buffer is flushed to the disk. The fflush() call flushes the stdio buffer for fp to the kernel buffer cache. The argument given to the subsequent fsync() is the file descriptor underlying fp; thus, the call flushes the (recently filled) kernel buffer for that file descriptor to disk.
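The two steps can be wrapped in a small helper, sketched below. The name flush_to_disk() is hypothetical; the calls it makes are the same fflush() and fsync() calls discussed above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: push data from the stdio buffer of 'fp' to the
   kernel buffer cache, then ask the kernel to write it to disk.
   Returns 0 on success, -1 on error (errno is set by the failing call). */
int flush_to_disk(FILE *fp)
{
    if (fflush(fp) == EOF)
        return -1;
    return fsync(fileno(fp));
}
```

The order matters: calling fsync() first would write out only the data that had already reached the kernel, leaving anything still held in the stdio buffer behind.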
When standard output is sent to a terminal, it is line-buffered, so that the output of the printf() call appears immediately, and is followed by the output of write(). When standard output is sent to a disk file, it is block-buffered. Consequently, the output of the printf() is held in a stdio buffer and is flushed only when the program exits (i.e., after the write() call). (A complete program containing the code of this exercise is available in the file filebuff/mix23_linebuff.c in the source code distribution for this book.)
The stat() system call doesn’t change any file timestamps, since all it does is fetch information from the file i-node (and there is no last i-node access timestamp).
The GNU C library provides just such a function, named euidaccess(), in the library source file sysdeps/posix/euidaccess.c.
In order to do this, we must use two calls to umask(), as follows:
mode_t currUmask;

currUmask = umask(0);       /* Retrieve current umask, set umask to 0 */
umask(currUmask);           /* Restore umask to previous value */
Note, however, that this solution is not thread-safe, since threads share the process umask setting.
A solution is provided in the file files/chiflag.c in the source code distribution for this book.
Using ls -li shows that the executable file has different i-node numbers after each compilation. What happens is that the compiler removes (unlinks) any existing file with the same name as its target executable, and then creates a new file with the same name. It is permissible to unlink an executable file. The name is removed immediately, but the file itself remains in existence until the process executing it terminates.
The file myfile is created in the subdirectory test. The symlink() call creates a relative link in the parent directory. Despite appearances, this is a dangling link, since it is interpreted relative to the location of the link file, and thus refers to a nonexistent file in the parent directory. Consequently, chmod() fails with the error ENOENT (“No such file or directory”). (A complete program containing the code of this exercise is available in the file dirs_links/bad_symlink.c in the source code distribution for this book.)
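The same situation can be reproduced with the following sketch, which is not the distributed bad_symlink.c. As an assumption made for convenience, it builds the layout under a temporary directory using full pathnames instead of calling chdir(); the function name dangling_link_errno() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Creates base/test/myfile and a relative symbolic link base/mylink whose
   target is "myfile", then calls chmod() on the link. Returns the errno
   from chmod(), 0 if chmod() unexpectedly succeeds, or -1 if the setup
   itself fails. */
int dangling_link_errno(void)
{
    char base[] = "/tmp/bad_symlink_XXXXXX";
    char dirPath[64], filePath[64], linkPath[64];
    int fd, result = -1;

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(dirPath, sizeof(dirPath), "%s/test", base);
    snprintf(filePath, sizeof(filePath), "%s/test/myfile", base);
    snprintf(linkPath, sizeof(linkPath), "%s/mylink", base);

    if (mkdir(dirPath, S_IRWXU) == 0 &&
            (fd = open(filePath, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR)) != -1) {
        close(fd);
        /* "myfile" is resolved relative to the directory containing the
           link (base), where no such file exists */
        if (symlink("myfile", linkPath) == 0)
            result = (chmod(linkPath, S_IRUSR) == -1) ? errno : 0;
    }

    /* Best-effort cleanup; failures here are ignored */
    unlink(linkPath);
    unlink(filePath);
    rmdir(dirPath);
    rmdir(base);
    return result;
}
```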
A solution is provided in the file dirs_links/list_files_readdir_r.c, which can be found in the source code distribution for this book.
A solution is provided in the file dirs_links/file_type_stats.c, which can be found in the source code distribution for this book.
Using fchdir() is more efficient. If we are performing the operation repeatedly within a loop, then with fchdir() we can perform one call to open() before executing the loop, and with chdir() we can place the getcwd() call outside the loop. Then we are measuring the difference between repeated calls to fchdir(fd) and chdir(buf). Calls to chdir() are more expensive for two reasons: passing the buf argument to the kernel requires a larger data transfer between user space and kernel space, and the pathname in buf must be resolved to the corresponding directory i-node on each call. (Kernel caching of directory entry information reduces the work required for the second point, but some work must still be done.)
As with most UNIX implementations, Linux delivers standard signals before realtime signals (SUSv3 doesn’t require this). This makes sense, because some standard signals indicate critical conditions (e.g., hardware exceptions) that a program should deal with as soon as possible.
Replacing sigsuspend() plus a signal handler with sigwaitinfo() in this program provides a 25 to 40 percent speed improvement. (The exact figure varies somewhat across kernel versions.)
A modified program using clock_nanosleep() is provided in the file timers/t_clock_nanosleep.c in the source code distribution for this book.
A solution is provided in the file timers/ptmr_null_evp.c in the source code distribution for this book.
The first fork() call creates one new child. Both parent and child carry on to execute the second fork(), and thus each creates a further process, making a total of four processes. All four processes carry on to execute the next fork(), each creating a further child. Consequently, a total of seven new processes are created.
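The count can be confirmed empirically. In the sketch below, every process that exists after the forks writes one byte to a pipe, and the caller counts the bytes. The function name count_after_forks() is hypothetical, and the extra wrapper child (which stands in for the original program) is an assumption added so that the caller itself is not duplicated.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the number of processes that exist after 'nforks' consecutive
   fork() calls (including the process that made the first call), or -1 on
   error. */
int count_after_forks(int nforks)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (pid == 0) {                 /* Stand-in for the original program */
        close(pfd[0]);
        for (int j = 0; j < nforks; j++)
            if (fork() == -1)
                _exit(1);
        /* Executed by every process produced by the loop above */
        if (write(pfd[1], "x", 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(pfd[1]);                  /* So that read() eventually sees EOF */
    int count = 0;
    char ch;
    while (read(pfd[0], &ch, 1) == 1)
        count++;
    close(pfd[0]);
    waitpid(pid, NULL, 0);
    return count;
}
```

With three forks, the function reports eight processes: the original plus the seven new ones.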
A solution is provided in the file procexec/vfork_fd_test.c in the source code distribution for this book.
If we call fork(), and then have the child call raise() to send itself a signal such as SIGABRT, this will yield a core dump file that closely mirrors the state of the parent at the time of the fork(). The gdb gcore command allows us to perform a similar task for a program, without needing to change the source code.
Add a converse kill() call in the parent:
if (kill(childPid, SIGUSR1) == -1)
    errExit("kill");
And in the child, add a converse sigsuspend() call:
sigsuspend(&origMask); /* Unblock SIGUSR1, wait for signal */
Assuming a two’s complement architecture, where -1 is represented by a bit pattern with all bits on, then the parent will see an exit status of 255 (all bits on in the least significant 8 bits, which are all that is passed back to the parent when it calls wait()). (The presence of the call exit(-1) in a program is usually a programmer error resulting from confusion with the -1 return used to indicate failure of a system call.)
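The truncation is easy to observe. The following sketch forks a child that terminates with a given status and reports what the parent sees; the function name observed_exit_status() is hypothetical. The child uses _exit() rather than exit() so that it doesn't flush a copy of the caller's stdio buffers; the status is truncated in the same way.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that terminates with 'status' and returns the value that
   the parent obtains from WEXITSTATUS(), or -1 on error. */
int observed_exit_status(int status)
{
    int wstatus;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(status);          /* Only the low 8 bits reach the parent */
    if (waitpid(pid, &wstatus, 0) == -1 || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```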
A solution is provided in the file procexec/orphan.c in the source code distribution for this book.
The execvp() function first fails to exec the file xyz in dir1, because execute permission is denied. It therefore continues its search in dir2, where it successfully execs xyz.
A solution is provided in the file procexec/execlp.c in the source code distribution of this book.
The script specifies the cat program as its interpreter. The cat program “interprets” a file by printing its contents—in this case with the -n (line numbering) option enabled (as though we had entered the command cat -n ourscript). Thus, we would see the following:
     1  #!/bin/cat -n
     2  Hello world
Two successive fork() calls yield a total of three processes related as parent, child, and grandchild. Having created the grandchild process, the child immediately exits, and is reaped by the waitpid() call in the parent. As a consequence of being orphaned, the grandchild is adopted by init (process ID of 1). The program doesn’t need to perform a second wait() call, since init automatically reaps the zombie when the grandchild terminates. Therein lies a possible use for this code sequence: if we need to create a child for which we can’t later wait, then this sequence can be used to ensure that no zombie results. One example of such a requirement is where the parent execs some program that is not guaranteed to perform a wait (and we don’t want to rely on setting the disposition of SIGCHLD to SIG_IGN, since the disposition of an ignored SIGCHLD after an exec() is left unspecified by SUSv3).
The string given to printf() doesn’t include a newline character, and therefore the output is not flushed before the execlp() call. The execlp() overwrites the existing program’s data segments (as well as the heap and stack), which contain the stdio buffers, and thus the unflushed output is lost.
SIGCHLD is delivered to the parent. If the SIGCHLD handler attempts to do a wait(), then the call returns an error (ECHILD) indicating that there were no children whose status could be returned. (This assumes that the parent had no other terminated children. If it did, then the wait() would block; or if waitpid() was used with the WNOHANG flag, waitpid() would return 0.) This is exactly the situation that may arise if a program establishes a handler for SIGCHLD before calling system().
There are two possible outcomes (both permitted by SUSv3): the thread deadlocks, blocked while trying to join with itself, or the pthread_join() call fails, returning the error EDEADLK. On Linux, the latter behavior occurs. Given a thread ID in tid, we can prevent such an eventuality using the following code:
if (!pthread_equal(tid, pthread_self()))
    pthread_join(tid, NULL);
After the main thread terminates, threadFunc() will continue working with storage on the main thread’s stack, with unpredictable results.
A solution is provided in the file threads/one_time_init.c in the source code distribution for this book.
Suppose that the program is part of a shell pipeline:
$ ./ourprog | grep 'some string'
The problem is that grep will be part of the same process group as ourprog, and therefore the killpg() call will also terminate the grep process. This is probably not desired, and is likely to confuse users. The solution is to use setpgid() to ensure that the child processes are placed in their own new group (the process ID of the first child could be used as the process group ID of the group), and then signal that process group. This also removes the need for the parent to make itself immune to the signal.
If the SIGTSTP signal is unblocked before raising it again, then there is a small window of time (between the calls to sigprocmask() and raise()) during which, if the user types a second suspend character (Control-Z), the process will be stopped while still in the handler. Consequently, two SIGCONT signals will be required to resume the process.
A solution is provided in the file daemons/t_syslog.c in the source code distribution for this book.
Whenever a file is modified by an unprivileged user, the kernel clears the file’s set-user-ID permission bit. The set-group-ID permission bit is similarly cleared if the group-execute permission bit is enabled. (As detailed in Section 55.4, the combination of having the set-group-ID bit on while the group-execute bit is off has nothing to do with set-group-ID programs; instead, it is used to enable mandatory locking, and for this reason modifications to such a file don’t disable the set-group-ID bit.) Clearing these bits ensures that if the program file is writable by arbitrary users, then it can’t be modified and still retain its ability to give privileges to users executing the file. A privileged (CAP_FSETID) process can modify a file without the kernel clearing these permission bits.
A solution is provided in the file pipes/change_case.c in the source code distribution for this book.
It creates a race condition. Suppose that between the time the server sees end-of-file and the time it closes the file reading descriptor, a client opens the FIFO for writing (this will succeed without blocking), and then writes data to the FIFO after the server has closed the reading descriptor. At this point, the client will receive a SIGPIPE signal, since no process has the FIFO open for reading. Alternatively, the client might be able to both open the FIFO and write data to it before the server closes the reading descriptor. In this case, the client’s data would be lost, and it wouldn’t receive a response from the server. As a further exercise, you could try to demonstrate these behaviors by making the suggested modification to the server and creating a special-purpose client that repeatedly opens the server’s FIFOs, sends a message to the server, closes the server’s FIFO, and reads the server’s response (if any).
One possible solution would be to set a timer on the open() of the client FIFO using alarm(), as described in Section 23.3. This solution suffers from the drawback that the server would still be delayed for the period of the timeout. Another possible solution would be to open the client FIFO using the O_NONBLOCK flag. If this fails, then the server can assume a misbehaving client. This latter solution also requires changes to the client, which needs to ensure that it opens its FIFO (also using the O_NONBLOCK flag) prior to sending a request to the server. For convenience, the client should then turn off the O_NONBLOCK flag for the FIFO file descriptor, so that the subsequent read() call blocks. Finally, it is possible to implement a concurrent server solution for this application, with the main server process creating a child to send the response message to each client. (This would represent a rather resource-expensive solution in the case of this simple application.)
Other conditions that are not handled by this server remain. For example, it doesn’t handle the possibilities of the sequence number overflowing or a misbehaving client requesting large groups of sequence numbers in order to produce such overflows. The server also does not handle the possibility that the client specifies a negative value for the sequence length. Furthermore, a malicious client could create its reply FIFO, and then open the FIFO for reading and writing, and fill it with data before sending a request to the server; as a consequence, the server would be able to successfully open the reply FIFO, but would block when it tries to write the reply. As a further exercise, you could try to devise strategies for dealing with these possibilities.
In Section 44.8, we also noted another limitation that applies to the server in Example 44-7: if a client sends a message that contains the wrong number of bytes, then the server will be out of step when reading all subsequent client messages. One simple way to deal with this problem is to discard the use of fixed-length messages in favor of the use of a delimiter character.
The value 0 is a valid message queue identifier, but 0 can’t be used as a message type.
A solution is provided in the file svsem/event_flags.c in the source code distribution for this book.
A reserve operation can be implemented by reading a byte from the FIFO. Conversely, a release operation can be implemented by writing a byte to the FIFO. A conditional reserve operation can be implemented as a nonblocking read of a byte from the FIFO.
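The three operations can be sketched as follows. The function names are hypothetical, and the descriptors are assumed to refer to the read and write ends of an already opened FIFO (a pipe behaves the same way for the purposes of this illustration).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Release: make one unit available by writing a single byte.
   Returns 0 on success, -1 on error. */
int fifo_release(int writeFd)
{
    return (write(writeFd, "x", 1) == 1) ? 0 : -1;
}

/* Reserve: consume one unit by reading a single byte. Blocks while the
   FIFO is empty, assuming 'readFd' is in blocking mode.
   Returns 0 on success, -1 on error or end-of-file. */
int fifo_reserve(int readFd)
{
    char ch;
    ssize_t n;
    while ((n = read(readFd, &ch, 1)) == -1 && errno == EINTR)
        continue;               /* Restart if interrupted by a signal */
    return (n == 1) ? 0 : -1;
}

/* Conditional reserve: attempt a nonblocking read of one byte, restoring
   the original file status flags afterward. Returns 0 if a unit was taken,
   -1 otherwise (errno is EAGAIN when the FIFO was simply empty). */
int fifo_try_reserve(int readFd)
{
    char ch;
    int flags = fcntl(readFd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(readFd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    ssize_t n = read(readFd, &ch, 1);
    int savedErrno = errno;
    fcntl(readFd, F_SETFL, flags);
    errno = savedErrno;
    return (n == 1) ? 0 : -1;
}
```

Note that O_NONBLOCK is an attribute of the open file description, so toggling it in fifo_try_reserve() briefly affects any other descriptor that shares that description.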
Since access to the shmp->cnt value in the for loop increment step is no longer protected by the semaphore, there is a race condition between the writer next updating this value and the reader fetching it.
A solution is provided in the file svshm/svshm_mon.c in the source code distribution for this book.
A solution is provided in the file mmap/mmcopy.c in the source code distribution for this book.
A solution is provided in the file vmem/madvise_dontneed.c in the source code distribution for this book.
A solution is provided in the file pmsg/mq_notify_sigwaitinfo.c in the source code distribution for this book.
It would not be safe to make buffer global. Once message notification is reenabled in threadFunc(), there is a chance that a second notification would be generated while threadFunc() is still executing. This second notification could initiate a second thread that executes threadFunc() at the same time as the first thread. Both threads would attempt to use the same global buffer, with unpredictable results. Note that the behavior here is implementation-dependent. SUSv3 permits an implementation to sequentially deliver notifications to the same thread. However, it is also permissible to deliver notifications in separate threads that execute concurrently, and this is what Linux does.
A solution is provided in the file psem/psem_timedwait.c in the source code distribution for this book.
The following hold for flock() on Linux:
A series of shared locks can starve a process waiting to place an exclusive lock.
There are no rules regarding which process is granted the lock. Essentially, the lock is granted to the process that is next scheduled. If that process happens to be one that obtains a shared lock, then all other processes requesting shared locks will also be able to have their requests granted simultaneously.
The flock() system call doesn’t detect deadlock. This is true of most flock() implementations, except those that implement flock() in terms of fcntl().
In all except early (1.2 and earlier) Linux kernels, the two types of locking operate independently, and have no effect on one another.
A solution is provided in the files read_line_buf.h and read_line_buf.c in the sockets subdirectory in the source code distribution for this book.
A solution is provided in the files is_seqnum_v2_sv.c, is_seqnum_v2_cl.c, and is_seqnum_v2.h in the sockets subdirectory in the source code distribution for this book.
A solution is provided in the files unix_sockets.h, unix_sockets.c, us_xfr_v2.h, us_xfr_v2_sv.c, and us_xfr_v2_cl.c in the sockets subdirectory in the source code distribution for this book.
In the Internet domain, datagrams from a nonpeer socket are silently discarded.
A solution is provided in the file sockets/is_echo_v2_sv.c in the source code distribution for this book.
Since the send and receive buffers for a TCP socket have a limited size, if the client sent a large quantity of data, then it might fill these buffers, at which point a further write() would (permanently) block the client before it read any of the server’s response.
A solution is provided in the file sockets/sendfile.c in the source code distribution for this book.
The tcgetattr() call fails if it is applied to a file descriptor that doesn’t refer to a terminal.
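This property is enough to test whether a descriptor refers to a terminal, as in the following minimal sketch. The function name is_a_tty() is hypothetical.

```c
#include <termios.h>
#include <unistd.h>

/* Returns 1 if 'fd' refers to a terminal, 0 otherwise (including the case
   where 'fd' is not a valid file descriptor). */
int is_a_tty(int fd)
{
    struct termios tp;
    return tcgetattr(fd, &tp) != -1;
}
```

For example, is_a_tty(STDIN_FILENO) returns 1 when a program is run interactively from a shell, and 0 when its standard input is redirected from a file or a pipe.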
A solution is provided in the file tty/ttyname.c in the source code distribution for this book.
A solution is provided in the file altio/select_mq.c in the source code distribution for this book.
A race condition would result. Suppose the following sequence of events: (a) after select() has informed the program that the self-pipe has data, it performs the appropriate actions in response to the signal; (b) another signal arrives, and the handler writes a byte to the self-pipe and returns; and (c) the main program drains the self-pipe. As a consequence, the program misses the signal that was delivered in step (b).
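The safe ordering is to drain the pipe first and only then act on the signal, so that a signal arriving during the handling leaves a fresh byte in the pipe for the next select() call to report. The draining step might look like the following sketch; the function names are hypothetical, and the read end of the pipe is assumed to be in nonblocking mode.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Puts 'fd' into nonblocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) ? -1 : 0;
}

/* Discards all bytes currently in the self-pipe. Returns the number of
   bytes discarded, or -1 on an unexpected error. */
long drain_self_pipe(int readFd)
{
    char buf[64];
    long total = 0;

    for (;;) {
        ssize_t n = read(readFd, buf, sizeof(buf));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)                 /* Write end has been closed */
            return total;
        if (errno == EINTR)         /* Interrupted; try again */
            continue;
        if (errno == EAGAIN)        /* Pipe is now empty */
            return total;
        return -1;
    }
}
```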
The epoll_wait() call blocks, even when the interest list is empty. This can be useful in a multithreaded program, where one thread might add a descriptor to the epoll interest list, while another thread is blocked in an epoll_wait() call.
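This behavior can be observed with a short Linux-specific sketch. A finite timeout is used here as an assumption for the sake of a demonstration that terminates; with a timeout of -1, the call would block indefinitely. The function name wait_on_empty_epoll() is hypothetical.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Waits for up to 'timeoutMs' milliseconds on a new epoll instance whose
   interest list is empty. Returns the result of epoll_wait(): 0 means that
   the call blocked until the timeout expired; -1 indicates an error. */
int wait_on_empty_epoll(int timeoutMs)
{
    struct epoll_event ev;
    int epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    int ready = epoll_wait(epfd, &ev, 1, timeoutMs);
    close(epfd);
    return ready;
}
```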
Successive epoll_wait() calls cycle through the list of ready file descriptors. This is useful because it helps prevent file-descriptor starvation, which could occur if epoll_wait() always (for example) returned the lowest-numbered ready file descriptor, and that file descriptor always had some input available.
First, the child shell process terminates, followed by the script parent process. Since the terminal is operating in raw mode, the Control-D character is not interpreted by the terminal driver, but is instead passed as a literal character to the script parent process, which writes it to the pseudoterminal master. The pseudoterminal slave is operating in canonical mode, so this Control-D character is treated as an end-of-file, which causes the child shell’s next read() to return 0, with the result that the shell terminates. The termination of the shell closes the only file descriptor referring to the pseudoterminal slave. As a consequence, the next read() by the parent script process fails with the error EIO (or end-of-file on some other UNIX implementations), and this process then terminates.
A solution is provided in the file pty/unbuffer.c in the source code distribution for this book.