In many application designs, a parent process needs to know when one of its child processes changes state, that is, when the child terminates or is stopped by a signal. This chapter describes two techniques used to monitor child processes: the wait() system call (and its variants) and the use of the SIGCHLD signal.
In many applications where a parent creates child processes, it is useful for the parent to be able to monitor the children to find out when and how they terminate. This facility is provided by wait() and a number of related system calls.
The wait() system call waits for one of the children of the calling process to terminate and returns the termination status of that child in the buffer pointed to by status.
#include <sys/wait.h>
pid_t wait(int *status);
Returns process ID of terminated child, or -1 on error
The wait() system call does the following:
If no (previously unwaited-for) child of the calling process has yet terminated, the call blocks until one of the children terminates. If a child has already terminated by the time of the call, wait() returns immediately.
If status is not NULL, information about how the child terminated is returned in the integer to which status points. We describe the information returned in status in The Wait Status Value.
The kernel adds the process CPU times (Process Time) and resource usage statistics (Process Resource Usage) to running totals for all children of this parent process.
As its function result, wait() returns the process ID of the child that has terminated.
On error, wait() returns -1. One possible error is that the calling process has no (previously unwaited-for) children, which is indicated by the errno value ECHILD. This means that we can use the following loop to wait for all children of the calling process to terminate:
while ((childPid = wait(NULL)) != -1)
    continue;
if (errno != ECHILD)                /* An unexpected error... */
    errExit("wait");
Example 26-1 demonstrates the use of wait(). This program creates multiple child processes, one per (integer) command-line argument. Each child sleeps for the number of seconds specified in the corresponding command-line argument and then exits. In the meantime, after all children have been created, the parent process repeatedly calls wait() to monitor the termination of its children. This loop continues until wait() returns -1. (This is not the only approach: we could alternatively exit the loop when the number of terminated children, numDead, matches the number of children created.) The following shell session log shows what happens when we use the program to create three children:
$ ./multi_wait 7 1 4
[13:41:00] child 1 started with PID 21835, sleeping 7 seconds
[13:41:00] child 2 started with PID 21836, sleeping 1 seconds
[13:41:00] child 3 started with PID 21837, sleeping 4 seconds
[13:41:01] wait() returned child PID 21836 (numDead=1)
[13:41:04] wait() returned child PID 21837 (numDead=2)
[13:41:07] wait() returned child PID 21835 (numDead=3)
No more children - bye!
If there are multiple terminated children at a particular moment, SUSv3 leaves unspecified the order in which these children will be reaped by a sequence of wait() calls; that is, the order depends on the implementation. Even across versions of the Linux kernel, the behavior varies.
Example 26-1. Creating and waiting for multiple children
procexec/multi_wait.c
#include <sys/wait.h>
#include <time.h>
#include "curr_time.h"              /* Declaration of currTime() */
#include "tlpi_hdr.h"

int
main(int argc, char *argv[])
{
    int numDead;                    /* Number of children so far waited for */
    pid_t childPid;                 /* PID of waited for child */
    int j;

    if (argc < 2 || strcmp(argv[1], "--help") == 0)
        usageErr("%s sleep-time...\n", argv[0]);

    setbuf(stdout, NULL);           /* Disable buffering of stdout */

    for (j = 1; j < argc; j++) {    /* Create one child for each argument */
        switch (fork()) {
        case -1:
            errExit("fork");

        case 0:                     /* Child sleeps for a while then exits */
            printf("[%s] child %d started with PID %ld, sleeping %s "
                    "seconds\n", currTime("%T"), j, (long) getpid(), argv[j]);
            sleep(getInt(argv[j], GN_NONNEG, "sleep-time"));
            _exit(EXIT_SUCCESS);

        default:                    /* Parent just continues around loop */
            break;
        }
    }

    numDead = 0;
    for (;;) {                      /* Parent waits for each child to exit */
        childPid = wait(NULL);
        if (childPid == -1) {
            if (errno == ECHILD) {
                printf("No more children - bye!\n");
                exit(EXIT_SUCCESS);
            } else {                /* Some other (unexpected) error */
                errExit("wait");
            }
        }

        numDead++;
        printf("[%s] wait() returned child PID %ld (numDead=%d)\n",
                currTime("%T"), (long) childPid, numDead);
    }
}
procexec/multi_wait.c
The wait() system call has a number of limitations, which waitpid() was designed to address:
If a parent process has created multiple children, it is not possible to wait() for the completion of a specific child; we can only wait for the next child that terminates.
If no child has yet terminated, wait() always blocks. Sometimes, it would be preferable to perform a nonblocking wait so that if no child has yet terminated, we obtain an immediate indication of this fact.
Using wait(), we can find out only about children that have terminated. It is not possible to be notified when a child is stopped by a signal (such as SIGSTOP or SIGTTIN) or when a stopped child is resumed by delivery of a SIGCONT signal.
#include <sys/wait.h>
pid_t waitpid(pid_t pid, int *status, int options);
Returns process ID of child, 0 (see text), or -1 on error
The return value and status arguments of waitpid() are the same as for wait(). (See The Wait Status Value for an explanation of the value returned in status.) The pid argument enables the selection of the child to be waited for, as follows:
If pid is greater than 0, wait for the child whose process ID equals pid.
If pid equals 0, wait for any child in the same process group as the caller (parent). We describe process groups in Section 34.2.
If pid is less than -1, wait for any child whose process group identifier equals the absolute value of pid.
If pid equals -1, wait for any child. The call wait(&status) is equivalent to the call waitpid(-1, &status, 0).
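To illustrate the first case in the list above (pid greater than 0), the following sketch waits for one particular child regardless of the order in which its siblings terminate. The helper name waitForExitStatus() is hypothetical, and the choice to retry on EINTR and to collapse "killed by a signal" into a -1 return are assumptions made to keep the example short.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: block until the child with process ID 'pid'
   terminates. Returns its exit status (0..255) if it exited normally,
   or -1 if waitpid() failed or the child was killed by a signal. */
int
waitForExitStatus(pid_t pid)
{
    int status;

    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;              /* For example, ECHILD: not our child */
    }

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the child is selected by process ID, a parent with several children can collect them in whatever order suits it. A second call for the same child returns -1, since that child has already been waited for.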
The options argument is a bit mask that can include (OR) zero or more of the following flags (all of which are specified in SUSv3):
WUNTRACED
In addition to returning information about terminated children, also return information when a child is stopped by a signal.
WCONTINUED
(since Linux 2.6.10) Also return status information about stopped children that have been resumed by delivery of a SIGCONT signal.
WNOHANG
If no child specified by pid has yet changed state, then return immediately, instead of blocking (i.e., perform a “poll”). In this case, the return value of waitpid() is 0. If the calling process has no children that match the specification in pid, waitpid() fails with the error ECHILD.
We demonstrate the use of waitpid() in Example 26-3.
In its rationale for waitpid(), SUSv3 notes that the name WUNTRACED is a historical artifact of this flag’s origin in BSD, where a process could be stopped in one of two ways: as a consequence of being traced by the ptrace() system call, or by being stopped by a signal (i.e., not being traced). When a child is being traced by ptrace(), then delivery of any signal (other than SIGKILL) causes the child to be stopped, and a SIGCHLD signal is consequently sent to the parent. This behavior occurs even if the child is ignoring the signal. However, if the child is blocking the signal, then it is not stopped (unless the signal is SIGSTOP, which can’t be blocked).
The status value returned by wait() and waitpid() allows us to distinguish the following events for the child:
The child terminated by calling _exit() (or exit()), specifying an integer exit status.
The child was terminated by the delivery of an unhandled signal.
The child was stopped by a signal, and waitpid() was called with the WUNTRACED flag.
The child was resumed by a SIGCONT signal, and waitpid() was called with the WCONTINUED flag.
We use the term wait status to encompass all of the above cases. The designation termination status is used to refer to the first two cases. (In the shell, we can obtain the termination status of the last command executed by examining the contents of the variable $?.)
Although defined as an int, only the bottom 2 bytes of the value pointed to by status are actually used. The way in which these 2 bytes are filled depends on which of the above events occurred for the child, as depicted in Figure 26-1.
Figure 26-1 shows the layout of the wait status value for Linux/x86-32. The details vary across implementations. SUSv3 doesn’t specify any particular layout for this information, or even require that it is contained in the bottom 2 bytes of the value pointed to by status. Portable applications should always use the macros described in this section to inspect this value, rather than directly inspecting its bit-mask components.
The <sys/wait.h> header file defines a standard set of macros that can be used to dissect a wait status value. When applied to a status value returned by wait() or waitpid(), only one of the macros in the list below will return true. Additional macros are provided to further dissect the status value, as noted in the list.
WIFEXITED(status)
This macro returns true if the child process exited normally. In this case, the macro WEXITSTATUS(status) returns the exit status of the child process. (As noted in Terminating a Process: _exit() and exit(), only the least significant byte of the child’s exit status is available to the parent.)
WIFSIGNALED(status)
This macro returns true if the child process was killed by a signal. In this case, the macro WTERMSIG(status) returns the number of the signal that caused the process to terminate, and the macro WCOREDUMP(status) returns true if the child process produced a core dump file. The WCOREDUMP() macro is not specified by SUSv3, but is available on most UNIX implementations.
WIFSTOPPED(status)
This macro returns true if the child process was stopped by a signal. In this case, the macro WSTOPSIG(status) returns the number of the signal that stopped the process.
WIFCONTINUED(status)
This macro returns true if the child was resumed by delivery of SIGCONT. This macro is available since Linux 2.6.10.
Note that although the name status is also used for the argument of the above macros, they expect a plain integer, rather than a pointer to an integer as required by wait() and waitpid().
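As a compact illustration of how the macros partition the possible wait status values, the following sketch maps a status to one of four event codes and extracts the associated detail. The enum ChildEvent and the function classifyWaitStatus() are hypothetical names introduced for this example. Note that the function takes a plain int, as the macros require.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/wait.h>

/* Hypothetical event codes for this sketch */
enum ChildEvent { EV_EXITED, EV_KILLED, EV_STOPPED, EV_CONTINUED, EV_UNKNOWN };

/* Classify a wait status. On return, *detail holds the exit status
   (EV_EXITED) or the signal number (EV_KILLED, EV_STOPPED); otherwise 0. */
enum ChildEvent
classifyWaitStatus(int status, int *detail)
{
    *detail = 0;

    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return EV_EXITED;
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return EV_KILLED;
    }
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);
        return EV_STOPPED;
    }
#ifdef WIFCONTINUED                 /* May be absent on older systems */
    if (WIFCONTINUED(status))
        return EV_CONTINUED;
#endif
    return EV_UNKNOWN;              /* Should never happen */
}
```

Since only one of the WIF*() macros is true for any given status, the order of the tests does not affect the result.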
The printWaitStatus() function of Example 26-2 uses all of the macros described above. This function dissects and prints the contents of a wait status value.
Example 26-2. Displaying the status value returned by wait() and related calls
procexec/print_wait_status.c
#define _GNU_SOURCE     /* Get strsignal() declaration from <string.h> */
#include <string.h>
#include <sys/wait.h>
#include "print_wait_status.h"  /* Declaration of printWaitStatus() */
#include "tlpi_hdr.h"

/* NOTE: The following function employs printf(), which is not
   async-signal-safe (see Section 21.1.2). As such, this function is
   also not async-signal-safe (i.e., beware of calling it from a
   SIGCHLD handler). */

void                    /* Examine a wait() status using the W* macros */
printWaitStatus(const char *msg, int status)
{
    if (msg != NULL)
        printf("%s", msg);

    if (WIFEXITED(status)) {
        printf("child exited, status=%d\n", WEXITSTATUS(status));

    } else if (WIFSIGNALED(status)) {
        printf("child killed by signal %d (%s)",
                WTERMSIG(status), strsignal(WTERMSIG(status)));
#ifdef WCOREDUMP        /* Not in SUSv3, may be absent on some systems */
        if (WCOREDUMP(status))
            printf(" (core dumped)");
#endif
        printf("\n");

    } else if (WIFSTOPPED(status)) {
        printf("child stopped by signal %d (%s)\n",
                WSTOPSIG(status), strsignal(WSTOPSIG(status)));

#ifdef WIFCONTINUED     /* SUSv3 has this, but older Linux versions and
                           some other UNIX implementations don't */
    } else if (WIFCONTINUED(status)) {
        printf("child continued\n");
#endif

    } else {            /* Should never happen */
        printf("what happened to this child? (status=%x)\n",
                (unsigned int) status);
    }
}
procexec/print_wait_status.c
The printWaitStatus() function is used in Example 26-3. This program creates a child process that either loops continuously calling pause() (during which time signals can be sent to the child) or, if an integer command-line argument was supplied, exits immediately using this integer as the exit status. In the meantime, the parent monitors the child via waitpid(), printing the returned status value and passing this value to printWaitStatus(). The parent exits when it detects that the child has either exited normally or been terminated by a signal.
The following shell session shows a few example runs of the program in Example 26-3. We begin by creating a child that immediately exits with a status of 23:
$ ./child_status 23
Child started with PID = 15807
waitpid() returned: PID=15807; status=0x1700 (23,0)
child exited, status=23
In the next run, we start the program in the background, and then send SIGSTOP and SIGCONT signals to the child:
$ ./child_status &
[1] 15870
$ Child started with PID = 15871
kill -STOP 15871
$ waitpid() returned: PID=15871; status=0x137f (19,127)
child stopped by signal 19 (Stopped (signal))
kill -CONT 15871
$ waitpid() returned: PID=15871; status=0xffff (255,255)
child continued
The last two lines of output will appear only on Linux 2.6.10 and later, since earlier kernels don’t support the waitpid() WCONTINUED option. (This shell session is made slightly hard to read by the fact that output from the program executing in the background is in some cases intermingled with the prompt produced by the shell.)
We continue the shell session by sending a SIGABRT signal to terminate the child:
kill -ABRT 15871
$ waitpid() returned: PID=15871; status=0x0006 (0,6)
child killed by signal 6 (Aborted)
Press Enter, in order to see shell notification that background job has terminated
[1]+  Done                    ./child_status
$ ls -l core
ls: core: No such file or directory
$ ulimit -c                             Display RLIMIT_CORE limit
0
Although the default action of SIGABRT is to produce a core dump file and terminate the process, no core file was produced. This is because core dumps were disabled: the RLIMIT_CORE soft resource limit (Details of Specific Resource Limits), which specifies the maximum size of a core file, was set to 0, as shown by the ulimit command above.
We repeat the same experiment, but this time enabling core dumps before sending SIGABRT to the child:
$ ulimit -c unlimited                   Allow core dumps
$ ./child_status &
[1] 15902
$ Child started with PID = 15903
kill -ABRT 15903                        Send SIGABRT to child
$ waitpid() returned: PID=15903; status=0x0086 (0,134)
child killed by signal 6 (Aborted) (core dumped)
Press Enter, in order to see shell notification that background job has terminated
[1]+  Done                    ./child_status
$ ls -l core                            This time we get a core dump
-rw-------   1 mtk      users       65536 May  6 21:01 core
Example 26-3. Using waitpid() to retrieve the status of a child process
procexec/child_status.c
#include <sys/wait.h>
#include "print_wait_status.h"          /* Declares printWaitStatus() */
#include "tlpi_hdr.h"

int
main(int argc, char *argv[])
{
    int status;
    pid_t childPid;

    if (argc > 1 && strcmp(argv[1], "--help") == 0)
        usageErr("%s [exit-status]\n", argv[0]);

    switch (fork()) {
    case -1: errExit("fork");

    case 0:             /* Child: either exits immediately with given
                           status or loops waiting for signals */
        printf("Child started with PID = %ld\n", (long) getpid());
        if (argc > 1)                   /* Status supplied on command line? */
            exit(getInt(argv[1], 0, "exit-status"));
        else                            /* Otherwise, wait for signals */
            for (;;)
                pause();
        exit(EXIT_FAILURE);             /* Not reached, but good practice */

    default:            /* Parent: repeatedly wait on child until it
                           either exits or is terminated by a signal */
        for (;;) {
            childPid = waitpid(-1, &status, WUNTRACED
#ifdef WCONTINUED       /* Not present on older versions of Linux */
                                                | WCONTINUED
#endif
                    );
            if (childPid == -1)
                errExit("waitpid");

            /* Print status in hex, and as separate decimal bytes */

            printf("waitpid() returned: PID=%ld; status=0x%04x (%d,%d)\n",
                    (long) childPid,
                    (unsigned int) status, status >> 8, status & 0xff);
            printWaitStatus(NULL, status);

            if (WIFEXITED(status) || WIFSIGNALED(status))
                exit(EXIT_SUCCESS);
        }
    }
}
procexec/child_status.c
As shown in Table 20-1 (in Changing Signal Dispositions: signal()), some signals terminate a process by default. In some circumstances, we may wish to have certain cleanup steps performed before a process terminates. For this purpose, we can arrange to have a handler catch such signals, perform the cleanup steps, and then terminate the process. If we do this, we should bear in mind that the termination status of a process is available to its parent via wait() or waitpid(). For example, calling _exit(EXIT_SUCCESS) from the signal handler will make it appear to the parent process that the child terminated successfully.
If the child needs to inform the parent that it terminated because of a signal, then the child’s signal handler should first disestablish itself, and then raise the same signal once more, which this time will terminate the process. The signal handler would contain code such as the following:
void
handler(int sig)
{
    /* Perform cleanup steps */

    signal(sig, SIG_DFL);           /* Disestablish handler */
    raise(sig);                     /* Raise signal again */
}
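To see the effect from the parent's side, the following sketch creates a child that installs such a handler, sends the child a signal, and returns the resulting wait status. Everything other than the handler body is an assumption made for illustration: the function name signalChildAndWait() is hypothetical, the handler is installed with sigaction(), and a pipe is used so that the parent sends the signal only after the handler is in place.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void
handler(int sig)
{
    /* Cleanup steps would go here (async-signal-safe calls only) */

    signal(sig, SIG_DFL);           /* Disestablish handler */
    raise(sig);                     /* Default action is taken on return */
}

/* Hypothetical demo: create a child that catches 'sig' with the handler
   above, send it that signal, and return the child's wait status
   (or -1 on error). */
int
signalChildAndWait(int sig)
{
    int pfd[2], status;
    char ch;
    pid_t childPid;
    struct sigaction sa;

    if (pipe(pfd) == -1)
        return -1;

    switch (childPid = fork()) {
    case -1:
        close(pfd[0]);
        close(pfd[1]);
        return -1;

    case 0:                         /* Child */
        close(pfd[0]);
        sa.sa_handler = handler;
        sa.sa_flags = 0;
        sigemptyset(&sa.sa_mask);
        if (sigaction(sig, &sa, NULL) == -1)
            _exit(127);
        close(pfd[1]);              /* Parent sees EOF: handler installed */
        for (;;)
            pause();

    default:                        /* Parent */
        close(pfd[1]);
        while (read(pfd[0], &ch, 1) > 0)
            continue;               /* Returns 0 (EOF) once child is ready */
        close(pfd[0]);

        if (kill(childPid, sig) == -1)
            return -1;
        if (waitpid(childPid, &status, 0) == -1)
            return -1;
        return status;
    }
}
```

For a call such as signalChildAndWait(SIGTERM), the returned status satisfies WIFSIGNALED(), and WTERMSIG() yields SIGTERM, even though the child caught the signal first.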
Like waitpid(), waitid() returns the status of child processes. However, waitid() provides extra functionality that is unavailable with waitpid(). This system call derives from System V, but is now specified in SUSv3. It was added to Linux in kernel 2.6.9.
Before Linux 2.6.9, a version of waitid() was provided via an implementation in glibc. However, because a full implementation of this interface requires kernel support, the glibc implementation provided no more functionality than was available using waitpid().
#include <sys/wait.h>
int waitid(idtype_t idtype, id_t id, siginfo_t *infop, int options);
Returns 0 on success or if WNOHANG was specified and there were no children to wait for, or -1 on error
The idtype and id arguments specify which child(ren) to wait for, as follows:
If idtype is P_ALL, wait for any child; id is ignored.
If idtype is P_PID, wait for the child whose process ID equals id.
If idtype is P_PGID, wait for any child whose process group ID equals id.
Note that unlike waitpid(), it is not possible to specify 0 in id to mean any process in the same process group as the caller. Instead, we must explicitly specify the caller’s process group ID using the value returned by getpgrp().
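The most significant difference between waitpid() and waitid() is that waitid() provides more precise control of the child events that should be waited for. We control this by ORing one or more of the following flags in options:
WEXITED
Wait for children that have terminated, regardless of whether the termination was normal or abnormal.
WSTOPPED
Wait for children that have been stopped by a signal.
WCONTINUED
Wait for children that have been resumed by a SIGCONT signal.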
The following additional flags may be ORed in options:
WNOHANG
This flag has the same meaning as for waitpid(). If none of the children matching the specification in id has status information to return, then return immediately (a poll). In this case, the return value of waitid() is 0. If the calling process has no children that match the specification in id, waitid() instead fails with the error ECHILD.
WNOWAIT
Normally, once a child has been waited for using waitid(), then that “status event” is consumed. However, if WNOWAIT is specified, then the child status is returned, but the child remains in a waitable state, and we can later wait for it again to retrieve the same information.
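The effect of WNOWAIT can be sketched by waiting for the same terminated child twice: first with WNOWAIT, which leaves the child waitable, and then without it, which consumes the event. This is a minimal sketch under stated assumptions: peekThenReap() is a hypothetical name, and the standard waitid() flag WEXITED is used to select terminated children.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: inspect a terminated child twice. The first
   waitid() uses WNOWAIT, so the child stays waitable; the second call
   consumes the event. Returns the child's si_status if both calls report
   the same information, or -1 otherwise. */
int
peekThenReap(pid_t pid)
{
    siginfo_t peek, final;

    memset(&peek, 0, sizeof(peek));
    memset(&final, 0, sizeof(final));

    if (waitid(P_PID, pid, &peek, WEXITED | WNOWAIT) == -1)
        return -1;
    if (waitid(P_PID, pid, &final, WEXITED) == -1)
        return -1;

    if (peek.si_pid != final.si_pid || peek.si_code != final.si_code ||
            peek.si_status != final.si_status)
        return -1;

    return final.si_status;
}
```

A further call for the same child returns -1, because the second waitid() consumed the status event and the first call of the new pair therefore fails with ECHILD.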
On success, waitid() returns 0, and the siginfo_t structure (The SA_SIGINFO Flag) pointed to by infop is updated to contain information about the child. The following fields are filled in the siginfo_t structure:
si_code
This field contains one of the following values: CLD_EXITED, indicating that the child terminated by calling _exit(); CLD_KILLED, indicating that the child was killed by a signal; CLD_STOPPED, indicating that the child was stopped by a signal; or CLD_CONTINUED, indicating that the (previously stopped) child resumed execution as a consequence of receiving a signal (SIGCONT).
si_pid
This field contains the process ID of the child whose state has changed.
si_signo
This field is always set to SIGCHLD.
si_status
This field contains either the exit status of the child, as passed to _exit(), or the signal that caused the child to stop, continue, or terminate. We can determine which type of information is in this field by examining the si_code field.
si_uid
This field contains the real user ID of the child. Most other UNIX implementations don’t set this field.
On Solaris, two additional fields are filled in: si_stime and si_utime. These contain the system and user CPU time used by the child, respectively. SUSv3 doesn’t require these fields to be set by waitid().
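The way si_code determines the meaning of si_status can be sketched as follows. The function describeChildEvent() is a hypothetical helper that formats a one-line description from the siginfo_t structure filled in by waitid(); the message wording is an assumption, not output produced by any program in this chapter.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: format a description of the child event recorded
   in 'info' (as filled in by waitid()) into 'buf'. Returns 'buf'. */
char *
describeChildEvent(const siginfo_t *info, char *buf, size_t len)
{
    switch (info->si_code) {
    case CLD_EXITED:                /* si_status is the exit status */
        snprintf(buf, len, "PID %ld exited, status=%d",
                (long) info->si_pid, info->si_status);
        break;
    case CLD_KILLED:                /* si_status is a signal number */
        snprintf(buf, len, "PID %ld killed by signal %d",
                (long) info->si_pid, info->si_status);
        break;
    case CLD_STOPPED:               /* si_status is a signal number */
        snprintf(buf, len, "PID %ld stopped by signal %d",
                (long) info->si_pid, info->si_status);
        break;
    case CLD_CONTINUED:
        snprintf(buf, len, "PID %ld continued", (long) info->si_pid);
        break;
    default:                        /* Other codes, such as CLD_DUMPED */
        snprintf(buf, len, "PID %ld: si_code=%d",
                (long) info->si_pid, info->si_code);
        break;
    }
    return buf;
}
```

Like printWaitStatus(), a helper of this kind relies on stdio formatting and so should not be called from a signal handler.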
One detail of the operation of waitid() needs further clarification. If WNOHANG is specified in options, then a 0 return value from waitid() can mean one of two things: a child had already changed state at the time of the call (and information about the child is returned in the siginfo_t structure pointed to by infop), or there was no child whose state has changed. For the case where no child has changed state, some UNIX implementations (including Linux) zero out the returned siginfo_t structure. This provides a method of distinguishing the two possibilities: we can check whether the value in si_pid is 0 or nonzero. Unfortunately, this behavior is not required by SUSv3, and some UNIX implementations leave the siginfo_t structure unchanged in this case. (A future corrigendum to SUSv4 is likely to add a requirement that si_pid and si_signo are zeroed in this case.) The only portable way to distinguish these two cases is to zero out the siginfo_t structure before calling waitid(), as in the following code:
siginfo_t info;
...
memset(&info, 0, sizeof(siginfo_t));
if (waitid(idtype, id, &info, options | WNOHANG) == -1)
    errExit("waitid");
if (info.si_pid == 0) {
    /* No children changed state */
} else {
    /* A child changed state; details are provided in 'info' */
}
The wait3() and wait4() system calls perform a similar task to waitpid(). The principal semantic difference is that wait3() and wait4() return resource usage information about the terminated child in the structure pointed to by the rusage argument. This information includes the amount of CPU time used by the process and memory-management statistics. We defer detailed discussion of the rusage structure until Process Resource Usage, where we describe the getrusage() system call.
#define _BSD_SOURCE     /* Or #define _XOPEN_SOURCE 500 for wait3() */
#include <sys/resource.h>
#include <sys/wait.h>
pid_t wait3(int *status, int options, struct rusage *rusage);
pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage);
Both return process ID of child, or -1 on error
Excluding the use of the rusage argument, a call to wait3() is equivalent to the following waitpid() call:
waitpid(-1, &status, options);
Similarly, wait4() is equivalent to the following:
waitpid(pid, &status, options);
In other words, wait3() waits for any child, while wait4() can be used to select a specific child or children upon which to wait.
On some UNIX implementations, wait3() and wait4() return resource usage information only for terminated children. On Linux, resource usage information can also be retrieved for stopped children if the WUNTRACED flag is specified in options.
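As a brief illustration of the extra information these calls provide, the following sketch uses wait4() to collect both the wait status and the CPU time consumed by a specific child. The helper name waitWithCpuTime() is hypothetical. The sketch also assumes a glibc version that recognizes the _DEFAULT_SOURCE feature test macro; on older versions, _BSD_SOURCE (as in the synopsis above) serves the same purpose.

```c
#define _DEFAULT_SOURCE             /* Assumed: exposes wait4() on newer glibc */
#include <sys/resource.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: wait for child 'pid' to terminate and store the CPU
   time it consumed, in seconds, in *userSecs and *sysSecs.
   Returns the wait status, or -1 on error. */
int
waitWithCpuTime(pid_t pid, double *userSecs, double *sysSecs)
{
    int status;
    struct rusage ru;

    if (wait4(pid, &status, 0, &ru) == -1)
        return -1;

    /* ru_utime and ru_stime are struct timeval values */
    *userSecs = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1000000.0;
    *sysSecs  = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1000000.0;
    return status;
}
```

The returned status can be examined with the same W*() macros that are used with wait() and waitpid().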
The names for these two system calls refer to the number of arguments they each take. Both system calls originated in BSD, but are now available on most UNIX implementations. Neither is standardized in SUSv3. (SUSv2 did specify wait3(), but marked it LEGACY.)
We usually avoid the use of wait3() and wait4() in this book. Typically, we don’t need the extra information returned by these calls. Also, lack of standardization limits their portability.