This chapter extends the material presented in Chapter 24 to Chapter 27 by covering a variety of topics related to process creation and program execution. We describe process accounting, a kernel feature that writes an accounting record for each process on the system as it terminates. We then look at the Linux-specific clone() system call, which is the low-level API that is used to create threads on Linux. We follow this with some comparisons of the performance of fork(), vfork(), and clone(). We conclude with a summary of the effects of fork() and exec() on the attributes of a process.
When process accounting is enabled, the kernel writes an accounting record to the system-wide process accounting file as each process terminates. This accounting record contains various information maintained by the kernel about the process, including its termination status and how much CPU time it consumed. The accounting file can be analyzed by standard tools (sa(8) summarizes information from the accounting file, and lastcomm(1) lists information about previously executed commands) or by tailored applications.
In kernels before 2.6.10, a separate process accounting record was written for each thread created using the NPTL threading implementation. Since kernel 2.6.10, a single accounting record is written for the entire process when the last thread terminates. Under the older LinuxThreads threading implementation, a single process accounting record is always written for each thread.
The acct() system call is used by a privileged (CAP_SYS_PACCT) process to enable and disable process accounting. This system call is rarely used in application programs. Normally, process accounting is enabled at each system restart by placing appropriate commands in the system boot scripts.
#define _BSD_SOURCE
#include <unistd.h>
int acct(const char *acctfile);
Returns 0 on success, or -1 on error
To enable process accounting, we supply the pathname of an existing regular file in acctfile. A typical pathname for the accounting file is /var/log/pacct or /usr/account/pacct. To disable process accounting, we specify acctfile as NULL.
The program in Example 28-1 uses acct() to switch process accounting on and off. The functionality of this program is similar to the shell accton(8) command.
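As a rough illustration of the call itself (this is not the listing from Example 28-1), the following minimal sketch wraps acct() in a helper. The name setAccounting() is hypothetical, and the use of _DEFAULT_SOURCE assumes a current glibc, where that macro has superseded _BSD_SOURCE.

```c
#define _DEFAULT_SOURCE     /* Assumption: current glibc; older versions
                               exposed acct() via _BSD_SOURCE instead */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: enable accounting to the existing file 'path', or
   disable accounting if 'path' is NULL. Returns 0 on success, or -1 on
   error. The caller needs the CAP_SYS_PACCT capability. */
int
setAccounting(const char *path)
{
    if (acct(path) == -1) {
        perror("acct");     /* For example, EPERM without privilege */
        return -1;
    }

    printf("Process accounting %s\n",
           (path == NULL) ? "disabled" : "enabled");
    return 0;
}
```

A privileged caller might use setAccounting("pacct") to switch accounting on and setAccounting(NULL) to switch it off again.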
Once process accounting is enabled, an acct record is written to the accounting file as each process terminates. The acct structure is defined in <sys/acct.h> as follows:
typedef u_int16_t comp_t;   /* See text */

struct acct {
    char      ac_flag;      /* Accounting flags (see text) */
    u_int16_t ac_uid;       /* User ID of process */
    u_int16_t ac_gid;       /* Group ID of process */
    u_int16_t ac_tty;       /* Controlling terminal for process (may be
                               0 if none, e.g., for a daemon) */
    u_int32_t ac_btime;     /* Start time (time_t; seconds since the Epoch) */
    comp_t    ac_utime;     /* User CPU time (clock ticks) */
    comp_t    ac_stime;     /* System CPU time (clock ticks) */
    comp_t    ac_etime;     /* Elapsed (real) time (clock ticks) */
    comp_t    ac_mem;       /* Average memory usage (kilobytes) */
    comp_t    ac_io;        /* Bytes transferred by read(2) and write(2)
                               (unused) */
    comp_t    ac_rw;        /* Blocks read/written (unused) */
    comp_t    ac_minflt;    /* Minor page faults (Linux-specific) */
    comp_t    ac_majflt;    /* Major page faults (Linux-specific) */
    comp_t    ac_swaps;     /* Number of swaps (unused; Linux-specific) */
    u_int32_t ac_exitcode;  /* Process termination status */
#define ACCT_COMM 16
    char      ac_comm[ACCT_COMM+1];
                            /* (Null-terminated) command name
                               (basename of last execed file) */
    char      ac_pad[10];   /* Padding (reserved for future use) */
};
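Because the accounting file is simply a sequence of these fixed-size records, a reader can fetch them one at a time with read(). The following is a minimal sketch under that assumption. The helper name readAcctRecord() is hypothetical, and the code assumes the traditional (not Version 3) record format as declared in <sys/acct.h>.

```c
#include <sys/acct.h>
#include <unistd.h>

/* Hypothetical helper: read the next record from an accounting file that
   is open for reading on 'fd'. Returns 1 if a complete record was read,
   0 on end-of-file, or -1 on error or if only a partial record remained. */
int
readAcctRecord(int fd, struct acct *ac)
{
    ssize_t numRead = read(fd, ac, sizeof(struct acct));

    if (numRead == 0)
        return 0;                       /* End of file */
    if (numRead != (ssize_t) sizeof(struct acct))
        return -1;                      /* Error or truncated record */
    return 1;
}
```

A caller would typically open the file with open() and the O_RDONLY flag, and then call the helper in a loop until it returns 0.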
Note the following points regarding the acct structure:
The u_int16_t and u_int32_t data types are 16-bit and 32-bit unsigned integers.
The ac_flag field is a bit mask recording various events for the process. The bits that can appear in this field are shown in Table 28-1. As indicated in the table, some of these bits are not present on all UNIX implementations. A few other implementations provide additional bits in this field.
The ac_comm field records the name of the last command (program file) executed by this process. The kernel records this value on each execve(). On some other UNIX implementations, this field is limited to 8 characters.
The comp_t type is a kind of floating-point number. Values of this type are sometimes called compressed clock ticks. The floating-point value consists of a 3-bit, base-8 exponent, followed by a 13-bit mantissa; the exponent can represent a factor in the range 8^0 = 1 to 8^7 (2,097,152). For example, a mantissa of 125 and an exponent of 1 represent the value 1000. Example 28-2 defines a function (comptToLL()) to convert this type to long long. We need to use the type long long because the 32 bits used to represent an unsigned long on x86-32 are insufficient to hold the largest value that can be represented in comp_t, which is (2^13 - 1) * 8^7.
The three time fields defined with the type comp_t represent time in system clock ticks. Therefore, we must divide these times by the value returned by sysconf(_SC_CLK_TCK) in order to convert them to seconds.
The ac_exitcode field holds the termination status of the process (described in The Wait Status Value). Most other UNIX implementations instead provide a single-byte field named ac_stat, which records only the signal that killed the process (if it was killed by a signal) and a bit indicating whether that signal caused the process to dump core. BSD-derived implementations don’t provide either field.
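The comp_t decoding and the tick-to-seconds conversion described in the points above can be sketched as follows. This is an illustrative version rather than the listing from Example 28-2: the names decodeCompT() and compTToSeconds() are hypothetical, and the functions take a uint16_t so that the sketch does not depend on <sys/acct.h>.

```c
#include <stdint.h>
#include <unistd.h>

/* Decode a comp_t value: the top 3 bits hold a base-8 exponent and the
   low 13 bits hold the mantissa */
long long
decodeCompT(uint16_t ct)
{
    const int EXP_SIZE = 3;
    const int MANTISSA_SIZE = 13;
    const int MANTISSA_MASK = (1 << MANTISSA_SIZE) - 1;
    long long mantissa = ct & MANTISSA_MASK;
    int exp = (ct >> MANTISSA_SIZE) & ((1 << EXP_SIZE) - 1);

    return mantissa << (exp * 3);   /* Each power of 8 is a 3-bit shift */
}

/* Hypothetical helper: convert a comp_t clock-tick count to seconds
   (assumes that sysconf(_SC_CLK_TCK) succeeds) */
double
compTToSeconds(uint16_t ct)
{
    return (double) decodeCompT(ct) / sysconf(_SC_CLK_TCK);
}
```

With an exponent of 1 and a mantissa of 125, decodeCompT() returns 125 * 8 = 1000, matching the example given in the text.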
The program in Example 28-2 displays selected fields from the records in a process accounting file. The following shell session demonstrates the use of this program. We begin by creating a new, empty process accounting file and enabling process accounting:
$ su                            Need privilege to enable process accounting
Password:
# touch pacct
# ./acct_on pacct               This process will be first entry in accounting file
Process accounting enabled
# exit                          Cease being superuser
At this point, three processes have already terminated since we enabled process accounting. These processes executed the acct_on, su, and bash programs. The bash process was started by su to run the privileged shell session.
Now we run a series of commands to add further records to the accounting file:
$ sleep 15 &
[1] 18063
$ ulimit -c unlimited           Allow core dumps (shell built-in)
$ cat                           Create a process
Type Control-\ (generates SIGQUIT, signal 3) to kill cat process
Quit (core dumped)
$
Press Enter to see shell notification of completion of sleep before next shell prompt
[1]+  Done                    sleep 15
$ grep xxx badfile              grep fails with status of 2
grep: badfile: No such file or directory
$ echo $?                       The shell obtained status of grep (shell built-in)
2
The next two commands run programs that we presented in previous chapters (Example 27-1, in The exec() Library Functions, and Example 24-1, in File Sharing Between Parent and Child). The first command runs a program that execs the file /bin/echo; this results in an accounting record with the command name echo. The second command creates a child process that doesn’t perform an exec().
$ ./t_execve /bin/echo
hello world goodbye
$ ./t_fork
PID=18350 (child) idata=333 istack=666
PID=18349 (parent) idata=111 istack=222
Finally, we use the program in Example 28-2 to view the contents of the accounting file:
$ ./acct_view pacct
command  flags  term.   user  start time           CPU   elapsed
                status                             time  time
acct_on  -S--   0       root  2010-07-23 17:19:05  0.00   0.00
bash     ----   0       root  2010-07-23 17:18:55  0.02  21.10
su       -S--   0       root  2010-07-23 17:18:51  0.01  24.94
cat      --XC   0x83    mtk   2010-07-23 17:19:55  0.00   1.72
sleep    ----   0       mtk   2010-07-23 17:19:42  0.00  15.01
grep     ----   0x200   mtk   2010-07-23 17:20:12  0.00   0.00
echo     ----   0       mtk   2010-07-23 17:21:15  0.01   0.01
t_fork   F---   0       mtk   2010-07-23 17:21:36  0.00   0.00
t_fork   ----   0       mtk   2010-07-23 17:21:36  0.00   3.01
In the output, we see one line for each process that was created in the shell session. The ulimit and echo $? commands are shell built-in commands, so they don’t result in the creation of new processes; the echo entry in the output was produced by the t_execve program, which execed /bin/echo. Note that the entry for sleep appeared in the accounting file after the cat entry because the sleep command terminated after the cat command.
Most of the output is self-explanatory. The flags column shows single letters indicating which of the ac_flag bits is set in each record (see Table 28-1). The Wait Status Value describes how to interpret the termination status values shown in the term. status column.
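To make the flags and term. status columns concrete, the following minimal sketch shows one way of rendering them. The helper names formatAcctFlags() and describeStatus() are hypothetical. The flag constants are those declared in <sys/acct.h> on Linux, and the status is decoded with the standard wait status macros (WCOREDUMP() is not in SUSv3, so we assume an implementation that provides it, as glibc does).

```c
#define _DEFAULT_SOURCE     /* Assumption: needed for WCOREDUMP() on glibc */
#include <stdio.h>
#include <sys/acct.h>
#include <sys/wait.h>

/* Hypothetical helper: build the four-character flags column; 'buf' must
   have room for at least 5 bytes */
void
formatAcctFlags(char flag, char *buf)
{
    buf[0] = (flag & AFORK) ? 'F' : '-';    /* Forked, but did not exec */
    buf[1] = (flag & ASU)   ? 'S' : '-';    /* Used superuser privileges */
    buf[2] = (flag & AXSIG) ? 'X' : '-';    /* Terminated by a signal */
    buf[3] = (flag & ACORE) ? 'C' : '-';    /* Produced a core dump */
    buf[4] = '\0';
}

/* Hypothetical helper: describe a termination status (ac_exitcode) */
void
describeStatus(int status, char *buf, size_t size)
{
    if (WIFEXITED(status))
        snprintf(buf, size, "exited, status=%d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, size, "killed by signal %d%s", WTERMSIG(status),
                 WCOREDUMP(status) ? " (core dumped)" : "");
    else
        snprintf(buf, size, "unrecognized status 0x%x",
                 (unsigned int) status);
}
```

Applied to the values in the session above, the status 0x83 recorded for cat decodes as a kill by signal 3 (SIGQUIT) with a core dump, and the status 0x200 recorded for grep decodes as a normal exit with status 2.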
Starting with kernel 2.6.8, Linux introduced an optional alternative version of the process accounting file that addresses some limitations of the traditional accounting file. To use this alternative version, known as Version 3, the CONFIG_BSD_PROCESS_ACCT_V3 kernel configuration option must be enabled before building the kernel.
struct acct_v3 {
    char      ac_flag;      /* Accounting flags */
    char      ac_version;   /* Accounting version (3) */
    u_int16_t ac_tty;       /* Controlling terminal for process */
    u_int32_t ac_exitcode;  /* Process termination status */
    u_int32_t ac_uid;       /* 32-bit user ID of process */
    u_int32_t ac_gid;       /* 32-bit group ID of process */
    u_int32_t ac_pid;       /* Process ID */
    u_int32_t ac_ppid;      /* Parent process ID */
    u_int32_t ac_btime;     /* Start time (time_t) */
    float     ac_etime;     /* Elapsed (real) time (clock ticks) */
    comp_t    ac_utime;     /* User CPU time (clock ticks) */
    comp_t    ac_stime;     /* System CPU time (clock ticks) */
    comp_t    ac_mem;       /* Average memory usage (kilobytes) */
    comp_t    ac_io;        /* Bytes read/written (unused) */
    comp_t    ac_rw;        /* Blocks read/written (unused) */
    comp_t    ac_minflt;    /* Minor page faults */
    comp_t    ac_majflt;    /* Major page faults */
    comp_t    ac_swaps;     /* Number of swaps (unused; Linux-specific) */
#define ACCT_COMM 16
    char      ac_comm[ACCT_COMM];   /* Command name */
};
The following are the main differences between the acct_v3 structure and the traditional Linux acct structure:
The ac_version field is added. This field contains the version number of this type of accounting record. This field is always 3 for an acct_v3 record.
The fields ac_pid and ac_ppid, containing the process ID and parent process ID of the terminated process, are added.
The ac_uid and ac_gid fields are widened from 16 to 32 bits, to accommodate the 32-bit user and group IDs that were introduced in Linux 2.4. (Large user and group IDs can’t be correctly represented in the traditional acct file.)
The type of the ac_etime field is changed from comp_t to float, to allow longer elapsed times to be recorded.
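A program reading Version 3 records might check the ac_version field and take care with ac_comm, which, unlike its counterpart in the traditional structure, has no extra byte reserved for a terminator. The following is a minimal sketch along those lines. The helper names isAcctV3() and copyV3Comm() are hypothetical, and treating ac_comm as possibly unterminated is a defensive assumption on our part rather than documented behavior.

```c
#include <stddef.h>
#include <sys/acct.h>

/* Hypothetical helper: return 1 if the record carries version number 3 */
int
isAcctV3(const struct acct_v3 *ac)
{
    return ac->ac_version == 3;
}

/* Hypothetical helper: copy the command name into 'buf', always adding a
   terminating null byte. Defensive assumption: a name that fills all
   ACCT_COMM bytes of ac_comm may lack a terminator. */
void
copyV3Comm(const struct acct_v3 *ac, char *buf, size_t size)
{
    size_t j;

    if (size == 0)
        return;

    for (j = 0; j < (size_t) ACCT_COMM && j < size - 1 &&
            ac->ac_comm[j] != '\0'; j++)
        buf[j] = ac->ac_comm[j];
    buf[j] = '\0';
}
```

A buffer of ACCT_COMM + 1 bytes is large enough to hold any command name copied in this way.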
We provide a Version 3 analog of the program in Example 28-2 in the file procexec/acct_v3_view.c in the source code distribution for this book.