CPU Affinity

When a process is rescheduled to run on a multiprocessor system, it doesn’t necessarily run on the same CPU on which it last executed. The usual reason it may run on another CPU is that the original CPU is already busy.

When a process changes CPUs, there is a performance impact: before a line of the process’s data can be loaded into the cache of the new CPU, any copy of that line in the cache of the old CPU must first be invalidated (i.e., discarded if it is unmodified, or flushed to main memory if it was modified). (To prevent cache inconsistencies, multiprocessor architectures do not allow a line that has been modified in one CPU cache to remain valid in another.) This invalidation costs execution time. Because of this performance impact, the Linux (2.6) kernel tries to ensure soft CPU affinity for a process: wherever possible, the process is rescheduled to run on the same CPU.

Note

A cache line is the cache analog of a page in a virtual memory management system. It is the size of the unit used for transfers between the CPU cache and main memory. Typical line sizes range from 32 to 128 bytes. For further information, see [Schimmel, 1994] and [Drepper, 2007].

One of the fields in the Linux-specific /proc/PID/stat file displays the number of the CPU on which a process is currently executing or last executed. See the proc(5) manual page for details.

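Sometimes, it is desirable to set hard CPU affinity for a process, so that it is explicitly restricted to always running on one, or a subset, of the available CPUs. Among the reasons we may want to do this are the following:

  • We can avoid the performance impacts caused by invalidation of cached data.

  • If multiple threads (or processes) are accessing the same data, then we may obtain performance benefits by confining them all to the same CPU, so that they don’t contend for the data and thus cause cache misses.

  • For a time-critical application, it may be desirable to confine most processes on the system to other CPUs, while reserving one or more CPUs for the time-critical application.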

Note

The isolcpus kernel boot option can be used to isolate one or more CPUs from the normal kernel scheduling algorithms. The only way to move a process on or off a CPU that has been isolated is via the CPU affinity system calls described in this section. The isolcpus boot option is the preferred method of implementing the last of the scenarios listed above. For details, see the kernel source file Documentation/kernel-parameters.txt.

Linux also provides a cpuset kernel option, which can be used on systems containing large numbers of CPUs to achieve more sophisticated control over how the CPUs and memory are allocated to processes. For details, see the kernel source file Documentation/cpusets.txt.

Linux 2.6 provides a pair of nonstandard system calls to modify and retrieve the hard CPU affinity of a process: sched_setaffinity() and sched_getaffinity().

Note

Many other UNIX implementations provide interfaces for controlling CPU affinity. For example, HP-UX and Solaris provide a pset_bind() system call.

The sched_setaffinity() system call sets the CPU affinity of the process specified by pid. If pid is 0, the CPU affinity of the calling process is changed.

#define _GNU_SOURCE
#include <sched.h>

int sched_setaffinity(pid_t pid, size_t len, cpu_set_t *set);

Returns 0 on success, or -1 on error

The CPU affinity to be assigned to the process is specified in the cpu_set_t structure pointed to by set.

Note

CPU affinity is actually a per-thread attribute that can be adjusted independently for each of the threads in a thread group. If we want to change the CPU affinity of a specific thread in a multithreaded process, we can specify pid as the value returned by a call to gettid() in that thread. Specifying pid as 0 means the calling thread.

Although the cpu_set_t data type is implemented as a bit mask, we should treat it as an opaque structure. All manipulations of the structure should be done using the macros CPU_ZERO(), CPU_SET(), CPU_CLR(), and CPU_ISSET().

#define _GNU_SOURCE
#include <sched.h>

void CPU_ZERO(cpu_set_t *set);
void CPU_SET(int cpu, cpu_set_t *set);
void CPU_CLR(int cpu, cpu_set_t *set);

int CPU_ISSET(int cpu, cpu_set_t *set);

Returns true (1) if cpu is in set, or false (0) otherwise

These macros operate on the CPU set pointed to by set as follows:

  • CPU_ZERO() initializes set to be empty.

  • CPU_SET() adds the CPU cpu to set.

  • CPU_CLR() removes the CPU cpu from set.

  • CPU_ISSET() returns true if the CPU cpu is a member of set.
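The four macros can be combined as in the following minimal sketch. The helper names fill_except() and count_cpus() are hypothetical and exist only for illustration. The first builds a set holding CPUs 0 to ncpus-1 with one CPU removed, and the second counts the members of a set by testing every CPU number below CPU_SETSIZE.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Hypothetical helper: make 'set' contain CPUs 0..ncpus-1, except 'skip' */
void
fill_except(cpu_set_t *set, int ncpus, int skip)
{
    int cpu;

    CPU_ZERO(set);                  /* Always start from an empty set */
    for (cpu = 0; cpu < ncpus && cpu < CPU_SETSIZE; cpu++)
        CPU_SET(cpu, set);
    if (skip >= 0 && skip < CPU_SETSIZE)
        CPU_CLR(skip, set);
}

/* Hypothetical helper: count the CPUs that are members of 'set' */
int
count_cpus(cpu_set_t *set)
{
    int cpu, count;

    count = 0;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, set))
            count++;
    return count;
}
```

For example, after fill_except(&set, 4, 0), the set contains CPUs 1, 2, and 3, so count_cpus(&set) returns 3.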

Note

The GNU C library also provides a number of other macros for working with CPU sets. See the CPU_SET(3) manual page for details.

The CPUs in a CPU set are numbered starting at 0. The <sched.h> header file defines the constant CPU_SETSIZE to be one greater than the maximum CPU number that can be represented in a cpu_set_t variable. CPU_SETSIZE has the value 1024.

The len argument given to sched_setaffinity() should specify the number of bytes in the set argument (i.e., sizeof(cpu_set_t)).

The following code confines the process identified by pid to running on any CPU other than the first CPU of a four-processor system:

cpu_set_t set;

CPU_ZERO(&set);
CPU_SET(1, &set);
CPU_SET(2, &set);
CPU_SET(3, &set);

sched_setaffinity(pid, sizeof(cpu_set_t), &set);

If the CPUs specified in set don’t correspond to any CPUs on the system, then sched_setaffinity() fails with the error EINVAL.

If set doesn’t include the CPU on which the target process is currently running, then the process is migrated to one of the CPUs in set.
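The following sketch wraps these steps in a small function with error reporting. The name pin_to_cpu() is hypothetical. The function rejects CPU numbers that cannot be represented in a cpu_set_t and otherwise passes through the result of sched_setaffinity(), so a CPU that doesn't exist on the system shows up as EINVAL.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <errno.h>
#include <sys/types.h>

/* Hypothetical helper: restrict 'pid' (0 = caller) to the single CPU 'cpu'.
   Returns 0 on success, or -1 with errno set on error. */
int
pin_to_cpu(pid_t pid, int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;             /* Not representable in a cpu_set_t */
        return -1;
    }

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* 'len' is a byte count, so pass the size of the structure */
    return sched_setaffinity(pid, sizeof(set), &set);
}
```

A caller would typically test the return value for -1 and report the failure with perror() or a similar function.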

An unprivileged process may set the CPU affinity of another process only if its effective user ID matches the real or effective user ID of the target process. A privileged (CAP_SYS_NICE) process may set the CPU affinity of any process.

The sched_getaffinity() system call retrieves the CPU affinity mask of the process specified by pid. If pid is 0, the CPU affinity mask of the calling process is returned.

#define _GNU_SOURCE
#include <sched.h>

int sched_getaffinity(pid_t pid, size_t len, cpu_set_t *set);

Returns 0 on success, or -1 on error

The CPU affinity mask is returned in the cpu_set_t structure pointed to by set. The len argument should be set to indicate the number of bytes in this structure (i.e., sizeof(cpu_set_t)). We can use the CPU_ISSET() macro to determine which CPUs are in the returned set.

If the CPU affinity mask of the target process has not otherwise been modified, sched_getaffinity() returns a set containing all of the CPUs on the system.
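As a sketch of how the returned set might be examined, the following function copies the CPU numbers in a process's affinity mask into an array. The name allowed_cpus() and its interface are hypothetical and are not part of any library.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: store up to 'max' CPU numbers from the affinity
   mask of 'pid' (0 = caller) in 'cpus', in ascending order. Returns the
   number of CPUs in the mask (which may exceed 'max'), or -1 on error. */
int
allowed_cpus(pid_t pid, int *cpus, int max)
{
    cpu_set_t set;
    int cpu, count;

    if (sched_getaffinity(pid, sizeof(set), &set) == -1)
        return -1;

    count = 0;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            if (count < max)        /* Never write past the caller's array */
                cpus[count] = cpu;
            count++;
        }
    }
    return count;
}
```

Passing an array of CPU_SETSIZE elements guarantees that every CPU in the mask fits.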

No permission checking is performed by sched_getaffinity(); an unprivileged process can retrieve the CPU affinity mask of any process on the system.

A child process created by fork() inherits its parent’s CPU affinity mask, and this mask is preserved across an exec().
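The fork() half of this statement can be checked with a sketch like the following, which compares the mask seen by a child with the mask its parent recorded just before the fork(). The function name child_inherits_affinity() is hypothetical, and the sketch does not exercise the exec() case.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a child created by fork() sees the same
   affinity mask as its parent, 0 if it does not, or -1 on error. */
int
child_inherits_affinity(void)
{
    cpu_set_t parent_set, child_set;
    pid_t child;
    int status;

    if (sched_getaffinity(0, sizeof(parent_set), &parent_set) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {               /* Child: compare own mask with the
                                       copy of the parent's mask */
        if (sched_getaffinity(0, sizeof(child_set), &child_set) == -1)
            _exit(2);
        _exit(CPU_EQUAL(&parent_set, &child_set) ? 0 : 1);
    }

    /* Parent: the child's exit status carries the result */
    if (waitpid(child, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 2)
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

CPU_EQUAL() is one of the additional glibc macros described in the CPU_SET(3) manual page.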

The sched_setaffinity() and sched_getaffinity() system calls are Linux-specific.

Note

The t_sched_setaffinity.c and t_sched_getaffinity.c programs in the procpri subdirectory in the source code distribution for this book demonstrate the use of sched_setaffinity() and sched_getaffinity().