Realtime Process Scheduling API

Modifying and Retrieving Policies and Priorities

In this section, we look at the system calls that modify and retrieve scheduling policies and priorities.

Modifying scheduling policies and priorities

The sched_setscheduler() system call changes both the scheduling policy and the priority of the process whose process ID is specified in pid. If pid is specified as 0, the attributes of the calling process are changed.

#include <sched.h>

int sched_setscheduler(pid_t pid, int policy,
 const struct sched_param *param);

Returns 0 on success, or -1 on error

The param argument is a pointer to a structure of the following form:

struct sched_param {
     int sched_priority;        /* Scheduling priority */
};

SUSv3 defines the param argument as a structure to allow an implementation to include additional implementation-specific fields, which may be useful if an implementation provides additional scheduling policies. However, like most UNIX implementations, Linux provides just the sched_priority field, which specifies the scheduling priority. For the SCHED_RR and SCHED_FIFO policies, this must be a value in the range indicated by sched_get_priority_min() and sched_get_priority_max(); for other policies, the priority must be 0.
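The range check described above can be sketched as follows. This is a minimal illustration rather than a definitive implementation; the helper names priority_in_range() and print_priority_range() are hypothetical and introduced only for this example.

```c
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: return 1 if 'prio' is a valid sched_priority for
   'policy', 0 if it is not, or -1 if the range could not be determined
   (for example, because 'policy' is not a known policy). */
int
priority_in_range(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;
    return prio >= lo && prio <= hi;
}

/* Hypothetical helper: print the permitted range for a policy.
   'name' is used only for display. */
void
print_priority_range(const char *name, int policy)
{
    printf("%-12s min=%d max=%d\n", name,
            sched_get_priority_min(policy),
            sched_get_priority_max(policy));
}
```

Calling priority_in_range() before sched_setscheduler() lets a program reject an out-of-range priority with its own diagnostic instead of relying on the EINVAL error from the system call.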

The policy argument determines the scheduling policy for the process. It is specified as one of the policies shown in Table 35-1.

Table 35-1. Linux realtime and nonrealtime scheduling policies

Policy	Description	SUSv3
`SCHED_FIFO`	Realtime first-in first-out	•
`SCHED_RR`	Realtime round-robin	•
`SCHED_OTHER`	Standard round-robin time-sharing	•
`SCHED_BATCH`	Similar to `SCHED_OTHER`, but intended for batch execution (since Linux 2.6.16)
`SCHED_IDLE`	Similar to `SCHED_OTHER`, but with priority even lower than nice value +19 (since Linux 2.6.23)

A successful sched_setscheduler() call moves the process specified by pid to the back of the queue for its priority level.

SUSv3 specifies that the return value of a successful sched_setscheduler() call should be the previous scheduling policy. However, Linux deviates from the standard in that a successful call returns 0. A portable application should test for success by checking that the return status is not -1.
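A portable success test might look like the following sketch. The wrapper name set_sched_portably() is hypothetical; the only point being illustrated is the comparison against -1 rather than 0.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical wrapper: set the policy and priority of 'pid' (0 means the
   caller). Returns 0 on success, or -1 on error with errno left as set by
   sched_setscheduler(). */
int
set_sched_portably(pid_t pid, int policy, int prio)
{
    struct sched_param sp;

    sp.sched_priority = prio;

    /* Compare against -1 rather than 0: SUSv3 permits a successful call
       to return the previous policy, which may be nonzero. */
    if (sched_setscheduler(pid, policy, &sp) == -1)
        return -1;
    return 0;
}
```

On Linux, an unprivileged caller can exercise this with SCHED_OTHER and a priority of 0; a nonzero priority for SCHED_OTHER fails because, as noted earlier, the priority for that policy must be 0.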

The scheduling policy and priority are inherited by a child created via fork(), and they are preserved across an exec().
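The inheritance across fork() can be observed with a small experiment such as the one below. This is a sketch under the assumption that the caller has not set the SCHED_RESET_ON_FORK flag described later in this section; the function name child_inherits_sched() is hypothetical.

```c
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical check: returns 1 if a fork()ed child sees the same policy
   and priority as its parent, 0 if it does not, or -1 on error. */
int
child_inherits_sched(void)
{
    struct sched_param parent_sp, child_sp;
    int parent_pol, status;
    pid_t child;

    parent_pol = sched_getscheduler(0);
    if (parent_pol == -1 || sched_getparam(0, &parent_sp) == -1)
        return -1;

    child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {           /* Child: compare with the parent's values */
        int same = sched_getscheduler(0) == parent_pol &&
                   sched_getparam(0, &child_sp) == 0 &&
                   child_sp.sched_priority == parent_sp.sched_priority;
        _exit(same ? EXIT_SUCCESS : EXIT_FAILURE);
    }

    /* Parent: the child's exit status carries the result */
    if (waitpid(child, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
}
```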

The sched_setparam() system call provides a subset of the functionality of sched_setscheduler(). It modifies the scheduling priority of a process while leaving the policy unchanged.

#include <sched.h>

int sched_setparam(pid_t pid, const struct sched_param *param);

Returns 0 on success, or -1 on error

The pid and param arguments are the same as for sched_setscheduler().

A successful sched_setparam() call moves the process specified by pid to the back of the queue for its priority level.
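The following sketch changes only the priority and then confirms that the policy was left alone. The function name set_priority_only() is hypothetical, and the verification step is included purely for illustration; a real program would normally just call sched_setparam().

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: change only the priority of 'pid' (0 means the
   caller), then confirm that the policy is unchanged. Returns 0 on
   success, or -1 on error or if the policy differs afterward. */
int
set_priority_only(pid_t pid, int prio)
{
    struct sched_param sp;
    int before, after;

    before = sched_getscheduler(pid);
    if (before == -1)
        return -1;

    sp.sched_priority = prio;
    if (sched_setparam(pid, &sp) == -1)
        return -1;

    after = sched_getscheduler(pid);
    return (after == before) ? 0 : -1;
}
```

For a process running under a nonrealtime policy, the only priority that can succeed is 0, so this call is mainly useful for processes that are already running under SCHED_RR or SCHED_FIFO.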

The program in Example 35-2 uses sched_setscheduler() to set the policy and priority of the processes specified by its command-line arguments. The first argument is a letter specifying a scheduling policy, the second is an integer priority, and the remaining arguments are the process IDs of the processes whose scheduling attributes are to be changed.

Example 35-2. Modifying process scheduling policies and priorities

procpri/sched_set.c
#include <sched.h>
#include "tlpi_hdr.h"

int
main(int argc, char *argv[])
{
    int j, pol;
    struct sched_param sp;

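    if (argc < 3 || strchr("rfo"
#ifdef SCHED_BATCH              /* Linux-specific */
                "b"
#endif
#ifdef SCHED_IDLE               /* Linux-specific */
                "i"
#endif
                , argv[1][0]) == NULL)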
        usageErr("%s policy priority [pid...]\n"
                "    policy is 'r' (RR), 'f' (FIFO), "
#ifdef SCHED_BATCH              /* Linux-specific */
                "'b' (BATCH), "
#endif
#ifdef SCHED_IDLE               /* Linux-specific */
                "'i' (IDLE), "
#endif
                "or 'o' (OTHER)\n",
                argv[0]);

    pol = (argv[1][0] == 'r') ? SCHED_RR :
                (argv[1][0] == 'f') ? SCHED_FIFO :
#ifdef SCHED_BATCH
                (argv[1][0] == 'b') ? SCHED_BATCH :
#endif
#ifdef SCHED_IDLE
                (argv[1][0] == 'i') ? SCHED_IDLE :
#endif
                SCHED_OTHER;
    sp.sched_priority = getInt(argv[2], 0, "priority");

    for (j = 3; j < argc; j++)
        if (sched_setscheduler(getLong(argv[j], 0, "pid"), pol, &sp) == -1)
            errExit("sched_setscheduler");

    exit(EXIT_SUCCESS);
}
     procpri/sched_set.c

Retrieving scheduling policies and priorities

The sched_getscheduler() and sched_getparam() system calls retrieve the scheduling policy and priority of a process.

#include <sched.h>

int sched_getscheduler(pid_t pid);

Returns scheduling policy, or -1 on error

int sched_getparam(pid_t pid, struct sched_param *param);

Returns 0 on success, or -1 on error

For both of these system calls, pid specifies the ID of the process about which information is to be retrieved. If pid is 0, information is retrieved about the calling process. Both system calls can be used by an unprivileged process to retrieve information about any process, regardless of credentials.

The sched_getparam() system call returns the realtime priority of the specified process in the sched_priority field of the sched_param structure pointed to by param.

Upon successful execution, sched_getscheduler() returns one of the policies shown earlier in Table 35-1.
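The two retrieval calls are commonly used together, so they can be wrapped in one function, as in the sketch below. The structure sched_info and the function get_sched_info() are hypothetical names introduced for this illustration; they are not part of any standard API.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical container for the two values that the calls return */
struct sched_info {
    int policy;                 /* As returned by sched_getscheduler() */
    int priority;               /* sched_priority from sched_getparam() */
};

/* Hypothetical helper: fetch the policy and priority of 'pid' (0 means
   the caller). Returns 0 on success, or -1 on error, in which case
   'info' is left unmodified. */
int
get_sched_info(pid_t pid, struct sched_info *info)
{
    struct sched_param sp;
    int pol;

    pol = sched_getscheduler(pid);
    if (pol == -1)
        return -1;
    if (sched_getparam(pid, &sp) == -1)
        return -1;

    info->policy = pol;
    info->priority = sp.sched_priority;
    return 0;
}
```

Because the two system calls are made separately, the policy and priority are not read atomically; if another process changes them between the calls, the pair returned here may be inconsistent.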

The program in Example 35-3 uses sched_getscheduler() and sched_getparam() to retrieve the policy and priority of all of the processes whose process IDs are given as command-line arguments. The following shell session demonstrates the use of this program, as well as the program in Example 35-2:

$ su                          Assume privilege so we can set realtime policies
Password:
# sleep 100 &                 Create a process
[1] 2006
# ./sched_view 2006           View initial policy and priority of sleep process
2006: OTHER  0
# ./sched_set f 25 2006       Switch process to SCHED_FIFO policy, priority 25
# ./sched_view 2006           Verify change
2006: FIFO  25

Example 35-3. Retrieving process scheduling policies and priorities

procpri/sched_view.c
#include <sched.h>
#include "tlpi_hdr.h"

int
main(int argc, char *argv[])
{
    int j, pol;
    struct sched_param sp;

    for (j = 1; j < argc; j++) {
        pol = sched_getscheduler(getLong(argv[j], 0, "pid"));
        if (pol == -1)
            errExit("sched_getscheduler");

        if (sched_getparam(getLong(argv[j], 0, "pid"), &sp) == -1)
            errExit("sched_getparam");

        printf("%s: %-5s %2d\n", argv[j],
                (pol == SCHED_OTHER) ? "OTHER" :
                (pol == SCHED_RR) ? "RR" :
                (pol == SCHED_FIFO) ? "FIFO" :
#ifdef SCHED_BATCH              /* Linux-specific */
                (pol == SCHED_BATCH) ? "BATCH" :
#endif
#ifdef SCHED_IDLE               /* Linux-specific */
                (pol == SCHED_IDLE) ? "IDLE" :
#endif
                "???", sp.sched_priority);
    }

    exit(EXIT_SUCCESS);
}
     procpri/sched_view.c

Preventing realtime processes from locking up the system

Since SCHED_RR and SCHED_FIFO processes preempt any lower-priority processes (e.g., the shell under which the program is run), when developing applications that use these policies, we need to be aware of the possibility that a runaway realtime process could lock up the system by hogging the CPU. Programmatically, there are a few ways to avoid this possibility:

Establish a suitably low soft CPU time resource limit (RLIMIT_CPU, described in Details of Specific Resource Limits) using setrlimit(). If the process consumes too much CPU time, it will be sent a SIGXCPU signal, which kills the process by default.
Set an alarm timer using alarm(). If the process continues running for a wall clock time that exceeds the number of seconds specified in the alarm() call, then it will be killed by a SIGALRM signal.
Create a watchdog process that runs with a high realtime priority. This process can loop repeatedly, sleeping for a specified interval, and then waking and monitoring the status of other processes. Such monitoring could include measuring the value of the CPU time clock for each process (see the discussion of the clock_getcpuclockid() function in Obtaining the Clock ID of a Specific Process or Thread) and checking its scheduling policy and priority using sched_getscheduler() and sched_getparam(). If a process is deemed to be misbehaving, the watchdog process could lower the process’s priority, or stop or terminate it by sending an appropriate signal.
Since kernel 2.6.25, Linux provides a nonstandard resource limit, RLIMIT_RTTIME, for controlling the amount of CPU time that can be consumed in a single burst by a process running under a realtime scheduling policy. Specified in microseconds, RLIMIT_RTTIME limits the amount of CPU time that the process may consume without performing a system call that blocks. When the process does perform such a call, the count of consumed CPU time is reset to 0. The count of consumed CPU time is not reset if the process is preempted by a higher-priority process, is scheduled off the CPU because its time slice expired (for a SCHED_RR process), or calls sched_yield() (Relinquishing the CPU). If the process reaches its limit of CPU time, then, as with RLIMIT_CPU, it will be sent a SIGXCPU signal, which kills the process by default.
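The two resource-limit techniques in the list above can be sketched as follows. The function names set_soft_limit(), limit_cpu_seconds(), and limit_rt_burst_usecs() are hypothetical, and the choice to cap the requested value at the hard limit is an assumption of this example rather than a requirement of setrlimit().

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: set the soft limit of 'resource' to 'value'
   (capped at the hard limit), leaving the hard limit unchanged.
   Returns 0 on success, or -1 on error. */
static int
set_soft_limit(int resource, rlim_t value)
{
    struct rlimit rl;

    if (getrlimit(resource, &rl) == -1)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && value > rl.rlim_max)
        value = rl.rlim_max;
    rl.rlim_cur = value;
    return setrlimit(resource, &rl);
}

/* Total CPU time, in seconds; SIGXCPU is sent when it is exceeded */
int
limit_cpu_seconds(rlim_t secs)
{
    return set_soft_limit(RLIMIT_CPU, secs);
}

/* CPU time, in microseconds, that a realtime process may consume
   without making a blocking system call (Linux 2.6.25 and later) */
int
limit_rt_burst_usecs(rlim_t usecs)
{
#ifdef RLIMIT_RTTIME
    return set_soft_limit(RLIMIT_RTTIME, usecs);
#else
    (void) usecs;               /* Limit not available on this system */
    errno = ENOSYS;
    return -1;
#endif
}
```

A program would typically call these functions before switching itself to a realtime policy, so that the safety net is already in place if the realtime code goes into an unintended loop.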

Note

The realtime group scheduling changes introduced in kernel 2.6.25 can also help prevent runaway realtime processes from locking up the system. For details, see the kernel source file Documentation/scheduler/sched-rt-group.txt.

Preventing child processes from inheriting privileged scheduling policies

Linux 2.6.32 added SCHED_RESET_ON_FORK as a value that can be specified in policy when calling sched_setscheduler(). This is a flag value that is ORed with one of the policies in Table 35-1. If this flag is set, then children that are created by this process using fork() do not inherit privileged scheduling policies and priorities. The rules are as follows:

If the calling process has a realtime scheduling policy (SCHED_RR or SCHED_FIFO), then the policy in child processes is reset to the standard round-robin time-sharing policy, SCHED_OTHER.
If the process has a negative (i.e., high) nice value, then the nice value in child processes is reset to 0.

The SCHED_RESET_ON_FORK flag was designed to be used in media-playback applications. It permits the creation of single processes that have realtime scheduling policies that can’t be passed to child processes. Using the SCHED_RESET_ON_FORK flag prevents the creation of fork bombs that try to evade the ceiling set by the RLIMIT_RTTIME resource limit by creating multiple children running under realtime scheduling policies.

Once the SCHED_RESET_ON_FORK flag has been enabled for a process, only a privileged process (CAP_SYS_NICE) can disable it. When a child process is created, its reset-on-fork flag is disabled.
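As a rough sketch of how the flag is ORed into the policy argument, the function below re-applies a process's current policy and priority with SCHED_RESET_ON_FORK added. The function name enable_reset_on_fork() is hypothetical. The fallback definition of the constant is an assumption made so that the example compiles when the C library headers do not expose it (on glibc, it appears to be visible only when _GNU_SOURCE is defined).

```c
#include <sched.h>
#include <sys/types.h>

#ifndef SCHED_RESET_ON_FORK
/* Assumed fallback: the value used by the Linux kernel headers */
#define SCHED_RESET_ON_FORK 0x40000000
#endif

/* Hypothetical helper: re-apply the current policy and priority of 'pid'
   (0 means the caller) with the reset-on-fork flag ORed in.
   Returns 0 on success, or -1 on error. */
int
enable_reset_on_fork(pid_t pid)
{
    struct sched_param sp;
    int pol;

    pol = sched_getscheduler(pid);
    if (pol == -1 || sched_getparam(pid, &sp) == -1)
        return -1;

    /* The flag is combined with the policy, not passed separately */
    if (sched_setscheduler(pid, pol | SCHED_RESET_ON_FORK, &sp) == -1)
        return -1;
    return 0;
}
```

A media-playback application would more typically pass a realtime policy directly, for example SCHED_RR | SCHED_RESET_ON_FORK with a realtime priority, which requires the appropriate privilege. Note also that the read-then-set sequence here is not atomic, and that, as with any successful sched_setscheduler() call, the process is moved to the back of the queue for its priority level.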
