Processes

A process holds the environment in which threads can run: it holds the memory mappings, the file descriptors, the user and group IDs, and more. The first process is the init process, which is created by the kernel during boot and has a PID of one. Thereafter, processes are created by duplication in an operation known as forking.

The POSIX function to create a process is fork(2). It is an odd function because, for each successful call, there are two returns: one in the process that made the call, known as the parent, and one in the newly created process, known as the child, as shown in the following diagram:

Creating a new process

Immediately after the call, the child is an exact copy of the parent: it has the same stack, the same heap, the same file descriptors, and it executes the same line of code, the one following fork(2). The only way the programmer can tell them apart is by looking at the return value of fork: it is zero in the child and greater than zero in the parent, where it is the PID of the newly created child process. There is a third possibility: a return value of -1 means that the fork call failed and there is still only one process.

Although the two processes are initially identical, they are in separate address spaces. Changes made to a variable by one will not be seen by the other. Under the hood, the kernel does not make a physical copy of the parent's memory, which would be quite a slow operation and consume memory unnecessarily. Instead, the memory is shared but marked with a copy-on-write (CoW) flag. If either parent or child modifies this memory, the kernel first makes a copy and then writes to the copy. This has the benefit of an efficient fork function while retaining the logical separation of process address spaces. I will discuss CoW in Chapter 11, Managing Memory.

A process may be terminated voluntarily by calling the exit(3) function or involuntarily by receiving a signal that it does not handle and whose default action is to terminate the process. One signal in particular, SIGKILL, cannot be handled and so will always kill a process. In all cases, terminating the process will stop all threads, close all file descriptors, and release all memory. The system sends a signal, SIGCHLD, to the parent so that it knows this has happened.

Processes have a return value which is composed of either the argument to exit(3), if it terminated normally, or the signal number if it was killed. The chief use for this is in shell scripts: it allows you to test the return from a program. By convention, 0 indicates success and other values indicate a failure of some sort.

The parent can collect the return value with the wait(2) or waitpid(2) functions. This causes a problem: there will be a delay between a child terminating and its parent collecting the return value. In that period, the return value must be stored somewhere, and the PID of the now-dead process cannot be reused. A process in this state is a zombie, shown as state Z in ps or top. So long as the parent calls wait(2) or waitpid(2) whenever it is notified of a child's termination by means of the SIGCHLD signal, zombies exist for too short a time to show up in process listings. For details of handling signals, see Linux System Programming, by Robert Love, O'Reilly Media, or The Linux Programming Interface, by Michael Kerrisk, No Starch Press. Zombies become a problem if the parent fails to collect the return value: each one still occupies a PID, so they accumulate and, eventually, you will not be able to create any more processes.

Here is a simple example, showing process creation and termination:

The wait(2) function blocks until a child process exits and stores the exit status. When you run it, you see something like this:

The child process inherits most of the attributes of the parent, including the user and group IDs (UID and GID), all open file descriptors, signal handling, and scheduling characteristics.

The fork function creates a copy of a running program, but it does not run a different program. For that, you need one of the exec functions, which are documented together in exec(3): execl, execlp, execle, execv, execvp, and execvpe.

Each takes a path to the program file to load and run. If the function succeeds, the kernel discards all the resources of the current process, including memory and file descriptors, and allocates memory to the new program being loaded. When the thread that called exec* returns, it returns not to the line of code after the call, but to the main() function of the new program. Here is an example of a command launcher: it prompts for a command, for example, /bin/ls, and forks and executes the string you enter:

It might seem odd to have one function that duplicates an existing process and another that discards its resources and loads a different program into memory, especially since it is common for a fork to be followed almost immediately by exec. Most operating systems combine the two actions into a single call.

There are distinct advantages, however. For example, it makes it very easy to implement redirection and pipes in the shell. Imagine that you want to get a directory listing. This is the sequence of events:

Now, imagine that you want the directory listing to be written to a file by redirecting the output using the > character. The sequence is now as follows:

Note that there is an opportunity at step three to modify the environment of the child process before executing the program. The ls program does not need to know that it is writing to a file rather than a terminal. Instead of a file, stdout could be connected to a pipe and so the ls program, still unchanged, can send output to another program. This is part of the Unix philosophy of combining many small components that each do a job well, as described in The Art of Unix Programming, by Eric Steven Raymond, Addison Wesley (2003), ISBN 978-0131429017, especially in the section Pipes, Redirection, and Filters.

We have encountered daemons in several places already. A daemon is a process that runs in the background, owned by the init process, PID 1, and not connected to a controlling terminal. The steps to create a daemon are as follows:

Thankfully, all of the preceding steps can be achieved with a single function call, daemon(3).

Each process is an island of memory. You can pass information from one to another in two ways. Firstly, you can copy it from one address space to the other. Secondly, you can create an area of memory that both can access and so share the data.

The first is usually combined with a queue or buffer so that there is a sequence of messages passing between processes. This implies copying the message twice: first to a holding area and then to the destination. Some examples of this are sockets, pipes, and POSIX message queues.

The second way requires not only a method of creating memory that is mapped into two (or more) address spaces at once, but also a means of synchronizing access to that memory, for example, by using semaphores or mutexes. POSIX has functions for all of these.

There is an older set of APIs known as System V IPC, which provides message queues, shared memory, and semaphores, but it is not as flexible as the POSIX equivalents so I will not describe it here. The man page on svipc(7) gives an overview of the facilities and there is more detail in The Linux Programming Interface, by Michael Kerrisk, No Starch Press and Unix Network Programming, Volume 2, by W. Richard Stevens.

Message-based protocols are usually easier to program and debug than shared memory, but are slow if the messages are large.

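There are several options, which I will summarize as follows. The attributes that differentiate between them are whether data is delivered as a byte stream or as discrete messages, whether the channel is unidirectional or bidirectional, the maximum message size, and whether messages can be given a priority.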

The following table summarizes these properties for FIFOs, sockets, and message queues:

Property            FIFO         Unix socket: stream  Unix socket: datagram            POSIX message queue
Message boundary    Byte stream  Byte stream          Discrete                         Discrete
Uni/bi-directional  Uni          Bi                   Uni                              Uni
Max message size    Unlimited    Unlimited            In the range 100 KiB to 250 KiB  Default: 8 KiB, absolute maximum: 1 MiB
Priority levels     None         None                 None                             0 to 32767

Sharing memory removes the need for copying data between address spaces but introduces the problem of synchronizing accesses to it. Synchronization between processes is commonly achieved using semaphores.

To share memory between processes, you first have to create a new area of memory and then map it into the address space of each process that wants access to it, as in the following diagram:

POSIX shared memory

POSIX shared memory follows the pattern we encountered with message queues. The segments are identified by names that begin with a / character and have exactly one such character. The function shm_open(3) takes the name and returns a file descriptor for it. If it does not exist already and the O_CREAT flag is set, then a new segment is created. Initially it has a size of zero. Use the (misleadingly named) ftruncate(2) to expand it to the desired size.

Once you have a descriptor for the shared memory, you map it into the address space of the process using mmap(2), and so threads in different processes can access the memory.

Here is an example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>  /* For mode constants */
#include <fcntl.h>
#include <sys/types.h>
#include <errno.h>
#include <semaphore.h>
#define SHM_SEGMENT_SIZE 65536
#define SHM_SEGMENT_NAME "/demo-shm"
#define SEMA_NAME "/demo-sem"

static sem_t *demo_sem;

/*
 * If the shared memory segment does not exist already, create it.
 * Returns a pointer to the segment; exits the program on error.
 */
static void *get_shared_memory(void)
{
  int shm_fd;
  void *shm_p;
  /* Attempt to create the shared memory segment */
  shm_fd = shm_open(SHM_SEGMENT_NAME, O_CREAT | O_EXCL | O_RDWR, 0666);

  if (shm_fd >= 0) {
    /* Succeeded: expand it to the desired size. (Note: don't do this
       every time, because ftruncate fills it with zeros.) */
    printf("Creating shared memory and setting size=%d\n",
           SHM_SEGMENT_SIZE);

    if (ftruncate(shm_fd, SHM_SEGMENT_SIZE) < 0) {
      perror("ftruncate");
      exit(1);
    }
    /* Create a semaphore as well */
    demo_sem = sem_open(SEMA_NAME, O_RDWR | O_CREAT, 0666, 1);

    if (demo_sem == SEM_FAILED) {
      perror("sem_open");
      exit(1);
    }
  }
  else if (shm_fd == -1 && errno == EEXIST) {
    /* Already exists: open again without O_CREAT */
    shm_fd = shm_open(SHM_SEGMENT_NAME, O_RDWR, 0);
    demo_sem = sem_open(SEMA_NAME, O_RDWR);

    if (demo_sem == SEM_FAILED) {
      perror("sem_open");
      exit(1);
    }
  }

  if (shm_fd == -1) {
    perror("shm_open " SHM_SEGMENT_NAME);
    exit(1);
  }
  /* Map the shared memory */
  shm_p = mmap(NULL, SHM_SEGMENT_SIZE, PROT_READ | PROT_WRITE,
    MAP_SHARED, shm_fd, 0);

  if (shm_p == MAP_FAILED) {
    perror("mmap");
    exit(1);
  }
  return shm_p;
}

int main(int argc, char *argv[])
{
  char *shm_p;
  printf("%s PID=%d\n", argv[0], getpid());
  shm_p = get_shared_memory();

  while (1) {
    printf("Press enter to see the current contents of shm\n");
    getchar();
    sem_wait(demo_sem);
    printf("%s\n", shm_p);
    /* Write our signature to the shared memory */
    sprintf(shm_p, "Hello from process %d\n", getpid());
    sem_post(demo_sem);
  }
  return 0;
}

The memory in Linux is taken from a tmpfs filesystem mounted in /dev/shm or /run/shm.