Now it is time to look at multi-threaded processes. The programming interface for threads is the POSIX threads API, which was first defined in the IEEE POSIX 1003.1c standard (1995) and is commonly known as Pthreads. It is implemented as an additional part of the C library, in libpthread.so. There have been two implementations of Pthreads over the last 15 years or so: LinuxThreads and the Native POSIX Thread Library (NPTL). The latter is much more compliant with the specification, particularly with regard to the handling of signals and process IDs. It is pretty dominant now, but you may come across older versions of uClibc that still use LinuxThreads.
The function to create a thread is pthread_create(3):

int pthread_create(pthread_t *thread, const pthread_attr_t *attr,
                   void *(*start_routine) (void *), void *arg);

It creates a new thread of execution which begins at the function start_routine, and places a descriptor in the pthread_t pointed to by thread. The new thread inherits the scheduling parameters of the calling thread, but these can be overridden by passing a pointer to the thread attributes in attr. The thread begins to execute immediately.
The pthread_t is the main way to refer to the thread within the program, but the thread can also be seen from outside using a command like ps -eLf:

UID    PID   PPID  LWP  C NLWP STIME TTY   TIME     CMD
...
chris  6072  5648  6072 0 3    21:18 pts/0 00:00:00 ./thread-demo
chris  6072  5648  6073 0 3    21:18 pts/0 00:00:00 ./thread-demo
The program thread-demo has two threads. The PID and PPID columns show that they all belong to the same process and have the same parent, as you would expect. The column marked LWP is interesting, though. LWP stands for Light Weight Process which, in this context, is another name for thread. The numbers in that column are also known as Thread IDs, or TIDs. In the main thread, the TID is the same as the PID, but for the others it is a different (higher) value. Some functions will accept a TID in places where the documentation states that you must give a PID, but be aware that this behavior is specific to Linux and not portable. Here is the code for thread-demo:
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>
#include <sys/syscall.h>

static void *thread_fn(void *arg)
{
    printf("New thread started, PID %d TID %d\n",
           getpid(), (pid_t)syscall(SYS_gettid));
    sleep(10);
    printf("New thread terminating\n");
    return NULL;
}

int main(int argc, char *argv[])
{
    pthread_t t;
    printf("Main thread, PID %d TID %d\n",
           getpid(), (pid_t)syscall(SYS_gettid));
    pthread_create(&t, NULL, thread_fn, NULL);
    pthread_join(t, NULL);
    return 0;
}
There is a man page for gettid(2) which explains that you have to make the Linux syscall directly because there isn't a C library wrapper for it, as shown.
There is a limit to the total number of threads that a given kernel can schedule. The limit scales with the size of the system, from around 1,000 on small devices up to tens of thousands on larger embedded devices. The actual number is available in /proc/sys/kernel/threads-max. Once you reach this limit, fork() and pthread_create() will fail.
A thread terminates when:

- it reaches the end of its start_routine
- it calls pthread_exit(3)
- it is canceled by another thread calling pthread_cancel(3)
- the process terminates by calling exit(3), or by receiving a signal that is not handled, masked, or ignored

Note that, if a multi-threaded program calls fork(2), only the thread that made the call will exist in the new child process. Fork does not replicate all threads.
A thread has a return value, which is a void pointer. One thread can wait for another to terminate and collect its return value by calling pthread_join(3). There is an example in the code for thread-demo mentioned in the preceding section. This produces a problem that is very similar to the zombie problem among processes: the resources of the thread, for example the stack, cannot be freed up until another thread has joined with it. If threads remain unjoined, there is a resource leak in the program.
The support for POSIX threads is part of the C library, in the library libpthread.so. However, there is more to building programs with threads than linking this library: there have to be changes to the way the compiler generates code to make sure that certain global variables, such as errno, have one instance per thread rather than one for the whole process.
The big advantage of threads is that they share the address space and so can share memory variables. This is also a big disadvantage because it requires synchronization to preserve data consistency, in a similar way to memory segments shared between processes but with the proviso that, with threads, all memory is shared. Threads can create private memory using thread local storage (TLS).
The pthreads interface provides the basics necessary to achieve synchronization: mutexes and condition variables. If you want anything more complex, you will have to build it yourself.
It is worth noting that all of the IPC methods described earlier work equally well between threads in the same process.
To write robust programs, you need to protect each shared resource with a mutex lock and make sure that every code path that reads or writes the resource has locked the mutex first. If you apply this rule consistently, most of the problems should be solved. The ones that remain are associated with the fundamental behavior of mutexes. I will list them briefly here, but will not go into detail:

- Deadlock: mutexes become permanently locked, for example when two threads each hold one mutex and are waiting for the one the other holds
- Priority inversion: a high-priority thread is blocked waiting for a mutex held by a low-priority thread
- Poor performance: mutexes introduce overhead, which becomes significant if threads spend much of their time blocked on them
Cooperating threads need a method of alerting one another that something has changed and needs attention. That thing is called a condition and the alert is sent through a condition variable, or condvar.
A condition is just something that you can test to give a true or false result. A simple example is a buffer that contains either zero or some items. One thread takes items from the buffer and sleeps when it is empty. Another thread places items into the buffer and signals the other thread that it has done so, because the condition that the other thread is waiting on has changed. If it is sleeping, it needs to wake up and do something. The only complexity is that the condition is, by definition, a shared resource and so has to be protected by a mutex. Here is a simple example which follows the producer-consumer relationship described in the preceding section:
pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
pthread_mutex_t mutx = PTHREAD_MUTEX_INITIALIZER;

void *consumer(void *arg)
{
    while (1) {
        pthread_mutex_lock(&mutx);
        while (buffer_empty(data))
            pthread_cond_wait(&cv, &mutx);
        /* Got data: take from buffer */
        pthread_mutex_unlock(&mutx);
        /* Process data item */
    }
    return NULL;
}

void *producer(void *arg)
{
    while (1) {
        /* Produce an item of data */
        pthread_mutex_lock(&mutx);
        add_data(data);
        pthread_mutex_unlock(&mutx);
        pthread_cond_signal(&cv);
    }
    return NULL;
}
Note that, when the consumer thread blocks on the condvar, it does so while holding a locked mutex, which would seem to be a recipe for deadlock the next time the producer thread tries to update the condition. To avoid this, pthread_cond_wait(3) atomically unlocks the mutex as the thread blocks, and locks it again before waking the thread and returning from the wait.
Now that we have covered the basics of processes and threads and the ways in which they communicate, it is time to see what we can do with them.
Here are some of the rules I use when building systems:
Rule 1: Minimize overheads by keeping closely inter-operating threads together in one process.

Rule 2: On the other hand, try to keep components with limited interaction in separate processes, in the interests of resilience and modularity.

Rule 3: This is an amplification of Rule 2: the critical part of the system, which might be the machine control program, should be kept as simple as possible and written in a more rigorous way than other parts. It must be able to continue even if other processes fail. If you have real-time threads, they, by definition, must be critical and should go into a process by themselves.

Rule 4: One of the temptations when writing a multi-threaded program is to intermingle the code and variables between threads, because it is all in one program and easy to do. Resist it: do keep threads modular, with well-defined interactions.

Rule 5: It is very easy to create additional threads, but there is a cost, not least in the additional synchronization necessary to coordinate their activities.

Rule 6: Threads can run simultaneously on a multi-core processor, giving higher throughput. If you have a large computing job, you can create one thread per core and make maximum use of the hardware. There are libraries to help you do this, such as OpenMP. You probably shouldn't be coding parallel programming algorithms from scratch.
The Android design is a good illustration. Each application is a separate Linux process, which helps to modularize memory management and, especially, ensures that one app crashing does not affect the whole system. The process model is also used for access control: a process can only access the files and resources which its UID and GIDs allow it to. There is a group of threads in each process: one to manage and update the user interface, one for handling signals from the operating system, several for managing dynamic memory allocation and the freeing of Java objects, and a worker pool of at least two threads for receiving messages from other parts of the system using the Binder protocol.
To summarize, processes provide resilience because each process has a protected memory space and, when the process terminates, all resources including memory and file descriptors are freed up, reducing resource leaks. On the other hand, threads share resources and so can communicate easily through shared variables, and can cooperate by sharing access to files and other resources. Threads give parallelism through worker pools and other abstractions which is useful on multi-core processors.