Multiprocessing pools

In general, there is no reason to have more processes than there are processors on the computer. There are a few reasons for this: only cpu_count() processes can run simultaneously, each process consumes resources with a full copy of the Python interpreter, communication between processes is expensive, and creating a process takes a nonzero amount of time.

Given these constraints, it makes sense to create at most cpu_count() processes when the program starts and then have them execute tasks as needed. This has much less overhead than starting a new process for each task.

It is not difficult to implement a basic series of communicating processes that does this, but it can be tricky to debug, test, and get right. Of course, other Python developers have already done it for us in the form of multiprocessing pools.

Pools abstract away the overhead of figuring out which code is executing in the main process and which is running in a subprocess. The pool abstraction restricts the number of places in which code in different processes interacts, making it much easier to keep track of.

Unlike threads, separate processes cannot directly access variables set up by other processes. The multiprocessing module provides a few different ways to implement interprocess communication, and pools seamlessly hide the work of passing data between processes. Using a pool looks much like a function call: you pass data into a function, it is executed in another process or processes, and when the work is done, a value is returned. It is important to understand that under the hood, a lot of work is being done to support this: objects in one process are being pickled and passed into an operating system pipe. Then, another process retrieves data from the pipe and unpickles it. The requested work is done in the subprocess and a result is produced. The result is pickled and passed back through the pipe. Eventually, the original process unpickles and returns it.
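To make that round trip concrete, here is a minimal sketch of roughly what a pool does behind the scenes, using a Pipe and a Process directly. The worker function and its doubling "work" are illustrative stand-ins, not part of the Pool API:

from multiprocessing import Pipe, Process


def worker(connection):
    value = connection.recv()   # unpickles the data the parent sent
    connection.send(value * 2)  # pickles the result and sends it back


if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    process = Process(target=worker, args=(child_conn,))
    process.start()
    parent_conn.send(21)       # pickled and written into the pipe
    print(parent_conn.recv())  # 42, unpickled from the pipe
    process.join()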

All this pickling and passing data into pipes takes time and memory. Therefore, it is ideal to keep the amount and size of data passed into and returned from the pool to a minimum, and it is only advantageous to use the pool if a lot of processing has to be done on the data in question.

Pickling is an expensive operation, even for medium-sized Python objects. It is frequently more expensive to pickle a large object for use in a separate process than it would be to do the work in the original process using threads. Make sure you profile your program to ensure the speedup from multiprocessing is actually worth the overhead of implementing and maintaining it.
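One quick, informal way to gauge that overhead is to time the pickle round trip for a representative piece of your data; the list below is just a stand-in for whatever you would actually be sending:

import pickle
import timeit

data = list(range(1000000))  # stand-in for your real data

seconds = timeit.timeit(lambda: pickle.loads(pickle.dumps(data)), number=10)
print("pickle round trip: {:.4f} seconds per call".format(seconds / 10))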

Armed with this knowledge, the code to make all this machinery work is surprisingly simple. Let's look at the problem of calculating all the prime factors of a list of random numbers. This is a common and expensive part of a variety of cryptography algorithms (not to mention attacks on those algorithms!). It requires years of processing power to crack the extremely large numbers used to secure your bank accounts. The following implementation, while readable, is not at all efficient, but that's okay because we want to see it using lots of CPU time:

import random
from multiprocessing.pool import Pool


def prime_factor(value):
    # Deliberately brute-force: try every possible divisor in turn
    factors = []
    for divisor in range(2, value - 1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            # Found a divisor; recursively factor both halves
            factors.extend(prime_factor(divisor))
            factors.extend(prime_factor(quotient))
            break
    else:
        # No divisor found, so the value is itself prime
        factors = [value]
    return factors


if __name__ == "__main__":
    pool = Pool()

    to_factor = [random.randint(100000, 50000000) for i in range(20)]
    results = pool.map(prime_factor, to_factor)
    for value, factors in zip(to_factor, results):
        print("The factors of {} are {}".format(value, factors))

Let's focus on the parallel processing aspects, as the brute force recursive algorithm for calculating factors is pretty clear. We first construct a multiprocessing pool instance. By default, this pool creates a separate process for each of the CPU cores in the machine it is running on.
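If the default is not what you want, the number of workers can be set explicitly when constructing the pool; the value 4 here is just an example:

from multiprocessing.pool import Pool

pool = Pool(processes=4)  # four workers instead of one per CPU core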

The map method accepts a function and an iterable. The pool pickles each of the values in the iterable and passes it into an available process, which executes the function on it. When that process is finished doing its work, it pickles the resulting list of factors and passes it back to the pool. Then, if the pool has more work available, that process takes on the next job.

Once all the processes have finished their work (which could take some time), the results list is passed back to the original process, which has been waiting patiently for all this work to complete.

It is often more useful to use the similar map_async method, which returns immediately even though the processes are still working. In that case, the results variable would not be a list of values, but a promise to return a list of values later by calling results.get(). This promise object also has methods such as ready() and wait(), which allow us to check whether all the results are in yet. I'll leave you to explore the Python documentation to discover more about their usage.
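Here is a minimal sketch of that pattern, assuming the prime_factor function and to_factor list from the earlier example:

if __name__ == "__main__":
    pool = Pool()
    to_factor = [random.randint(100000, 50000000) for i in range(20)]
    result = pool.map_async(prime_factor, to_factor)  # returns immediately
    while not result.ready():
        result.wait(1)  # block for at most one second, then check again
    results = result.get()  # the complete list of factor lists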

Alternatively, if we don't know all the values we want to get results for in advance, we can use the apply_async method to queue up a single job. If the pool has a process that isn't already working, it will start immediately; otherwise, it will hold onto the task until there is a free process available.
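A sketch of that pattern, again assuming the prime_factor function from earlier; the two values queued here are arbitrary:

if __name__ == "__main__":
    pool = Pool()
    # Each call returns an AsyncResult handle for that one job
    handles = [
        pool.apply_async(prime_factor, (4817191,)),
        pool.apply_async(prime_factor, (9322577,)),
    ]
    for handle in handles:
        print(handle.get())  # blocks until that particular job is done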

Pools can also be closed, which refuses to take any further tasks but processes everything currently in the queue, or terminated, which goes one step further: it refuses to start any jobs still in the queue and stops the worker processes immediately, without waiting for in-progress jobs to complete.
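A brief sketch of the orderly shutdown pattern, once more assuming prime_factor from the earlier example:

if __name__ == "__main__":
    pool = Pool()
    result = pool.map_async(prime_factor, [4817191, 9322577])
    pool.close()  # accept no new tasks, but finish the queued ones
    pool.join()   # wait for the worker processes to exit
    print(result.get())
    # pool.terminate() would instead stop the workers immediately,
    # abandoning queued and in-progress jobs.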