Multi-processing in Python

Camron Godbout
3 min read · Mar 14, 2018

Useful tips to eke out all possible performance in Python using the multiprocessing library

Overview

Sometimes Python is slow. By default, a Python program runs in a single process on a single thread, and in standard CPython builds the global interpreter lock keeps threads from executing Python code on more than one core at a time. To speed up processing, code can instead run in multiple processes. This parallelization distributes work across all CPU cores. A long-running job is broken into smaller, manageable chunks, the chunks run in parallel, and the results are collected as they finish, which can cut processing time drastically for CPU-bound work. Multiprocessing in Python is an effective way to speed up long-running functions.

Multi-processing

Python ships with multiprocessing in its standard library. With a simple import statement:

import multiprocessing

we can run functions in parallel across several processes. The package offers multiple strategies for doing so. In this post we will highlight some fast and easy implementations using multiprocessing to speed up long-running code.

Pool

The multiprocessing package includes the Pool class, which creates a pool of workers. Once the pool is allocated, we have a set of worker processes (not threads) that can handle tasks in parallel. This usually looks like the code below:

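from multiprocessing import Pool

number_of_workers = 10
if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    with Pool(number_of_workers) as p:
        pass  # Do something with pool here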

Now that the pool is allocated, workers can be given tasks.
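Rather than hard-coding the number of workers, one option is to size the pool from the machine's CPU count. The sketch below is only an illustration: choose_worker_count and report_pid are hypothetical helpers, not part of multiprocessing, and the requested size of 10 simply mirrors the snippet above.

```python
import multiprocessing
import os
from multiprocessing import Pool


def choose_worker_count(requested):
    """Hypothetical helper: use at least 1 worker and no more than the CPU count."""
    return max(1, min(requested, multiprocessing.cpu_count()))


def report_pid(_):
    """Return the ID of the worker process that handled this task."""
    return os.getpid()


if __name__ == "__main__":
    number_of_workers = choose_worker_count(10)
    with Pool(number_of_workers) as p:
        # Collect the distinct worker processes that picked up tasks.
        pids = set(p.map(report_pid, range(100)))
    print(f"{len(pids)} of {number_of_workers} workers handled tasks")
```

Calling Pool() with no argument also picks a default based on the number of available CPUs, so the helper is mainly useful when an upper limit is wanted.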

Map

With the pool in place, work can be done in parallel. The map method splits an iterable into chunks and hands them to the workers, which apply the function to their items at the same time. In the example below, we use multiprocessing to square a large array of numbers in parallel and print the results.

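from multiprocessing import Pool

def do_something(number):
    return number ** 2

if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    array_of_numbers = list(range(1000000))
    with Pool(number_of_workers) as p:
        print(p.map(do_something, array_of_numbers))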

If each call did substantial work and the job were run serially (without parallelization), it would take quite some time. By splitting the job into pieces, we can share it among the different CPU cores to speed up the task.
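To see the effect on a given machine, the serial and parallel versions can be timed side by side. This is a rough sketch under assumptions: slow_square is a hypothetical stand-in for an expensive function, and the worker count of 4 and input size are arbitrary. The timings depend entirely on the hardware, and for very cheap functions the cost of sending data between processes can outweigh the gain.

```python
import time
from multiprocessing import Pool


def slow_square(number):
    """Square a number after some deliberately wasteful CPU work."""
    total = 0
    for i in range(5_000):
        total += i % 7
    return number ** 2


def run_serial(numbers):
    """Apply slow_square to every number in the current process."""
    return [slow_square(n) for n in numbers]


def run_parallel(numbers, workers):
    """Apply slow_square to every number using a pool of worker processes."""
    with Pool(workers) as p:
        return p.map(slow_square, numbers)


if __name__ == "__main__":
    numbers = list(range(2_000))

    start = time.perf_counter()
    serial = run_serial(numbers)
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(numbers, workers=4)
    parallel_seconds = time.perf_counter() - start

    assert serial == parallel  # map preserves input order
    print(f"serial: {serial_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```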

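Under the hood, map converts the iterable to a list, splits it into chunks, pickles each chunk, and sends it to a worker process. The workers themselves can be a source of memory issues. With the fork start method (long the default on Linux), each worker starts as a copy of the parent process. The copy initially shares memory with the parent, but pages are duplicated as they are written to, and CPython's reference counting writes to objects merely by using them. For example, if the current process size in memory is 4GB and the code is using Pool(4) on a four-core machine, in the worst case each of the 4 workers ends up with its own copy. This can increase the memory usage by up to 4GB * 4 workers = 16GB.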
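Because multiprocessing moves objects between processes by pickling them, the pickled size of a task's arguments gives a rough idea of how much data travels to the workers. The sketch below uses the standard pickle module for that estimate; pickled_size is a hypothetical helper and the chunk sizes are arbitrary.

```python
import pickle


def pickled_size(obj):
    """Return how many bytes obj occupies once pickled."""
    return len(pickle.dumps(obj))


if __name__ == "__main__":
    small_chunk = list(range(10))
    large_chunk = list(range(1_000_000))
    print(f"10 integers:        {pickled_size(small_chunk):>10,} bytes")
    print(f"1,000,000 integers: {pickled_size(large_chunk):>10,} bytes")
```

Keeping the arguments small, for example by passing a file path or an index instead of the data itself, reduces this transfer cost.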

Imap

A more memory-friendly method is imap. It runs on the same pool of workers, but it does not first convert the input to a list, and it returns an iterator instead of a completed list, so the full input and the full set of results do not have to be held in memory at once.

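from multiprocessing import Pool

def do_something(number):
    return number ** 2

if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    array_of_numbers = range(1000000)  # a lazy range, not a full list
    with Pool(number_of_workers) as p:
        for result in p.imap(do_something, array_of_numbers):
            print(result)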

imap yields the same results in the same order as map, but because they arrive one at a time, peak memory usage is lower.
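The memory benefit is easiest to see when results are consumed as they arrive rather than collected. The following sketch feeds imap from a generator and keeps only a running total. The names numbers_up_to and sum_of_squares are hypothetical, and the worker count and chunksize values are arbitrary. chunksize is a real imap parameter that defaults to 1; larger values usually reduce overhead on long inputs.

```python
from multiprocessing import Pool


def do_something(number):
    return number ** 2


def numbers_up_to(limit):
    """Generate inputs one at a time instead of building a list."""
    for x in range(limit):
        yield x


def sum_of_squares(limit, workers, chunksize):
    """Add up the squares without ever storing the list of results."""
    total = 0
    with Pool(workers) as p:
        # Results arrive in input order and are discarded once added.
        for result in p.imap(do_something, numbers_up_to(limit), chunksize=chunksize):
            total += result
    return total


if __name__ == "__main__":
    print(sum_of_squares(limit=100_000, workers=4, chunksize=1_000))
```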

One thing to note is that imap and map pass only a single argument to the function being parallelized.
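When only one argument varies per item and the others stay fixed, a common workaround is to bind the fixed arguments with functools.partial from the standard library, which leaves a one-argument callable that map can use. This is a small sketch; the power function and the exponent of 3 are made up for illustration.

```python
from functools import partial
from multiprocessing import Pool


def power(number, exponent):
    return number ** exponent


if __name__ == "__main__":
    cube = partial(power, exponent=3)  # fix the second argument
    with Pool(4) as p:
        print(p.map(cube, range(10)))
    # prints [0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
```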

Starmap

Another method, starmap, behaves like map in both functionality and memory usage. The difference is that each item of the iterable is a tuple, which starmap unpacks into multiple arguments.

from multiprocessing import Pool

def do_something(number, another_number):
    return number ** 2 + another_number ** 2

if __name__ == "__main__":  # guard needed where workers are spawned (Windows, macOS)
    array_of_number_tuple = [(x, x + 1) for x in range(1000000)]
    with Pool(number_of_workers) as p:
        print(p.starmap(do_something, array_of_number_tuple))

In the code example above, we show how starmap differs from map and imap. Instead of a single parameter, multiple parameters are passed to the function that is being run in parallel.
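The argument tuples for starmap do not have to be written by hand. As one possible approach, the sketch below builds every combination of two input ranges with itertools.product; all_pairs is a hypothetical helper, and the ranges and worker count are arbitrary.

```python
from itertools import product
from multiprocessing import Pool


def do_something(number, another_number):
    return number ** 2 + another_number ** 2


def all_pairs(first, second):
    """Build every (a, b) argument tuple from two input sequences."""
    return list(product(first, second))


if __name__ == "__main__":
    pairs = all_pairs(range(3), range(2))
    with Pool(4) as p:
        results = p.starmap(do_something, pairs)
    # starmap returns results in the same order as the input tuples.
    for (a, b), value in zip(pairs, results):
        print(f"do_something({a}, {b}) = {value}")
```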

Conclusion

The Python package multiprocessing allows for faster execution of long-running jobs. There are more complex ways to use the package that aren’t detailed in this post; the Python documentation page covers them further. Using tools from the multiprocessing library, you may be able to cut your processing time from days to hours.

Finally, if using Python is exciting and getting the most out of your code sounds like fun, we’re always looking for experienced Python developers at Apteo. Feel free to reach out at info@apteo.co
