Multi-processing in Python

Overview

Sometimes Python is slow. By default, a Python program runs in a single process on a single CPU core. To speed up processing, code can be run across multiple processes instead. This parallelization distributes work across all of the machine's CPU cores: a long-running job is broken down into smaller, more manageable chunks, the chunks are processed in parallel, and the results are collected at the end, cutting the total processing time drastically. Multi-processing in Python is an effective way to speed up long-running functions.

Multi-processing

Python ships with multiprocessing in the standard library, so it is available with a simple import statement:

import multiprocessing

Pool

The multiprocessing module includes the Pool class, which creates a pool of worker processes. Once the pool is allocated, we have a set of workers that can process jobs in parallel. This usually looks like the code below:

from multiprocessing import Pool

if __name__ == "__main__":
    number_of_workers = 10
    with Pool(number_of_workers) as p:
        pass  # do something with the pool here
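Two practical notes on this pattern. First, on platforms that start workers with the spawn method (Windows, and macOS by default in recent Python versions), the pool should be created under an if __name__ == "__main__": guard so the child processes do not re-execute the setup when they import the module; the examples in this post follow that convention. Second, the worker count does not have to be hard-coded. A minimal sketch, assuming one worker per CPU core is what you want (calling Pool() with no argument behaves similarly, defaulting to the machine's CPU count):

import multiprocessing
from multiprocessing import Pool

if __name__ == "__main__":
    # One worker per CPU core instead of a hard-coded count
    number_of_workers = multiprocessing.cpu_count()
    with Pool(number_of_workers) as p:
        pass  # submit work to the pool here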

Map

Now that the pool is allocated, work can be done in parallel. Using map, a job is split across the worker processes, with each worker handling a chunk of the input. In the example below, we use the pool to square and print a large list of numbers in parallel.

from multiprocessing import Pool

def do_something(number):
    return number ** 2

if __name__ == "__main__":
    number_of_workers = 10
    array_of_numbers = [x for x in range(10_000_000)]
    with Pool(number_of_workers) as p:
        print(p.map(do_something, array_of_numbers))
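Whether the pool actually beats a plain loop depends on how expensive the function is: squaring a number is so cheap that the cost of shipping arguments and results between processes can outweigh the parallelism, whereas genuinely long-running functions benefit the most. A rough timing harness like the sketch below (variable names are just for illustration) makes it easy to check on your own workload:

import time
from multiprocessing import Pool

def do_something(number):
    return number ** 2

if __name__ == "__main__":
    numbers = list(range(10_000_000))

    # Plain serial loop for comparison
    start = time.perf_counter()
    serial_results = [do_something(n) for n in numbers]
    print(f"serial:   {time.perf_counter() - start:.2f}s")

    # Same work spread across a pool of workers
    start = time.perf_counter()
    with Pool(10) as p:
        parallel_results = p.map(do_something, numbers)
    print(f"parallel: {time.perf_counter() - start:.2f}s")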

Imap

A more memory-efficient method is imap. Unlike map, which builds the complete list of results before returning, imap returns an iterator and yields results lazily as the workers finish them, so the full result set never has to sit in memory at once. Because imap returns an iterator, the example below consumes the results in a loop instead of printing them all at once.

from multiprocessing import Pool

def do_something(number):
    return number ** 2

if __name__ == "__main__":
    number_of_workers = 10
    with Pool(number_of_workers) as p:
        for result in p.imap(do_something, range(10_000_000)):
            print(result)
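One parameter worth knowing here is imap's chunksize. By default imap hands work to the pool one item at a time, which can make it much slower than map on long iterables; the Python documentation recommends a larger chunksize in that case. A minimal sketch (the value 1000 is only an illustration; the best setting depends on the workload):

from multiprocessing import Pool

def do_something(number):
    return number ** 2

if __name__ == "__main__":
    with Pool(10) as p:
        # Batching items reduces the per-item inter-process overhead
        for result in p.imap(do_something, range(10_000_000), chunksize=1000):
            print(result)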

Starmap

Another function, starmap, behaves like map, including its memory profile: it returns the complete list of results. The difference is that starmap unpacks each element of the input iterable as multiple arguments to the worker function.

from multiprocessing import Pool

def do_something(number, another_number):
    return number ** 2 + another_number ** 2

if __name__ == "__main__":
    array_of_number_tuple = [(x, x + 1) for x in range(10_000_000)]
    with Pool(10) as p:
        print(p.starmap(do_something, array_of_number_tuple))
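Often the arguments start out in separate sequences rather than pre-built tuples. zip pairs them up into exactly the argument tuples that starmap expects. A small sketch, with made-up sequence names:

from multiprocessing import Pool

def do_something(number, another_number):
    return number ** 2 + another_number ** 2

if __name__ == "__main__":
    first_numbers = range(10_000_000)
    second_numbers = range(1, 10_000_001)
    with Pool(10) as p:
        # zip produces (number, another_number) tuples for starmap to unpack
        print(p.starmap(do_something, zip(first_numbers, second_numbers)))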

Conclusion

The multiprocessing package in the Python standard library allows for faster execution of long-running jobs. There are more advanced ways to use the package that aren't covered in this post; they are described in the official Python documentation. Using the tools from the multiprocessing library, you can cut your processing time from days down to hours.
