multiprocessing — Process-based parallelism

The multiprocessing module allows you to write concurrent programs that bypass the Global Interpreter Lock (GIL) by using subprocesses instead of threads. This makes it ideal for CPU-bound tasks.

import multiprocessing

Basic Process

import multiprocessing
import time

def calculate_square(number):
    print(f"Square of {number}: {number * number}")
    time.sleep(1)

# MUST be protected with if __name__ == '__main__': (especially on Windows)
if __name__ == '__main__':
    # 1. Create a process
    p = multiprocessing.Process(target=calculate_square, args=(10,))
    
    # 2. Start the process
    p.start()
    print("Main process doing other things...")
    
    # 3. Wait for process to finish
    p.join()
    print("Done!")

The Pool Class (Recommended)

For executing functions concurrently across multiple CPU cores, Pool is the easiest and most effective way.

import multiprocessing
import time

def heavy_computation(x):
    # Simulate heavy work
    time.sleep(1)
    return x * x

if __name__ == '__main__':
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    
    start = time.time()
    
    # Automatically creates as many processes as you have CPU cores
    with multiprocessing.Pool() as pool:
        # Blocks until all processes finish
        results = pool.map(heavy_computation, data)
        
    print(results)  # [1, 4, 9, 16, 25, 36, 49, 64]
    print(f"Time taken: {time.time() - start:.2f} seconds")

Sharing State: Queue

Because processes have independent memory spaces, you cannot share standard global variables. You must use multiprocessing.Queue to send data between them.

import multiprocessing

def square_list(numbers, q):
    for n in numbers:
        q.put(n * n)

if __name__ == '__main__':
    numbers = [2, 3, 4]
    q = multiprocessing.Queue()
    
    p = multiprocessing.Process(target=square_list, args=(numbers, q))
    p.start()
    p.join()
    
    while not q.empty():
        print(q.get()) # Prints 4, 9, 16

ProcessPoolExecutor (Modern Approach)

Introduced in concurrent.futures, this is the modern equivalent to Pool.

import concurrent.futures
import math

def is_prime(n):
    if n < 2: return False
    for i in range(2, int(math.sqrt(n)) + 1):
        if n % i == 0: return False
    return True

if __name__ == '__main__':
    numbers = [10**12 + 39, 10**12 + 61, 10**12 + 63, 10**12 + 91]
    
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(is_prime, numbers))
        
    print(results)

Common Pitfalls

Official Documentation

multiprocessing — Process-based parallelism

API Reference

Process and Pool

Class Description
multiprocessing.Process Process objects represent activity that is run in a separate process. Takes target and args.
multiprocessing.Pool A process pool object which controls a pool of worker processes to which jobs can be submitted.

Pool Methods

Method Description
Pool.map(func, iterable) A parallel equivalent of the map() built-in function. Blocks until result is ready.
Pool.apply_async(func, args) Execution of func happens asynchronously without blocking.
Pool.close() Prevents any more tasks from being submitted to the pool.
Pool.join() Wait for the worker processes to exit. Must call close() or terminate() first.

Inter-process Communication

Class Description
multiprocessing.Queue A thread and process safe FIFO queue.
multiprocessing.Pipe Returns a pair (conn1, conn2) representing the ends of a pipe.
multiprocessing.Lock A non-recursive lock object.