multiprocessing — Process-based parallelism

The multiprocessing module lets you write parallel programs that side-step the Global Interpreter Lock (GIL) by running work in separate processes instead of threads. Each process has its own interpreter and memory, which makes the module well suited to CPU-bound tasks.

import multiprocessing

Basic Process

import multiprocessing
import time

def calculate_square(number):
    print(f"Square of {number}: {number * number}")
    time.sleep(1)

# Guard process creation with if __name__ == '__main__':
# it is required under the spawn start method (the default on Windows and macOS)
if __name__ == '__main__':
    # 1. Create a process
    p = multiprocessing.Process(target=calculate_square, args=(10,))
    
    # 2. Start the process
    p.start()
    print("Main process doing other things...")
    
    # 3. Wait for process to finish
    p.join()
    print("Done!")
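The same start/join pattern extends to several workers. The following is a minimal sketch, assuming a hypothetical `worker` function and `run_workers` helper (neither appears in the original). It starts one process per input and reads each process's `exitcode` after joining.

```python
import multiprocessing


def worker(number):
    # Hypothetical task: runs inside the child process.
    print(f"Square of {number}: {number * number}")


def run_workers(numbers):
    # Create one process per input value.
    processes = [
        multiprocessing.Process(target=worker, args=(n,), name=f"worker-{n}")
        for n in numbers
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    # exitcode is 0 when the target function returned normally.
    return [p.exitcode for p in processes]


if __name__ == "__main__":
    # The order of the "Square of ..." lines can vary between runs.
    print(run_workers([1, 2, 3]))
```

Starting all processes before joining any of them lets them run at the same time. Joining each one immediately after its `start()` would serialize the work.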

The Pool Class (Recommended)

For executing functions concurrently across multiple CPU cores, Pool is the easiest and most effective way.

import multiprocessing
import time

def heavy_computation(x):
    # Simulate heavy work
    time.sleep(1)
    return x * x

if __name__ == '__main__':
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    
    start = time.time()
    
    # By default, Pool() starts one worker process per available CPU
    with multiprocessing.Pool() as pool:
        # Blocks until every result is ready; results keep the input order
        results = pool.map(heavy_computation, data)
        
    print(results)  # [1, 4, 9, 16, 25, 36, 49, 64]
    print(f"Time taken: {time.time() - start:.2f} seconds")
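Pool offers a few variations on map that are worth knowing. The sketch below assumes two hypothetical functions, `power` and `cube`, and an arbitrary pool size of 2. It uses `starmap` for functions that take several arguments and `imap_unordered` for cases where result order does not matter.

```python
import multiprocessing


def power(base, exponent):
    return base ** exponent


def cube(x):
    return x ** 3


def run_pool_examples():
    # processes=2 is an arbitrary choice for this sketch.
    with multiprocessing.Pool(processes=2) as pool:
        # starmap unpacks each tuple into positional arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 0)])
        # imap_unordered yields results as workers finish, in any order,
        # so they are sorted here to get a stable list.
        cubes = sorted(pool.imap_unordered(cube, range(5)))
    return powers, cubes


if __name__ == "__main__":
    print(run_pool_examples())  # ([8, 9, 1], [0, 1, 8, 27, 64])
```

The lazy iterator returned by `imap_unordered` is consumed inside the `with` block because leaving the block terminates the pool.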

Sharing State: Queue

Because each process has its own memory space, a change made to an ordinary global variable in one process is not visible in any other. To pass data between processes, use an explicit channel such as multiprocessing.Queue.

import multiprocessing

def square_list(numbers, q):
    for n in numbers:
        q.put(n * n)

if __name__ == '__main__':
    numbers = [2, 3, 4]
    q = multiprocessing.Queue()
    
    p = multiprocessing.Process(target=square_list, args=(numbers, q))
    p.start()
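    
    # Read the results before join(): q.empty() is not reliable across
    # processes, and joining a child that still has items buffered in a
    # queue can deadlock.
    for _ in numbers:
        print(q.get())  # Prints 4, 9, 16
    p.join()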
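To see the independent memory spaces in action, the sketch below has a child process increment both a plain global and a `multiprocessing.Value`. The names `counter`, `bump`, and `run_demo` are hypothetical, and `Value` is shown only as one possible alternative to a queue for a single shared number.

```python
import multiprocessing

counter = 0  # ordinary module-level global


def bump(shared):
    global counter
    counter += 1             # changes only the child's own copy
    with shared.get_lock():  # Value carries its own lock
        shared.value += 1    # lives in shared memory, visible to the parent


def run_demo():
    shared = multiprocessing.Value("i", 0)  # "i" means a signed C int
    p = multiprocessing.Process(target=bump, args=(shared,))
    p.start()
    p.join()
    return counter, shared.value


if __name__ == "__main__":
    print(run_demo())  # (0, 1)
```

The parent still sees `counter == 0` because the child modified its own copy, while the `Value` reflects the child's update.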

ProcessPoolExecutor (Modern Approach)

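ProcessPoolExecutor, part of concurrent.futures (added in Python 3.2), is a higher-level alternative to Pool. It shares its interface with ThreadPoolExecutor, so switching between processes and threads takes little code.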

import concurrent.futures
import math

def is_prime(n):
    if n < 2:
        return False
    # math.isqrt (Python 3.8+) avoids float rounding for large n
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True

if __name__ == '__main__':
    numbers = [10**12 + 39, 10**12 + 61, 10**12 + 63, 10**12 + 91]
    
    with concurrent.futures.ProcessPoolExecutor() as executor:
        results = list(executor.map(is_prime, numbers))
        
    print(results)
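Besides `map`, the executor can hand out individual tasks with `submit` and collect them as they finish. This is a minimal sketch, assuming a hypothetical `digit_sum` task and `run_submit` helper, with `max_workers=2` chosen arbitrarily.

```python
import concurrent.futures


def digit_sum(n):
    # Hypothetical task: sum of the decimal digits of n.
    return sum(int(d) for d in str(n))


def run_submit(numbers):
    results = {}
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        # submit() returns a Future immediately for each task.
        future_to_n = {executor.submit(digit_sum, n): n for n in numbers}
        # as_completed() yields futures in the order they finish.
        for future in concurrent.futures.as_completed(future_to_n):
            n = future_to_n[future]
            results[n] = future.result()  # re-raises any worker exception
    return results


if __name__ == "__main__":
    print(run_submit([123, 4567, 89]))
```

Keeping a dictionary from each future to its input is a common way to match results to inputs when completion order is not guaranteed.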

Common Pitfalls

Missing if __name__ == '__main__' guard: required with the spawn start method (the default on Windows and macOS). Each child re-imports the main module, so unguarded process-creation code would run again inside the child.
Passing unpicklable arguments: functions, arguments, and return values sent between processes must be picklable (no lambdas, open files, sockets, etc.).
Overhead: starting a process and pickling its data costs time and memory. Use multiprocessing only when each task's computation clearly outweighs that cost.
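The pickling pitfall can be checked before any process is started. The sketch below assumes a hypothetical helper, `is_picklable`, that tries `pickle.dumps` on an object. It approximates the serialization multiprocessing performs and is not the module's own check.

```python
import os
import pickle


def square(x):
    return x * x


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj (an approximation only)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


if __name__ == "__main__":
    print(is_picklable(square))           # True: module-level function
    print(is_picklable(lambda x: x * x))  # False: lambdas cannot be pickled
    with open(os.devnull) as handle:
        print(is_picklable(handle))       # False: open file object
```

Functions are pickled by reference to their module-level name, which is why a named top-level function works and a lambda does not.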

Official Documentation

multiprocessing — Process-based parallelism

API Reference

Process and Pool

Class	Description
`multiprocessing.Process`	Process objects represent activity that is run in a separate process. Takes `target` and `args`.
`multiprocessing.Pool`	A process pool object which controls a pool of worker processes to which jobs can be submitted.

Pool Methods

Method	Description
`Pool.map(func, iterable)`	A parallel equivalent of the map() built-in function. Blocks until result is ready.
`Pool.apply_async(func, args)`	Submits func without blocking and returns an AsyncResult; call its `get()` method to retrieve the value.
`Pool.close()`	Prevents any more tasks from being submitted to the pool.
`Pool.join()`	Waits for the worker processes to exit. Must call `close()` or `terminate()` first.
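The methods in this table are often combined when a pool is managed by hand instead of through a `with` block. This is a minimal sketch, assuming a hypothetical `add` function and `run_async` helper and an arbitrary pool size of 2.

```python
import multiprocessing


def add(a, b):
    return a + b


def run_async():
    pool = multiprocessing.Pool(processes=2)
    # apply_async returns an AsyncResult right away for each task.
    pending = [pool.apply_async(add, (i, i)) for i in range(4)]
    pool.close()  # no more tasks may be submitted
    pool.join()   # wait for the workers to finish and exit
    # Every task is done by now, so get() returns without waiting.
    return [result.get() for result in pending]


if __name__ == "__main__":
    print(run_async())  # [0, 2, 4, 6]
```

Calling `close()` before `join()` matters: `join()` raises an error on a pool that is still accepting work.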

Inter-process Communication

Name	Description
`multiprocessing.Queue`	A thread- and process-safe FIFO queue.
`multiprocessing.Pipe`	A function that returns a pair `(conn1, conn2)` of connection objects representing the ends of a pipe (two-way by default).
`multiprocessing.Lock`	A non-recursive lock object for synchronizing processes.
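As an illustration of `Pipe` and `Lock` together, the sketch below sends a string to a child process and receives a transformed reply. The names `echo_upper` and `run_pipe` are hypothetical, and the lock here only guards the child's print call to show the usual `with lock:` pattern.

```python
import multiprocessing


def echo_upper(conn, lock):
    text = conn.recv()       # blocks until the parent sends something
    with lock:               # only one holder of the lock prints at a time
        print(f"child received: {text}")
    conn.send(text.upper())  # reply over the same two-way connection
    conn.close()


def run_pipe(text):
    parent_conn, child_conn = multiprocessing.Pipe()  # two-way by default
    lock = multiprocessing.Lock()
    p = multiprocessing.Process(target=echo_upper, args=(child_conn, lock))
    p.start()
    parent_conn.send(text)
    reply = parent_conn.recv()
    p.join()
    return reply


if __name__ == "__main__":
    print(run_pipe("ping"))  # PING
```

A pipe connects exactly two endpoints. When several producers or consumers are involved, a `Queue` is usually the simpler choice.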