Python Multiprocessing Explained — Escaping the GIL, Process Pools, and Sharing Data

The global interpreter lock (GIL) stops Python threads from running CPU-bound code in parallel. The multiprocessing module sidesteps it by running code in separate processes, each with its own interpreter and its own GIL. That buys real parallelism on multiple cores, at the cost of not sharing memory by default, so data must be passed between processes explicitly.

Why processes instead of threads

A thread shares its process's memory and GIL; a process is an independent OS process with its own memory space and its own interpreter. For CPU-bound work, processes win because no single GIL serialises them, so the OS can run them on different cores at the same time.

from multiprocessing import Process
import os

def worker(n):
    print(f"worker {n} in pid {os.getpid()}")

if __name__ == "__main__":          # required guard, see below
    procs = [Process(target=worker, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()                    # wait for all to finish

Each worker prints a different PID — these are genuinely separate processes running in parallel.

The main guard is mandatory

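On Windows and macOS the default start method is spawn: a fresh interpreter is launched that re-imports your module. Without if __name__ == "__main__":, that re-import would execute the process-creating code again in every child; current Python versions detect this and raise a RuntimeError rather than spawning recursively. Linux has historically defaulted to fork, where the guard is not strictly required, but guarding the entry point keeps the script portable. Forgetting it is one of the most common multiprocessing bugs.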
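
To see which start method a platform uses, and to choose one explicitly, something like the following sketch may help. The helper names square and run_with are made up for illustration; get_start_method, get_all_start_methods and get_context are part of the multiprocessing module.

```python
import multiprocessing as mp


def square(n):
    return n * n


def run_with(method):
    """Run a tiny Pool job under the given start method (hypothetical helper)."""
    ctx = mp.get_context(method)          # a context bound to one start method
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    print("default:", mp.get_start_method())
    print("available:", mp.get_all_start_methods())
    # Pass "spawn" instead to force the re-import behaviour described above.
    print(run_with(mp.get_start_method()))
```

Using a context object rather than set_start_method keeps the choice local to one call site, which is usually friendlier to libraries that also use multiprocessing.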

Pool is the workhorse

For "apply this function to many inputs," Pool manages a fixed set of worker processes and distributes the work. map returns results in input order.

from multiprocessing import Pool

def heavy(n):
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(heavy, [10**6, 2*10**6, 3*10**6])
    print(results)

Pool also offers imap (lazy, ordered), imap_unordered (yields as they finish), and apply_async for single calls with callbacks.
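
As a rough illustration of those variants, the sketch below runs the same function through imap, imap_unordered and apply_async. The names square and run_variants are invented for this example, and the unordered results are sorted only so the output is predictable.

```python
from multiprocessing import Pool


def square(n):
    return n * n


def run_variants(values):
    """Exercise imap, imap_unordered and apply_async on one small pool."""
    collected = []
    with Pool(processes=2) as pool:
        # imap yields lazily, in input order.
        ordered = list(pool.imap(square, values))
        # imap_unordered yields in completion order; sorted here for a stable result.
        unordered = sorted(pool.imap_unordered(square, values))
        # apply_async submits one call; the callback runs in the parent process.
        async_result = pool.apply_async(square, (7,), callback=collected.append)
        single = async_result.get(timeout=10)
    return ordered, unordered, single, collected


if __name__ == "__main__":
    print(run_variants([3, 1, 2]))
```

imap_unordered tends to suit workloads where task durations vary a lot, since a slow item does not hold back results that are already finished.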

Everything crosses the boundary via pickling

Because processes don't share memory, arguments and return values are pickled, sent through a pipe, and unpickled on the other side. This has consequences:

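from multiprocessing import Pool

def double(n):                      # top-level functions are picklable
    return n * 2

if __name__ == "__main__":
    data = [1, 2, 3]
    with Pool(processes=2) as pool:
        print(pool.map(double, data))          # works: [2, 4, 6]
        # Fails: lambdas and local functions can't be pickled
        # pool.map(lambda n: n * 2, data)      # PicklingError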

Pickling also means large arguments are expensive to send, and the target function must be importable at top level. Keep data passed to workers small.
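
One way to keep payloads small is to send each worker a description of its work, such as a pair of bounds, instead of the data itself. The sketch below assumes the work can be regenerated from a range; payload_bytes, make_chunks and sum_squares are hypothetical helpers, and pickle.dumps only approximates what multiprocessing sends through the pipe.

```python
import pickle
from multiprocessing import Pool


def payload_bytes(obj):
    """Approximate size of obj once pickled for transfer."""
    return len(pickle.dumps(obj))


def make_chunks(total, n_chunks):
    """Split range(total) into (start, stop) pairs of roughly equal size."""
    step = -(-total // n_chunks)          # ceiling division
    return [(start, min(start + step, total)) for start in range(0, total, step)]


def sum_squares(bounds):
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


if __name__ == "__main__":
    total = 1_000_000
    # A full list costs far more to pickle than a pair of integers.
    print(payload_bytes(list(range(total))), payload_bytes((0, total)))
    with Pool(processes=4) as pool:
        print(sum(pool.map(sum_squares, make_chunks(total, 4))))
```

Each task here pickles to a few dozen bytes regardless of how much work it represents.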

When workers need to communicate, use a process-safe Queue for message passing, or shared_memory / Value / Array for shared state.

from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        q.put(i)

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    for _ in range(5):              # drain before join(); q.empty() is unreliable
        print(q.get())
    p.join()
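
For a small piece of shared state, Value can be sketched as below. The function names add_many and run_counter are invented for the example. The lock matters because += on a shared Value is a read followed by a write, not an atomic step.

```python
from multiprocessing import Process, Value


def add_many(counter, times):
    for _ in range(times):
        with counter.get_lock():      # guard the read-modify-write
            counter.value += 1


def run_counter(workers=4, times=1000):
    """Increment one shared integer from several processes (illustrative helper)."""
    counter = Value("i", 0)           # "i" is a C signed int, initial value 0
    procs = [Process(target=add_many, args=(counter, times)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run_counter())              # 4000 with the defaults
```

Array works the same way for fixed-length sequences of a single C type.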

For numeric arrays shared without copying, multiprocessing.shared_memory.SharedMemory (3.8+) lets processes view the same buffer — ideal for large NumPy arrays.
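
A minimal standard-library sketch of the idea, using raw bytes rather than NumPy: the parent creates a block, a child attaches to it by name and writes into it, and the parent reads the result without anything being pickled except the name. The helpers fill and run_demo are hypothetical.

```python
from multiprocessing import Process
from multiprocessing.shared_memory import SharedMemory


def fill(name, size):
    shm = SharedMemory(name=name)         # attach to the existing block by name
    try:
        for i in range(size):
            shm.buf[i] = i % 256          # write directly into shared bytes
    finally:
        shm.close()                       # detach; the block itself lives on


def run_demo(size=8):
    shm = SharedMemory(create=True, size=size)
    try:
        p = Process(target=fill, args=(shm.name, size))
        p.start()
        p.join()
        return bytes(shm.buf[:size])      # copy out what the child wrote
    finally:
        shm.close()
        shm.unlink()                      # the creator frees the block once


if __name__ == "__main__":
    print(list(run_demo()))               # [0, 1, 2, 3, 4, 5, 6, 7]
```

With NumPy, the same buffer can typically be wrapped as an array by passing shm.buf as the buffer argument of numpy.ndarray; every process must close its handle, and exactly one should unlink.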

When multiprocessing is worth it

It pays off for CPU-bound tasks big enough to dwarf the process startup and pickling overhead: image processing, simulations, number crunching. For I/O-bound work, threads or asyncio are lighter. And for short tasks, the overhead of spawning processes and shipping data can make multiprocessing slower than a plain loop.
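
The overhead is easy to measure. The sketch below times a deliberately trivial function both ways; tiny, time_serial and time_pool are made-up names, and the actual numbers depend on the machine and start method, so none are shown.

```python
import time
from multiprocessing import Pool


def tiny(n):
    return n + 1                          # far too cheap to be worth a process


def time_serial(values):
    start = time.perf_counter()
    result = [tiny(v) for v in values]
    return result, time.perf_counter() - start


def time_pool(values, processes=2):
    start = time.perf_counter()           # includes pool startup and pickling
    with Pool(processes=processes) as pool:
        result = pool.map(tiny, values)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    values = list(range(1000))
    serial, t_serial = time_serial(values)
    parallel, t_pool = time_pool(values)
    assert serial == parallel             # same answers either way
    print(f"serial {t_serial:.4f}s, pool {t_pool:.4f}s")
```

For work this small the pool version will usually be the slower of the two; swapping tiny for a genuinely expensive function is what shifts the balance.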

Recap

multiprocessing achieves true parallelism by running code in separate processes, each with its own interpreter and GIL — the way around the GIL for CPU-bound work. Always wrap the entry point in if __name__ == "__main__": because of the spawn start method. Use Pool.map to fan work across workers, remember that all arguments and results are pickled (so no lambdas, keep payloads small), and share state through Queue or shared_memory. Reach for it when CPU work outweighs the startup and serialisation cost.
