Python Threading and the GIL Explained — Threads vs Multiprocessing

The GIL, explained

No Python concurrency conversation gets far before the Global Interpreter Lock comes up, and few topics are as widely misunderstood. The GIL explains why adding threads to a number-crunching program makes it no faster (sometimes slower), yet threads remain the right tool for network and disk work. This guide makes the model concrete and shows when to choose threads, processes, or async.

What the GIL is

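The GIL is a single mutex in CPython that allows only one thread to execute Python bytecode at a time. Even on an 8-core machine, your Python threads take turns holding the GIL, so pure-Python code effectively runs one thread at a time. (CPython 3.13 added an optional, experimental free-threaded build without the GIL; this guide describes the default build.)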

import threading

def crunch():
    total = 0
    for _ in range(10_000_000):
        total += 1          # pure-Python bytecode, so the thread must hold the GIL

# Two threads do NOT run this in parallel; they alternate holding the GIL.
t1 = threading.Thread(target=crunch)
t2 = threading.Thread(target=crunch)
t1.start(); t2.start()      # nothing runs until the threads are started
t1.join(); t2.join()        # wait for both to finish

The GIL exists because CPython's memory management (reference counting) isn't thread-safe on its own. A single global lock is the simplest way to keep refcounts correct, and it keeps single-threaded code fast and C extensions easy to write. It's a CPython implementation detail, not part of the language: Jython and IronPython have no GIL.
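
To make the reference-counting point concrete, here is a small sketch using sys.getrefcount, which is specific to CPython. The helper name refcount_delta is made up for illustration, and the absolute counts vary between CPython versions, so only the difference between the two readings is meaningful.

```python
import sys

def refcount_delta():
    """Return how much an object's refcount grows when one more reference is stored."""
    obj = object()
    before = sys.getrefcount(obj)   # absolute value is version-dependent
    holder = [obj]                  # the list now owns one extra reference
    after = sys.getrefcount(obj)
    return after - before

print(refcount_delta())
```

In the default build, each such increment and decrement is an ordinary update to a counter stored on the object, which is the bookkeeping the GIL keeps consistent when several threads touch the same object.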

Why threads don't speed up CPU-bound work

Because only one thread runs Python bytecode at once, splitting a CPU-bound task across threads gives no speedup — the threads serialise on the GIL, and you even pay extra for lock contention and context switching.

# A CPU-bound job across 4 threads runs about the same as 1 thread
# (often slightly slower) because the GIL serialises them.

For CPU-bound work you need true parallelism, which means separate processes — each process has its own interpreter and its own GIL.
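
One way to check the claim on your own machine is to time the same workload run sequentially and then split across threads. This is a rough sketch: the function names and the workload size are arbitrary choices, and the exact timings depend on hardware and Python version.

```python
import threading
import time

def crunch(n):
    """Pure-Python busy loop; needs the GIL for all of its work."""
    total = 0
    for _ in range(n):
        total += 1
    return total

def time_sequential(n, repeats):
    """Run crunch(n) `repeats` times back to back and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        crunch(n)
    return time.perf_counter() - start

def time_threaded(n, repeats):
    """Run crunch(n) in `repeats` threads at once and return elapsed seconds."""
    threads = [threading.Thread(target=crunch, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n = 2_000_000   # arbitrary workload size
    print(f"sequential: {time_sequential(n, 4):.2f}s")
    print(f"4 threads:  {time_threaded(n, 4):.2f}s")
```

On a default CPython build the two numbers should come out close to each other, since the threads spend their time waiting for the GIL rather than running side by side.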

Where threads do help: I/O-bound work

The crucial detail: the GIL is released during blocking I/O — network requests, disk reads, time.sleep, database calls. While one thread waits on a socket, another can run. So for I/O-bound programs, threads deliver real concurrency.

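import threading, urllib.request

urls = ["https://example.com", "https://example.org"]   # placeholder URLs
results = {}

def fetch(url):
    with urllib.request.urlopen(url) as r:   # GIL released while waiting on the network
        return r.read()

def worker(url):
    results[url] = fetch(url)    # Thread discards return values, so store them

threads = [threading.Thread(target=worker, args=(u,)) for u in urls]
for t in threads: t.start()
for t in threads: t.join()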

Rule of thumb: threads for I/O-bound, processes for CPU-bound. The GIL only bites when threads are fighting over Python bytecode, not when they're parked waiting on the outside world.
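
The effect is easy to see without a network by letting time.sleep stand in for a blocking call. A minimal sketch, with fake_io and run_threaded as illustrative names and the delay chosen arbitrarily:

```python
import threading
import time

def fake_io(delay):
    """Stand-in for a blocking call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)

def run_threaded(delay, count):
    """Start `count` threads that each wait `delay` seconds; return elapsed seconds."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(count)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

elapsed = run_threaded(0.2, 4)
print(f"4 waits of 0.2s took {elapsed:.2f}s in total")
```

Because the waits overlap, the total should land near the length of one wait rather than the sum of all four.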

Race conditions and locks

The GIL does not make your code automatically thread-safe. A statement like counter += 1 is several bytecode steps (read, add, store), and a thread switch between those steps can lose an update: a classic race condition.

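import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100_000):
        with lock:           # serialise the read-modify-write
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)               # 400000: no updates lost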

Protect shared mutable state with a Lock (or use thread-safe primitives like queue.Queue). Without the lock, two threads can read the same value, both add one, and one increment vanishes.
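
As an alternative to an explicit Lock, the queue.Queue approach mentioned above might look like the following sketch. The worker and square_all names are illustrative, and the None sentinel is just one common way to tell workers to stop.

```python
import queue
import threading

def worker(tasks, results):
    """Square numbers taken from `tasks` until a None sentinel arrives."""
    while True:
        n = tasks.get()
        if n is None:
            break
        results.put(n * n)

def square_all(numbers, n_workers=3):
    """Square every number using worker threads that share two queues."""
    numbers = list(numbers)
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)          # one sentinel per worker
    for t in threads:
        t.join()
    # Results arrive in completion order, so sort them for a stable answer.
    return sorted(results.get() for _ in numbers)

print(square_all(range(5)))      # [0, 1, 4, 9, 16]
```

The queues handle their own locking, so the worker code never touches shared state directly.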

Multiprocessing: real parallelism

multiprocessing sidesteps the GIL by running separate processes, each with its own interpreter and memory. That gives genuine multi-core parallelism for CPU-bound work — at the cost of process startup and pickling data to pass between processes.

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":           # required under the spawn start method (Windows, macOS)
    with Pool(4) as pool:
        print(pool.map(square, range(10)))   # work is split across 4 worker processes

Inter-process communication isn't free (arguments and results are serialised), so it pays off when the computation per task clearly outweighs the messaging overhead.
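
To get a feel for that overhead, you can look at the pickled form of a value, which is roughly what travels between processes. A small sketch; payload_bytes is a hypothetical helper, and the exact byte counts depend on the pickle protocol in use.

```python
import pickle

def payload_bytes(obj):
    """Size of the pickled form, roughly what crosses the process boundary."""
    return len(pickle.dumps(obj))

small = payload_bytes(7)
large = payload_bytes(list(range(100_000)))
print(small, large)

# Round trip: the receiving side rebuilds a copy, not the same object.
data = [1, 2, 3]
copy = pickle.loads(pickle.dumps(data))
print(copy == data, copy is data)   # True False
```

Since the worker gets a copy, mutating an argument inside a worker process does not change the original in the parent.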

Picking a tool with concurrent.futures

concurrent.futures gives both models the same high-level API, so you can match the tool to the workload by swapping one class.

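from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# fetch and square come from the earlier examples;
# urls and big_numbers stand in for your own input lists.

if __name__ == "__main__":           # needed for the process pool under spawn
    # I/O-bound: threads
    with ThreadPoolExecutor(max_workers=8) as ex:
        results = list(ex.map(fetch, urls))

    # CPU-bound: processes
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(square, big_numbers))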
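
Because both executors share one interface, the choice can even be passed in as a parameter. The sketch below assumes a hypothetical helper run_with and a toy function double; only the thread pool is exercised here.

```python
from concurrent.futures import ThreadPoolExecutor

def run_with(executor_cls, fn, items, workers=4):
    """Map fn over items using whichever executor class is passed in."""
    with executor_cls(max_workers=workers) as ex:
        return list(ex.map(fn, items))   # map preserves input order

def double(n):
    return n * 2

print(run_with(ThreadPoolExecutor, double, range(5)))   # [0, 2, 4, 6, 8]
```

Passing ProcessPoolExecutor instead should work the same way, provided fn is a picklable module-level function and the call sits under an if __name__ == "__main__": guard.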

For high-concurrency I/O specifically, asyncio is a third option — single-threaded cooperative concurrency that scales to thousands of connections without thread overhead.
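
For comparison, a minimal asyncio sketch of the same idea, with asyncio.sleep standing in for a network wait. The fake_request name, the count of 100 and the 0.1s delay are all arbitrary choices for illustration.

```python
import asyncio
import time

async def fake_request(i, delay):
    """Stand-in for a network call; awaiting yields control to other tasks."""
    await asyncio.sleep(delay)
    return i

async def main(count, delay):
    # gather runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(fake_request(i, delay) for i in range(count)))

start = time.perf_counter()
results = asyncio.run(main(100, 0.1))
elapsed = time.perf_counter() - start
print(len(results), f"{elapsed:.2f}s")
```

All 100 waits overlap on a single thread, so the total should stay close to one delay rather than a hundred of them.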

Recap

The GIL lets only one thread run Python bytecode at a time, which is why threads give no speedup for CPU-bound code — but the lock is released during blocking I/O, so threads shine for I/O-bound work. The GIL doesn't make code thread-safe: guard shared state with a Lock. For true multi-core parallelism use multiprocessing (separate interpreters, at the cost of pickling), and lean on concurrent.futures to switch between thread and process pools with one line. Threads for I/O, processes for CPU, async for massive I/O concurrency — and the GIL stops being mysterious.
