The GIL, explained
No Python concurrency conversation gets far before the Global Interpreter Lock comes up, and few topics are as widely misunderstood. The GIL explains why adding threads to a number-crunching program makes it no faster (sometimes slower), yet threads remain the right tool for network and disk work. This guide makes the model concrete and shows when to choose threads, processes, or async.
What the GIL is
The GIL is a single lock in CPython that allows only one thread to execute Python bytecode at a time. Even on an 8-core machine, your Python threads take turns holding the GIL — so pure-Python code runs effectively one thread at a time.
import threading
def crunch():
    total = 0
    for _ in range(10_000_000):
        total += 1  # pure-Python bytecode: needs the GIL
# Two threads do NOT run this in parallel; they alternate holding the GIL.
t1 = threading.Thread(target=crunch)
t2 = threading.Thread(target=crunch)
t1.start(); t2.start()  # begin running both threads
t1.join(); t2.join()    # wait for both to finish
It exists because CPython's memory management (reference counting) isn't thread-safe on its own. A single global lock is the simplest way to keep refcounts correct, and it keeps single-threaded code fast and C extensions easy to write. It's a CPython implementation detail, not part of the language: Jython and IronPython have no GIL, and since 3.13 CPython itself offers an optional, experimental free-threaded build.
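To make the reference-counting point concrete, here is a small sketch that watches a refcount change as references are created. The helper name refcount_delta is made up for illustration; sys.getrefcount is the real standard-library call, and the exact numbers it reports are a CPython detail.

```python
import sys

def refcount_delta():
    """Return how much an object's refcount grows when two new references are made."""
    obj = []
    before = sys.getrefcount(obj)
    holders = [obj, obj]   # two additional references to the same list
    after = sys.getrefcount(obj)
    return after - before

if __name__ == "__main__":
    # Every such update is a read-modify-write on a counter inside the object,
    # which is the kind of operation the GIL keeps consistent across threads.
    print(refcount_delta())
```

Each assignment, argument pass, or container insertion touches counters like this one, which is why protecting them individually would be costly compared with one global lock.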
Why threads don't speed up CPU-bound work
Because only one thread runs Python bytecode at once, splitting a CPU-bound task across threads gives no speedup — the threads serialise on the GIL, and you even pay extra for lock contention and context switching.
# A CPU-bound job across 4 threads runs about the same as 1 thread
# (often slightly slower) because the GIL serialises them.
For CPU-bound work you need true parallelism, which means separate processes — each process has its own interpreter and its own GIL.
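A rough way to see this for yourself is to time the same pure-Python loop run sequentially and then across threads. This is a minimal sketch, assuming a standard GIL-enabled CPython build; the function names and the loop size are arbitrary choices, and the timings will vary by machine.

```python
import threading
import time

def count_up(n):
    """Pure-Python CPU-bound loop; it needs the GIL for every iteration."""
    total = 0
    for _ in range(n):
        total += 1
    return total

def run_sequential(n, jobs):
    """Run the loop `jobs` times back to back and return the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_up(n)
    return time.perf_counter() - start

def run_threaded(n, jobs):
    """Run the loop in `jobs` threads and return the elapsed seconds."""
    threads = [threading.Thread(target=count_up, args=(n,)) for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    n, jobs = 1_000_000, 4
    print(f"sequential: {run_sequential(n, jobs):.2f}s")
    print(f"threaded:   {run_threaded(n, jobs):.2f}s")
```

On a GIL-enabled interpreter the two numbers should come out close to each other, with the threaded run often slightly behind.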
Where threads do help: I/O-bound work
The crucial detail: the GIL is released during blocking I/O — network requests, disk
reads, time.sleep, database calls. While one thread waits on a socket, another can run.
So for I/O-bound programs, threads deliver real concurrency.
import threading, urllib.request
urls = ["https://example.com/"] * 3  # placeholder URLs; substitute your own
def fetch(url):
    with urllib.request.urlopen(url) as r:  # GIL released while waiting on the network
        return r.read()  # a bare Thread discards this return value
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
Rule of thumb: threads for I/O-bound, processes for CPU-bound. The GIL only bites when threads are fighting over Python bytecode, not when they're parked waiting on the outside world.
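The effect is easy to observe without a network by letting time.sleep stand in for a blocking call. The sketch below is illustrative only: fake_io and run_concurrently are made-up names, and the sleep simply plays the role of a slow socket or disk read.

```python
import threading
import time

def fake_io(delay, results, index):
    """Stand-in for a blocking call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    results[index] = delay

def run_concurrently(delays):
    """Start one thread per delay and return (results, elapsed_seconds)."""
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(d, results, i))
        for i, d in enumerate(delays)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_concurrently([0.3, 0.3, 0.3, 0.3])
    # The waits overlap, so the total should be near 0.3s rather than 1.2s.
    print(f"{len(results)} waits finished in {elapsed:.2f}s")
```

Because the threads spend their time waiting rather than executing bytecode, the waits overlap and the total is close to the longest single wait.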
Race conditions and locks
The GIL does not make your code automatically thread-safe. A statement like
counter += 1 is several bytecode steps (read, add, store), and a thread switch in the
middle leads to lost updates — a classic race condition.
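import threading
counter = 0
lock = threading.Lock()
def increment():
    global counter
    for _ in range(100_000):
        with lock:  # serialise the read-modify-write
            counter += 1
threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000: no increments are lost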
Protect shared mutable state with a Lock (or use thread-safe primitives like
queue.Queue). Without the lock, two threads can read the same value, both add one, and
one increment vanishes.
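As an alternative to explicit locking, work can be handed between threads through queue.Queue, which does its own locking internally. The following is a minimal sketch of that pattern; worker, square_all, and the None sentinel are illustrative choices, not a prescribed design.

```python
import queue
import threading

def worker(tasks, results):
    """Pull numbers from `tasks` until a None sentinel arrives; push squares to `results`."""
    while True:
        item = tasks.get()
        if item is None:   # sentinel: no more work for this worker
            break
        results.put(item * item)

def square_all(numbers, n_workers=4):
    """Square every number using worker threads that share only thread-safe queues."""
    tasks = queue.Queue()
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for n in numbers:
        tasks.put(n)
    for _ in threads:
        tasks.put(None)    # one sentinel per worker
    for t in threads:
        t.join()
    out = []
    while not results.empty():   # safe here: all producers have finished
        out.append(results.get())
    return sorted(out)           # completion order is not guaranteed, so sort

if __name__ == "__main__":
    print(square_all(range(5)))
```

No thread touches shared mutable state directly here; everything passes through the queues, so no Lock appears in user code.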
Multiprocessing: real parallelism
multiprocessing sidesteps the GIL by running separate processes, each with its own
interpreter and memory. That gives genuine multi-core parallelism for CPU-bound work — at
the cost of process startup and pickling data to pass between processes.
from multiprocessing import Pool
def square(n):
    return n * n
if __name__ == "__main__":  # guard required when workers are spawned (e.g. Windows, macOS)
    with Pool(4) as pool:
        print(pool.map(square, range(10)))  # work is spread over 4 worker processes
Inter-process communication isn't free (arguments and results are serialised), so it pays off when the computation per task clearly outweighs the messaging overhead.
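One way to get a feel for that overhead is to pickle a payload yourself, since multiprocessing serialises arguments and results with pickle. This sketch only measures serialised size and checks a round trip; pickled_size and round_trip are hypothetical helper names, and it does not start any processes.

```python
import pickle

def pickled_size(obj):
    """Number of bytes needed to serialise `obj` with pickle."""
    return len(pickle.dumps(obj))

def round_trip(obj):
    """Serialise and deserialise `obj`, as happens when it crosses a process boundary."""
    return pickle.loads(pickle.dumps(obj))

if __name__ == "__main__":
    small = 12345
    large = list(range(100_000))
    print("small payload:", pickled_size(small), "bytes")
    print("large payload:", pickled_size(large), "bytes")
    print("round trip intact:", round_trip(large) == large)
```

If each task ships a large payload but does little computation on it, the time spent on this copying can outweigh what the extra cores give back.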
Picking a tool with concurrent.futures
concurrent.futures gives both models the same high-level API, so you can match the tool
to the workload by swapping one class.
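from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
# Reuses fetch, urls and square from the earlier examples; big_numbers is any iterable of ints.
if __name__ == "__main__":  # guard needed for process pools when workers are spawned
    # I/O-bound: threads
    with ThreadPoolExecutor(max_workers=8) as ex:
        results = list(ex.map(fetch, urls))
    # CPU-bound: processes
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(square, big_numbers))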
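The "swap one class" idea can be made explicit by passing the executor class as a parameter. This is a sketch under assumptions: run_all and double are made-up names, and only the thread pool is exercised in the demo.

```python
from concurrent.futures import ThreadPoolExecutor

def run_all(executor_cls, fn, items, max_workers=4):
    """Map `fn` over `items` using whichever executor class the caller passes in."""
    with executor_cls(max_workers=max_workers) as ex:
        return list(ex.map(fn, items))   # map yields results in input order

def double(n):
    """Top-level function, so it would also be picklable for a process pool."""
    return n * 2

if __name__ == "__main__":
    print(run_all(ThreadPoolExecutor, double, range(5)))
```

Passing ProcessPoolExecutor instead should work the same way, provided fn is a picklable top-level function and the call sits under an if __name__ == "__main__" guard.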
For high-concurrency I/O specifically, asyncio is a third option — single-threaded cooperative concurrency that scales to thousands of connections without thread overhead.
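For comparison, here is a minimal asyncio sketch in which asyncio.sleep stands in for a network wait. The names fake_request and main are illustrative; a real program would await an async HTTP or database client at that point.

```python
import asyncio

async def fake_request(i, delay=0.1):
    """Pretend to wait on the network; await hands control back to the event loop."""
    await asyncio.sleep(delay)
    return i

async def main(n=100):
    """Run n simulated requests concurrently on one thread and collect the results."""
    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(fake_request(i) for i in range(n)))

if __name__ == "__main__":
    results = asyncio.run(main())
    print(len(results), "requests completed")
```

All the waits overlap on a single thread, so the run should take roughly one delay rather than one delay per request.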
Recap
The GIL lets only one thread run Python bytecode at a time, which is why threads give
no speedup for CPU-bound code — but the lock is released during blocking I/O, so
threads shine for I/O-bound work. The GIL doesn't make code thread-safe: guard shared
state with a Lock. For true multi-core parallelism use multiprocessing (separate
interpreters, at the cost of pickling), and lean on concurrent.futures to switch between
thread and process pools with one line. Threads for I/O, processes for CPU, async for
massive I/O concurrency — and the GIL stops being mysterious.