One problem, three different solutions
Python has three concurrency models — threading, multiprocessing, and asyncio — and which one you pick determines whether your program gets faster or just gets more complicated. The choice is not arbitrary: each model solves a different bottleneck. Get it wrong and you'll add overhead without gaining any throughput. The key to choosing correctly is understanding two concepts: the GIL and the I/O-bound vs CPU-bound distinction.
The GIL — why threading isn't always parallel
CPython's Global Interpreter Lock (GIL) is a mutex that allows only one thread to execute Python bytecode at a time. On a four-core machine, a multithreaded Python program still runs its bytecode on one core at a time. Threads take turns; they don't run simultaneously for CPU work.
The GIL is released around blocking I/O (network calls, file reads, sleep) — so
multiple threads can overlap those waits. But for pure Python number-crunching, threads
can't use more than one core.
import threading

def burn_cpu():
    total = 0
    for _ in range(10_000_000):
        total += 1  # pure bytecode: only the thread holding the GIL can run this

t1 = threading.Thread(target=burn_cpu)
t2 = threading.Thread(target=burn_cpu)
t1.start()
t2.start()
t1.join()
t2.join()
# Takes about as long as calling burn_cpu() twice in a row: the GIL serialises bytecode
This is why the first question to answer before picking a model is: what is the bottleneck?
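One rough way to answer that question is to compare wall-clock time with CPU time for the slow section. The sketch below is illustrative only: `cpu_fraction` is a hypothetical helper, and the short sleep and the sum loop merely stand in for real I/O and real computation. A ratio near 1 suggests CPU-bound work, while a ratio near 0 suggests the code spends most of its time waiting.

```python
import time

def cpu_fraction(func, *args):
    """Return CPU time / wall time for one call of func (hypothetical helper)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()    # CPU time consumed by this process
    func(*args)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0

def simulated_io():
    time.sleep(0.05)                   # stands in for a network or disk wait

def simulated_cpu():
    sum(i * i for i in range(200_000)) # stands in for number crunching

io_ratio = cpu_fraction(simulated_io)
cpu_ratio = cpu_fraction(simulated_cpu)
print(f"I/O-like work: {io_ratio:.2f}, CPU-like work: {cpu_ratio:.2f}")
```

A profiler gives a far more detailed picture; this ratio is only a quick first check.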
Quick-reference comparison
| | threading | multiprocessing | asyncio |
|---|---|---|---|
| Parallelism | No (GIL) | Yes — one process per CPU core | No — cooperative, single-threaded |
| Best for | I/O-bound (many slow calls) | CPU-bound (number crunching, image processing) | I/O-bound (many concurrent connections) |
| Memory model | Shared | Separate per process | Shared (single thread) |
| Communication | Easy (shared objects, locks) | Explicit (Queue, Pipe, Value) — must pickle | await, queues, asyncio.gather |
| Overhead | Light (OS threads) | Heavy (process spawn + pickle) | Minimal (coroutines are cheap) |
| Complexity | Medium (race conditions) | Medium (serialisation) | Medium (async/await syntax) |
| Max concurrency | Hundreds to a few thousand threads (limited by per-thread memory and scheduling cost) | ~CPU core count | Tens of thousands of coroutines |
threading — overlapping I/O waits
Use threading when your program makes many blocking I/O calls and you want them to overlap. While one thread is blocked waiting on a network response, another thread can run. CPython releases the GIL during blocking I/O, so the waits overlap even though bytecode execution does not.
import threading
import urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as r:
        return len(r.read())  # GIL released while waiting for network

urls = ["https://example.com"] * 10
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All 10 requests overlap, so total time is close to that of the slowest single request
# Note: Thread discards fetch's return value; use ThreadPoolExecutor to collect results
The catch: shared mutable state across threads creates race conditions. Wrap every
read-modify-write on an object that several threads mutate in a threading.Lock.
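As a minimal sketch of that pattern, the example below guards a shared counter with a lock. The `Counter` class and `add_many` function are hypothetical names chosen for illustration; the point is only that the increment, which is a read-modify-write, happens while the lock is held.

```python
import threading

class Counter:
    """A shared counter whose updates are protected by a lock."""
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:       # only one thread at a time runs the update
            self.value += 1    # read-modify-write, unsafe without the lock

def add_many(counter, n):
    for _ in range(n):
        counter.increment()

counter = Counter()
threads = [threading.Thread(target=add_many, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```

The `with` statement releases the lock even if the body raises, which is why it is usually preferred over calling `acquire()` and `release()` by hand.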
Rule of thumb: threading works well for a modest number (dozens to low hundreds) of concurrent blocking calls where you want simple, familiar code.
multiprocessing — true CPU parallelism
When work is CPU-bound — image processing, data transformation, machine learning
preprocessing, cryptography — you need multiple CPU cores running in true parallel.
multiprocessing spawns separate Python processes, each with its own GIL and memory
space, so they can use all available cores simultaneously.
from multiprocessing import Pool

def cpu_work(n):
    return sum(i * i for i in range(n))  # pure CPU: needs real parallelism

if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    with Pool(processes=4) as pool:  # 4 worker processes
        results = pool.map(cpu_work, [1_000_000] * 8)
    # Runs on up to 4 cores in parallel; expect somewhat under 4x on a quad-core machine
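The cost is process startup overhead and the requirement that all data crossing process boundaries must be picklable. Lambda functions, open file handles, and database connections can't be pickled, so worker functions should be defined at module top level, and handles or connections should be opened inside the worker rather than passed in.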
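A quick way to check whether an object can cross a process boundary is to try pickling it yourself. The helper below, `is_picklable`, is a hypothetical name used only for this sketch; it relies on nothing beyond the standard `pickle` module.

```python
import os
import pickle

def is_picklable(obj):
    """Return True if obj survives pickle.dumps (hypothetical helper)."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

print(is_picklable([1, 2, 3]))        # True: plain data
print(is_picklable(len))              # True: built-in function, pickled by name
print(is_picklable(lambda x: x * 2))  # False: a lambda has no importable name

with open(os.devnull) as handle:
    print(is_picklable(handle))       # False: open file handle
```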
concurrent.futures.ProcessPoolExecutor provides the same power with a cleaner API:
from concurrent.futures import ProcessPoolExecutor

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(cpu_work, [1_000_000] * 8))  # cpu_work as defined above
Rule of thumb: reach for multiprocessing (or ProcessPoolExecutor) when a single
CPU core is the bottleneck — profiling should show near-100% CPU usage on one core.
asyncio — thousands of concurrent I/O operations
asyncio is Python's framework for cooperative concurrency on a single thread. An
event loop runs many coroutines that voluntarily yield control with await when
they're waiting for I/O. Because there's no thread switching overhead and coroutines are
cheap objects, asyncio can manage tens of thousands of concurrent connections — ideal
for web servers, chat systems, and API gateway code.
import asyncio
import aiohttp  # third-party package, not part of the standard library

async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()  # yields to event loop while waiting

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, "https://example.com") for _ in range(1000)]
        # All 1000 requests are in flight together (aiohttp caps open connections at 100 by default)
        pages = await asyncio.gather(*tasks)
        print(len(pages))

asyncio.run(main())
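To see the same overlap without a network or a third-party library, the sketch below swaps the HTTP call for `asyncio.sleep`. The names `fake_request` and `run_all` are hypothetical, and the 0.1-second sleep is an assumed stand-in for network latency.

```python
import asyncio
import time

async def fake_request(i):
    await asyncio.sleep(0.1)   # stands in for a network wait; yields to the loop
    return i

async def run_all(n):
    # gather() schedules every coroutine and returns results in input order
    return await asyncio.gather(*(fake_request(i) for i in range(n)))

start = time.perf_counter()
results = asyncio.run(run_all(1000))
elapsed = time.perf_counter() - start

# Run one after another, 1000 waits of 0.1 s would take about 100 s
print(len(results), f"{elapsed:.2f}s")
```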
The critical rule: never block the event loop. Any blocking call (a synchronous
requests.get, time.sleep, heavy CPU work) freezes all coroutines until it
completes. Use await asyncio.sleep, async libraries (aiohttp, asyncpg), and
loop.run_in_executor for blocking operations you can't avoid.
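The sketch below shows one way to keep the loop responsive around a blocking call. It uses `asyncio.to_thread` (Python 3.9+), a higher-level wrapper for running a function in a worker thread, rather than calling `loop.run_in_executor` directly. The names `blocking_call` and `heartbeat` are hypothetical, and `time.sleep` stands in for any synchronous library call.

```python
import asyncio
import time

def blocking_call(x):
    time.sleep(0.2)            # a synchronous call that would freeze the loop
    return x * 2

async def heartbeat(stop):
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1             # only advances while the loop is free to run
    return ticks

async def main():
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    # Each blocking call runs in a worker thread, so the loop keeps ticking
    doubled = await asyncio.gather(
        *(asyncio.to_thread(blocking_call, x) for x in (1, 2, 3))
    )
    stop.set()
    ticks = await beat
    return doubled, ticks

doubled, ticks = asyncio.run(main())
print(doubled, ticks)
```

If `blocking_call` were invoked directly inside a coroutine, the heartbeat would not advance until it returned.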
Rule of thumb: choose asyncio when you need massive I/O concurrency (hundreds
to thousands of simultaneous connections) and are willing to use async-compatible
libraries throughout.
The decision flowchart
Is your bottleneck I/O (network, disk, database)?
├── Yes → How many concurrent operations?
│   ├── Dozens → threading (simple, familiar)
│   └── Hundreds / thousands → asyncio (scalable, low overhead)
└── No → Is it CPU computation?
    └── Yes → multiprocessing / ProcessPoolExecutor (real parallelism)
Or as a single question: "What am I waiting on?"
- Waiting on the network / disk → threading or asyncio
- Waiting on the CPU → multiprocessing
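The same decision can be written as a small function. This is only an illustration: `pick_model` is a hypothetical helper, and the cut-off of 100 concurrent operations is an assumed boundary between "dozens" and "hundreds", not a measured threshold.

```python
def pick_model(bottleneck, concurrent_ops=1, asyncio_threshold=100):
    """Map the flowchart onto a function (hypothetical helper).

    bottleneck is "io" or "cpu"; asyncio_threshold is an assumed cut-off.
    """
    if bottleneck == "cpu":
        return "multiprocessing"
    if bottleneck == "io":
        return "asyncio" if concurrent_ops >= asyncio_threshold else "threading"
    raise ValueError(f"unknown bottleneck: {bottleneck!r}")

print(pick_model("io", concurrent_ops=20))    # threading
print(pick_model("io", concurrent_ops=5000))  # asyncio
print(pick_model("cpu"))                      # multiprocessing
```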
When concurrent.futures simplifies the choice
concurrent.futures provides a unified API over both threading and multiprocessing:
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# Both executors share one interface, so switching is a one-word change.
# fetch_url, urls, cpu_work, and data are placeholders defined elsewhere.
with ThreadPoolExecutor(max_workers=10) as ex:  # I/O-bound work
    results = list(ex.map(fetch_url, urls))

with ProcessPoolExecutor(max_workers=4) as ex:  # CPU-bound work
    results = list(ex.map(cpu_work, data))
Use it when you want straightforward map/submit semantics without managing threads or
processes manually.
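For the submit side of that API, a minimal sketch might look like the following. The function `slow_square` is a hypothetical stand-in for a blocking call, and the short sleep is an assumption to make the work I/O-like.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def slow_square(n):
    time.sleep(0.05)           # stands in for a blocking I/O call
    return n * n

results = {}
with ThreadPoolExecutor(max_workers=4) as ex:
    # submit() returns a Future immediately; map each Future back to its input
    futures = {ex.submit(slow_square, n): n for n in range(8)}
    for fut in as_completed(futures):   # yields futures as they finish
        results[futures[fut]] = fut.result()

print(sorted(results.items()))
```

Unlike `map`, which returns results in input order, `as_completed` hands back each result as soon as it is ready, which helps when some calls are much slower than others.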
Recap
Threading overlaps blocking I/O by letting threads wait simultaneously. The GIL rules out CPU-level parallelism, but I/O-heavy code still gets a real speedup. Multiprocessing spawns separate processes, each with its own GIL, giving true multi-core parallelism for CPU-bound work at the cost of higher overhead and pickling constraints. asyncio runs all coroutines on one thread in a single event loop, enabling tens of thousands of concurrent I/O operations with minimal overhead, but nothing on the async path may block the loop. The deciding factor is always: I/O bottleneck (threading or asyncio) vs CPU bottleneck (multiprocessing).