Python threading vs multiprocessing vs asyncio

One problem, three different solutions

Python has three concurrency models — threading, multiprocessing, and asyncio — and which one you pick determines whether your program gets faster or just gets more complicated. The choice is not arbitrary: each model solves a different bottleneck. Get it wrong and you'll add overhead without gaining any throughput. The key to choosing correctly is understanding two concepts: the GIL and the I/O-bound vs CPU-bound distinction.

The GIL — why threading isn't always parallel

CPython's Global Interpreter Lock (GIL) is a mutex that allows only one thread to execute Python bytecode at a time. On a four-core machine, a multithreaded Python program still runs its bytecode on one core at a time. Threads take turns; they don't run simultaneously for CPU work. (This describes the default CPython build; the optional free-threaded builds available since Python 3.13 remove the GIL.)

The GIL is released around blocking I/O (network calls, file reads, sleep) — so multiple threads can overlap those waits. But for pure Python number-crunching, threads can't use more than one core.

import threading

def burn_cpu():
    total = 0
    for _ in range(10_000_000):
        total += 1   # holds the GIL the whole time

t1 = threading.Thread(target=burn_cpu)
t2 = threading.Thread(target=burn_cpu)
t1.start(); t2.start()
t1.join();  t2.join()
# Takes ~same time as running sequentially — GIL serialises bytecode
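
To see the effect on your own machine, you can time the two approaches side by side. This is a minimal sketch: the helper names `time_sequential` and `time_threaded` and the loop size are assumptions chosen for illustration, and the exact numbers will vary by machine and Python build.

```python
import threading
import time

def burn_cpu(n=2_000_000):
    # Pure-Python loop: the running thread holds the GIL while it executes.
    total = 0
    for _ in range(n):
        total += 1
    return total

def time_sequential(runs=2):
    # Run the workload `runs` times, one after another.
    start = time.perf_counter()
    for _ in range(runs):
        burn_cpu()
    return time.perf_counter() - start

def time_threaded(runs=2):
    # Run the same workload in `runs` threads started together.
    threads = [threading.Thread(target=burn_cpu) for _ in range(runs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {time_sequential():.2f}s")
    print(f"threaded:   {time_threaded():.2f}s")
```

On a default CPython build the two timings should come out roughly equal, which is the GIL at work.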

This is why the first question to answer before picking a model is: what is the bottleneck?

Quick-reference comparison

| Criterion | `threading` | `multiprocessing` | `asyncio` |
| --- | --- | --- | --- |
| Parallelism | No for Python bytecode (GIL) | Yes, typically one process per CPU core | No: cooperative, single-threaded |
| Best for | I/O-bound (many slow calls) | CPU-bound (number crunching, image processing) | I/O-bound (many concurrent connections) |
| Memory model | Shared | Separate per process | Shared (single thread) |
| Communication | Easy (shared objects, locks) | Explicit (Queue, Pipe, Value); data must be picklable | `await`, queues, `asyncio.gather` |
| Overhead | Light (OS threads) | Heavy (process spawn + pickle) | Minimal (coroutines are cheap) |
| Complexity | Medium (race conditions) | Medium (serialisation) | Medium (async/await syntax) |
| Max concurrency | ~100 to 1,000 threads (practical limit) | ~CPU core count | Tens of thousands of coroutines |

`threading` — overlapping I/O waits

Use threading when your program makes many blocking I/O calls and you want them to overlap. While one thread is blocked waiting on a network response, another thread can run. CPython releases the GIL while a thread is blocked on I/O, so the waits overlap even though bytecode execution does not.

import threading, urllib.request

def fetch(url):
    with urllib.request.urlopen(url) as r:
        body = r.read()          # GIL released while waiting for network
    print(url, len(body))        # Thread discards return values, so report here

urls = ["https://example.com"] * 10
threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads: t.start()
for t in threads: t.join()
# All 10 requests overlap: up to ~10x faster than sequential

The catch: shared mutable state across threads creates race conditions. You need threading.Lock around any object multiple threads write to.
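
A minimal sketch of that pattern follows. The `Counter` class and `run_workers` helper are hypothetical names used only for illustration; the point is that every write to the shared value happens while holding the lock.

```python
import threading

class Counter:
    """A shared counter whose updates are guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The read-modify-write below is not atomic on its own,
        # so only one thread at a time may perform it.
        with self._lock:
            self.value += 1

def run_workers(num_threads=4, increments=10_000):
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

if __name__ == "__main__":
    print(run_workers())   # 4 threads x 10_000 increments each
```

With the lock in place the final count always equals `num_threads * increments`; without it, updates from different threads can overwrite each other.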

Rule of thumb: threading works well for a modest number (dozens to low hundreds) of concurrent blocking calls where you want simple, familiar code.

`multiprocessing` — true CPU parallelism

When work is CPU-bound (image processing, data transformation, machine learning preprocessing, cryptography), you need multiple CPU cores working in parallel. multiprocessing spawns separate Python processes, each with its own GIL and memory space, so they can use all available cores simultaneously.

from multiprocessing import Pool

def cpu_work(n):
    return sum(i * i for i in range(n))   # pure CPU, needs real parallelism

if __name__ == "__main__":                # required where workers are spawned (Windows, macOS)
    with Pool(processes=4) as pool:       # 4 worker processes
        results = pool.map(cpu_work, [1_000_000] * 8)
    # Runs on 4 cores in parallel: up to ~4x faster on a quad-core machine

The cost is process startup overhead and the requirement that all data crossing process boundaries must be picklable. Lambda functions, file handles, and database connections can't be pickled — they must stay inside the worker.
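
You can check whether an object will cross a process boundary before handing it to a pool. The sketch below uses a hypothetical helper, `can_pickle`, that calls the standard `pickle.dumps`; multiprocessing relies on the same pickle machinery, so this is a reasonable approximation rather than an exact replica of what a pool does.

```python
import os
import pickle

def can_pickle(obj):
    """Return True if obj can be serialised with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # Different unpicklable objects fail with different exception types.
        return False

if __name__ == "__main__":
    print(can_pickle([1, 2, 3]))            # True: plain data pickles fine
    print(can_pickle(lambda x: x * x))      # False: lambdas have no importable name
    with open(os.devnull) as fh:
        print(can_pickle(fh))               # False: open file handles can't be pickled
```

If a worker needs a file or a database connection, open it inside the worker function instead of passing it in as an argument.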

concurrent.futures.ProcessPoolExecutor provides the same power with a cleaner API:

from concurrent.futures import ProcessPoolExecutor

# cpu_work is the function defined in the Pool example above
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as ex:
        results = list(ex.map(cpu_work, [1_000_000] * 8))

Rule of thumb: reach for multiprocessing (or ProcessPoolExecutor) when a single CPU core is the bottleneck — profiling should show near-100% CPU usage on one core.

`asyncio` — thousands of concurrent I/O operations

asyncio is Python's framework for cooperative concurrency on a single thread. An event loop runs many coroutines that voluntarily yield control with await when they're waiting for I/O. Because there's no thread switching overhead and coroutines are cheap objects, asyncio can manage tens of thousands of concurrent connections — ideal for web servers, chat systems, and API gateway code.

import asyncio, aiohttp

async def fetch(session, url):
    async with session.get(url) as r:
        return await r.text()     # yields to event loop while waiting

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [fetch(session, "https://example.com") for _ in range(1000)]
        pages = await asyncio.gather(*tasks)   # all 1000 requests overlap
        print(len(pages))

asyncio.run(main())
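
If you don't have aiohttp installed, the same overlap can be observed with the standard library alone. In this sketch `asyncio.sleep` stands in for a network wait; `fake_fetch` and `fetch_all` are hypothetical names, and the delay and task count are arbitrary.

```python
import asyncio
import time

async def fake_fetch(i, delay=0.1):
    await asyncio.sleep(delay)    # stands in for waiting on the network
    return i

async def fetch_all(n=100, delay=0.1):
    # gather schedules all coroutines on the event loop and
    # returns their results in the order they were passed in.
    return await asyncio.gather(*(fake_fetch(i, delay) for i in range(n)))

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(fetch_all())
    elapsed = time.perf_counter() - start
    print(len(results), f"{elapsed:.2f}s")
```

Because every coroutine waits at the same time, the elapsed time should be close to a single delay rather than the sum of all of them.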

The critical rule: never block the event loop. Any blocking call (a synchronous requests.get, time.sleep, heavy CPU work) freezes all coroutines until it completes. Use await asyncio.sleep, async libraries (aiohttp, asyncpg), and loop.run_in_executor for blocking operations you can't avoid.
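
As a rough illustration of the last point, here is one way to push a blocking function onto a thread pool with `loop.run_in_executor` so the event loop stays free. `blocking_io` and `run_blocking_calls` are hypothetical names, and `time.sleep` stands in for a blocking library call you can't replace.

```python
import asyncio
import time

def blocking_io(seconds):
    time.sleep(seconds)           # a blocking call; would freeze the loop if run directly
    return seconds

async def run_blocking_calls(delays=(0.1, 0.1, 0.1)):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    futures = [loop.run_in_executor(None, blocking_io, d) for d in delays]
    # The event loop keeps running other coroutines while the threads wait.
    return await asyncio.gather(*futures)

if __name__ == "__main__":
    print(asyncio.run(run_blocking_calls()))
```

On Python 3.9 and later, `asyncio.to_thread(blocking_io, d)` is a shorter way to do the same thing with the default executor.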

Rule of thumb: choose asyncio when you need massive I/O concurrency (hundreds to thousands of simultaneous connections) and are willing to use async-compatible libraries throughout.

The decision flowchart

Is your bottleneck I/O (network, disk, database)?
├── Yes → How many concurrent operations?
│         ├── Dozens → threading  (simple, familiar)
│         └── Hundreds / thousands → asyncio  (scalable, low overhead)
└── No → Is it CPU computation?
          └── Yes → multiprocessing / ProcessPoolExecutor  (real parallelism)

Or as a single question: "What am I waiting on?"

Waiting on the network / disk → threading or asyncio
Waiting on the CPU → multiprocessing
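
The flowchart can also be written down as a small function. This is only a sketch of the rule of thumb: `choose_model` is a hypothetical helper, and the cut-off of 100 concurrent operations between "dozens" and "hundreds" is an assumption, not a hard limit.

```python
def choose_model(bottleneck, concurrent_ops=1, threshold=100):
    """Suggest a concurrency model from the bottleneck type.

    bottleneck: "io" or "cpu"
    concurrent_ops: rough number of operations in flight at once
    threshold: assumed cut-off between threading and asyncio
    """
    if bottleneck == "cpu":
        return "multiprocessing"
    if bottleneck == "io":
        return "asyncio" if concurrent_ops >= threshold else "threading"
    raise ValueError("bottleneck must be 'io' or 'cpu'")

if __name__ == "__main__":
    print(choose_model("io", concurrent_ops=20))
    print(choose_model("io", concurrent_ops=5_000))
    print(choose_model("cpu"))
```

Treat the result as a starting point; profiling the real workload is still the deciding step.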

When `concurrent.futures` simplifies the choice

concurrent.futures provides a unified API over both threading and multiprocessing:

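from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

# fetch_url, urls, cpu_work, and data are placeholders for your own code.
# Both executors share one interface, so switching is a one-line change.

# I/O-bound: threads overlap the waits
with ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(fetch_url, urls))

# CPU-bound: processes run on separate cores
# (in a real script, run this under `if __name__ == "__main__":`)
with ProcessPoolExecutor(max_workers=4) as ex:
    results = list(ex.map(cpu_work, data))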

Use it when you want straightforward map/submit semantics without managing threads or processes manually.
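
For the submit side of that API, here is a minimal sketch using `submit` with `as_completed`, which hands back each result as soon as it is ready. `slow_square` and `square_all` are hypothetical names, and `time.sleep` stands in for a slow I/O call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def slow_square(n, delay=0.05):
    time.sleep(delay)             # stands in for a blocking I/O call
    return n * n

def square_all(numbers, max_workers=5):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        # Map each future back to its input so results can be labelled.
        future_to_n = {ex.submit(slow_square, n): n for n in numbers}
        for future in as_completed(future_to_n):
            # result() re-raises any exception raised inside the worker.
            results[future_to_n[future]] = future.result()
    return results

if __name__ == "__main__":
    print(square_all(range(5)))
```

`ex.map` is simpler when you only need results in input order; `submit` plus `as_completed` helps when you want to handle results or errors as they arrive.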

Recap

Threading overlaps blocking I/O by letting threads wait simultaneously. The GIL means no CPU-level parallelism, but I/O-heavy code still gets a real speedup. Multiprocessing spawns separate processes, each with its own GIL, giving true multi-core parallelism for CPU-bound work at the cost of higher overhead and pickling constraints. asyncio runs all coroutines on one thread in a single event loop, enabling tens of thousands of concurrent I/O operations with minimal overhead, provided nothing in the async path blocks the event loop. The deciding factor is always: I/O bottleneck (threading or asyncio) vs CPU bottleneck (multiprocessing).
