Multiprocessing Interview Questions & Answers

6 questions Updated 2026-06-18

Python interview questions on the multiprocessing module: how it sidesteps the GIL, Process vs Pool, IPC with Queue/Pipe, pickling limits, and shared state.

Threading runs multiple threads inside one process that share one interpreter — and therefore one GIL (Global Interpreter Lock), which lets only one thread execute Python bytecode at a time. Multiprocessing spawns separate OS processes, each with its own interpreter and own GIL, so they can run Python code in true parallel on multiple CPU cores.

from multiprocessing import Process
import os

def work():
    print(f"running in pid {os.getpid()}")  # a distinct process each time

if __name__ == "__main__":            # required guard on Windows/spawn
    ps = [Process(target=work) for _ in range(4)]
    for p in ps: p.start()
    for p in ps: p.join()

The tradeoff: processes don't share memory, so passing data costs serialization (pickling) and IPC overhead, and each process has higher startup cost than a thread. Rule of thumb: reach for multiprocessing when you need real CPU parallelism, not just concurrency.

A Process represents a single child process you start and join manually — good when you have a fixed, small number of distinct tasks. A Pool manages a reusable group of worker processes and hands out work to them, which is far more convenient for many homogeneous tasks over a dataset.

from multiprocessing import Pool

def square(n):
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))   # distributed across 4 workers
    print(results)   # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Pool reuses workers (amortizing startup cost) and offers map, imap, apply_async, etc. Use Process for a few long-lived distinct jobs; use Pool when you're fanning the same function over many inputs.

Because processes don't share memory, they communicate through IPC primitives. A Queue is a multi-producer/multi-consumer, thread- and process-safe FIFO — the general-purpose choice. A Pipe is a faster but lower-level two-endpoint connection, best for communication between exactly two processes.

from multiprocessing import Process, Queue

def producer(q):
    q.put("result")          # values are pickled across the boundary

if __name__ == "__main__":
    q = Queue()
    p = Process(target=producer, args=(q,))
    p.start()
    print(q.get())           # "result"
    p.join()

Both serialize objects under the hood, so only picklable data flows through them. Use Queue for fan-in/fan-out among many workers; use Pipe for a tight one-to-one channel where you want the lower overhead.

Every argument and return value crossing a process boundary must be serialized with pickle, sent, then deserialized on the other side. For large objects this copying cost can dwarf the parallelism gains, and some objects simply can't be pickled.

import pickle

pickle.dumps(lambda x: x)   # PicklingError — lambdas aren't picklable
# Also unpicklable: open file handles, sockets, locks, db connections,
# local/nested functions, and generators.

Picklable things include module-level functions and classes, and basic containers of picklable values. The practical implications: pass small, picklable payloads, define worker functions at module top level, and avoid shipping huge data structures between processes. Minimizing what crosses the boundary is the key to multiprocessing performance.

Since each process has its own memory, you need explicit shared-state tools. Value and Array put simple data in shared memory (fast, but limited types and you must guard with a lock). A Manager hosts richer shared objects (dict, list, etc.) via a server process — more flexible but slower because access is proxied.

from multiprocessing import Process, Value, Lock

def inc(counter, lock):
    for _ in range(1000):
        with lock:               # protect the shared value
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)      # shared int in shared memory
    lock = Lock()
    ps = [Process(target=inc, args=(counter, lock)) for _ in range(4)]
    for p in ps: p.start()
    for p in ps: p.join()
    print(counter.value)         # 4000

Prefer message passing (Queue/Pipe) over shared state when you can — it's easier to reason about. Reach for Value/Array for hot, simple counters and a Manager only when you genuinely need shared complex objects.

Use multiprocessing for CPU-bound work — number crunching, image processing, data transforms — where you need to saturate multiple cores with Python code. Threads and asyncio can't do that because the GIL serializes bytecode execution; only separate processes get separate GILs.

from multiprocessing import Pool

def heavy(n):                      # pure-Python CPU work
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool() as pool:           # defaults to os.cpu_count() workers
        print(pool.map(heavy, [10_000_000] * 8))   # runs in parallel

For I/O-bound work (network, disk, DB), threads or asyncio are usually better — they're cheaper and the GIL is released during I/O anyway. Rule of thumb: CPU-bound -> multiprocessing; I/O-bound -> threads/asyncio.

Practice tests are coming soon

Get notified when interactive mock interviews and quizzes launch.