Creating a thread is expensive — each one needs a stack (often ~512KB–1MB), a kernel scheduling entry, and OS bookkeeping. Spawning one per task means unbounded thread creation under load, which exhausts memory and thrashes the scheduler. A thread pool reuses a fixed set of worker threads to run many tasks, so you pay the creation cost once and bound resource usage.
// anti-pattern: a new OS thread per request — no limit, no reuse
for (Runnable task : tasks) new Thread(task).start();
// pooled: a bounded set of workers pulls tasks off a queue
ExecutorService pool = Executors.newFixedThreadPool(8);
for (Runnable task : tasks) pool.submit(task);
The two wins interviewers want: reuse (amortize thread cost) and bounding (a cap on concurrency so a spike can't take down the JVM). The pool also decouples task submission from task execution.
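Both wins can be observed directly. The following is a minimal sketch, not a benchmark: the class and method names (PoolReuseDemo, distinctWorkerThreads) are made up for illustration, and it simply counts how many distinct threads ended up running a batch of tasks.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: counts the distinct worker threads a fixed pool uses.
final class PoolReuseDemo {

    /** Runs taskCount tasks on poolSize workers and returns how many distinct threads did the work. */
    static int distinctWorkerThreads(int poolSize, int taskCount) {
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task records the name of the thread that executed it.
            pool.submit(() -> workerNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // no new tasks; already submitted ones still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return workerNames.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct worker threads: " + distinctWorkerThreads(2, 10));
    }
}
```

Ten tasks are served by at most two threads: the thread cost is paid once per worker (reuse), and concurrency never exceeds the pool size (bounding).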
Executor, ExecutorService, and ScheduledExecutorService form a layered interface stack, each adding capability:
- Executor: the minimal contract, a single execute(Runnable) method. It just runs a task; how (now, pooled, async) is the implementation's choice.
- ExecutorService: extends Executor with lifecycle (shutdown, awaitTermination) and result-bearing submission (submit, invokeAll, invokeAny) that returns Futures.
- ScheduledExecutorService: extends ExecutorService to run tasks after a delay or periodically (schedule, scheduleAtFixedRate).
Executor e = Runnable::run; // simplest possible Executor
ExecutorService es = Executors.newFixedThreadPool(4);
ScheduledExecutorService ses = Executors.newScheduledThreadPool(2);
You almost always program against ExecutorService — it gives you both task
results and a way to shut the pool down cleanly.
Executors is a factory of preconfigured ExecutorServices:
| Method | Behavior |
|---|---|
| newFixedThreadPool(n) | n fixed threads, unbounded task queue |
| newCachedThreadPool() | grows on demand, idle threads die after 60s, no queue (synchronous handoff) |
| newSingleThreadExecutor() | one thread, tasks run sequentially, unbounded queue |
| newScheduledThreadPool(n) | n threads for delayed/periodic tasks |
| newWorkStealingPool() | a ForkJoinPool sized to CPUs, work-stealing |
| newVirtualThreadPerTaskExecutor() | a new virtual thread per task (Java 21+) |
ExecutorService fixed = Executors.newFixedThreadPool(8);
ExecutorService cached = Executors.newCachedThreadPool();
Most of these are just a ThreadPoolExecutor (or ForkJoinPool) with specific defaults;
knowing those defaults explains the pitfalls covered next.
The convenience factories hide dangerous defaults that can cause
OutOfMemoryError under load:
- newFixedThreadPool / newSingleThreadExecutor use an unbounded LinkedBlockingQueue: if tasks arrive faster than they finish, the queue grows without limit until the heap is exhausted.
- newCachedThreadPool has no upper bound on threads: a burst can spawn thousands of threads and run the machine out of native memory.
// explicit, safe: bounded queue + bounded threads + an explicit reject policy
ExecutorService pool = new ThreadPoolExecutor(
    8, 16, 60L, TimeUnit.SECONDS,
    new ArrayBlockingQueue<>(1000),            // bounded queue
    new ThreadPoolExecutor.CallerRunsPolicy()  // backpressure on overflow
);
Effective Java's guidance is to construct ThreadPoolExecutor directly so
every parameter (queue bound, max threads, rejection behavior) is a deliberate
decision rather than a hidden default.
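The hidden defaults can be checked rather than taken on faith. This sketch inspects the pools the factories return; note the assumption that the returned objects can be cast to ThreadPoolExecutor, which holds for current OpenJDK implementations of newFixedThreadPool and newCachedThreadPool but is not promised by their declared return type. The class name FactoryDefaultsDemo is illustrative.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical demo class: reads back the defaults chosen by the Executors factories.
final class FactoryDefaultsDemo {

    /** Free capacity of the fixed pool's (empty) work queue. */
    static int fixedPoolQueueCapacity() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Assumption: the factory returns a ThreadPoolExecutor instance (true in OpenJDK).
            return ((ThreadPoolExecutor) pool).getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    /** Thread cap of the cached pool. */
    static int cachedPoolMaxThreads() {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // Same assumption about the concrete type as above.
            return ((ThreadPoolExecutor) pool).getMaximumPoolSize();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("fixed pool queue capacity: " + fixedPoolQueueCapacity());
        System.out.println("cached pool max threads:   " + cachedPoolMaxThreads());
    }
}
```

Both values come back as Integer.MAX_VALUE, which is what "unbounded" means in practice: the limit is so high that memory runs out first.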
ThreadPoolExecutor has six knobs that fully define its behavior:
| Parameter | Meaning |
|---|---|
| corePoolSize | threads kept alive even when idle |
| maximumPoolSize | hard cap on total threads |
| keepAliveTime | how long non-core idle threads survive before dying |
| workQueue | the BlockingQueue that holds waiting tasks |
| threadFactory | creates worker threads (name them, set daemon/priority) |
| handler | RejectedExecutionHandler for when the pool is saturated |
new ThreadPoolExecutor(
    4,                                    // corePoolSize
    10,                                   // maximumPoolSize
    30L, TimeUnit.SECONDS,                // keepAliveTime (value + unit)
    new ArrayBlockingQueue<>(100),        // workQueue
    new CustomThreadFactory("worker"),    // threadFactory (your own class, not a JDK type)
    new ThreadPoolExecutor.AbortPolicy()  // handler
);
Always supply a named ThreadFactory in production — default thread names
like pool-1-thread-3 make stack traces and thread dumps nearly useless.
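A naming factory takes only a few lines. This is one possible sketch; the class name NamedThreadFactory and the "prefix-N" naming scheme are illustrative choices, not a JDK API.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: names worker threads "<prefix>-1", "<prefix>-2", ...
final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread thread = new Thread(task, prefix + "-" + counter.incrementAndGet());
        thread.setDaemon(false); // explicit choice: workers keep the JVM alive until shutdown
        return thread;
    }

    public static void main(String[] args) {
        ThreadFactory factory = new NamedThreadFactory("order-worker");
        System.out.println(factory.newThread(() -> { }).getName());
    }
}
```

Pass an instance as the threadFactory argument of ThreadPoolExecutor (or to the Executors factory overloads that accept one) and thread dumps show which pool each thread belongs to.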
ThreadPoolExecutor applies a strict, sometimes surprising, order on execute:
1. If running threads < corePoolSize, start a new core thread for the task (even if other threads are idle).
2. Else, try to enqueue the task in the workQueue.
3. Only if the queue is full does it create threads up to maximumPoolSize.
4. If the queue is full and threads are at max, the task is rejected.
// core=2, queue=2, max=4 -> capacity surfaces in this order:
// tasks 1-2 -> run on 2 core threads
// tasks 3-4 -> wait in the queue (size 2)
// tasks 5-6 -> spawn 2 extra threads (up to max=4)
// task 7 -> rejected
The non-obvious consequence: with an unbounded queue, step 3 never triggers,
so maximumPoolSize is ignored and the pool never grows past core. This is
exactly why newFixedThreadPool only ever runs n threads.
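The ordering is easy to verify with a pool whose tasks all block until released, so nothing finishes while tasks are being submitted. The sketch below uses the same core=2, queue=2, max=4 configuration; the class name ExecutionOrderDemo and the log format are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: records pool size and queue length after each execute() call.
final class ExecutionOrderDemo {

    static List<String> trace() {
        CountDownLatch release = new CountDownLatch(1);
        // Default handler is AbortPolicy, so a saturated pool throws on execute().
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 4, 60L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        List<String> log = new ArrayList<>();
        try {
            for (int i = 1; i <= 7; i++) {
                try {
                    pool.execute(() -> awaitQuietly(release)); // every task blocks until released
                    log.add("task " + i + ": threads=" + pool.getPoolSize()
                            + " queued=" + pool.getQueue().size());
                } catch (RejectedExecutionException e) {
                    log.add("task " + i + ": rejected");
                }
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
        }
        return log;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

The trace shows the thread count rising to 2, then the queue filling to 2, then the thread count rising to 4, and finally the seventh task being rejected.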
A task is rejected when the queue is full and the pool is at
maximumPoolSize (or after shutdown). The RejectedExecutionHandler decides
what happens:
| Policy | Behavior |
|---|---|
| AbortPolicy (default) | throws RejectedExecutionException |
| CallerRunsPolicy | runs the task on the submitting thread (natural backpressure) |
| DiscardPolicy | silently drops the task |
| DiscardOldestPolicy | drops the oldest queued task, retries the new one |
var pool = new ThreadPoolExecutor(2, 4, 60, TimeUnit.SECONDS,
new ArrayBlockingQueue<>(10),
new ThreadPoolExecutor.CallerRunsPolicy());
CallerRunsPolicy is the favorite for throughput-with-stability: when the pool
is overwhelmed, the producer is slowed down (it executes the task itself)
instead of either throwing or losing work. You can also implement your own
handler to log, meter, or persist rejected tasks.
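To see the backpressure effect, the sketch below saturates a deliberately tiny pool (one thread, one queue slot) and checks which thread ends up running the overflow task. The class name CallerRunsDemo and the sizes are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class: shows CallerRunsPolicy executing overflow work on the submitter.
final class CallerRunsDemo {

    /** Returns true if the task submitted to a saturated pool ran on the calling thread. */
    static boolean overflowRanOnCaller() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        AtomicReference<Thread> overflowThread = new AtomicReference<>();
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(blocker); // occupies the only worker
            pool.execute(blocker); // fills the one-slot queue
            // Pool is saturated: the handler runs this task synchronously on the caller.
            pool.execute(() -> overflowThread.set(Thread.currentThread()));
        } finally {
            release.countDown();
            pool.shutdown();
        }
        return overflowThread.get() == Thread.currentThread();
    }

    public static void main(String[] args) {
        System.out.println("overflow ran on caller: " + overflowRanOnCaller());
    }
}
```

While the caller is busy running the overflow task it cannot submit more work, which is exactly how the producer gets throttled.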
The right size depends on what threads spend their time doing:
- CPU-bound work keeps the core busy, so more threads than cores just adds context-switching overhead. Size ≈ number of cores (sometimes cores + 1 to cover the occasional page fault).
- IO-bound work blocks on network/disk, leaving the CPU idle, so you want many more threads than cores to keep the CPU saturated.
A useful formula (Brian Goetz): threads = cores × (1 + waitTime / computeTime).
int cores = Runtime.getRuntime().availableProcessors();
ExecutorService cpuPool = Executors.newFixedThreadPool(cores); // CPU-bound
ExecutorService ioPool = Executors.newFixedThreadPool(cores * 8); // IO-bound (illustrative)
Treat the formula as a starting point — measure throughput and latency under realistic load and tune. (For heavily IO-bound work on Java 21+, virtual threads sidestep sizing entirely.)
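As a worked instance of the formula: a task that waits 90 ms on IO for every 10 ms of computation, on 8 cores, gives 8 × (1 + 90/10) = 80 threads. The helper below is a small sketch of that arithmetic; the name PoolSizing.threadsFor and the choice to round up are assumptions for illustration, and the wait/compute figures would have to come from your own measurements.

```java
// Hypothetical helper: applies threads = cores * (1 + waitTime / computeTime).
final class PoolSizing {

    /**
     * @param cores       number of available cores
     * @param waitTime    average time a task spends blocked (any unit)
     * @param computeTime average time a task spends on the CPU (same unit)
     */
    static int threadsFor(int cores, double waitTime, double computeTime) {
        if (cores < 1 || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("cores >= 1, waitTime >= 0, computeTime > 0 required");
        }
        // Round up so a fractional result never undersizes the pool.
        return (int) Math.max(1, Math.ceil(cores * (1 + waitTime / computeTime)));
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("CPU-bound: " + threadsFor(cores, 0, 10));
        System.out.println("IO-bound (90ms wait / 10ms compute): " + threadsFor(cores, 90, 10));
    }
}
```

With no waiting the formula collapses to the core count, which matches the CPU-bound advice above.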
execute(Runnable) comes from Executor and returns void — it's
fire-and-forget. submit(...) comes from ExecutorService, accepts a
Runnable or Callable, and returns a Future you can use to get the
result, wait for completion, or cancel.
pool.execute(() -> log("done")); // void, no handle
Future<Integer> f = pool.submit(() -> 42); // Future handle
Integer result = f.get(); // 42
A subtle but important difference for debugging: with execute, an uncaught
exception propagates to the thread's UncaughtExceptionHandler (you see it).
With submit, the exception is captured inside the Future and only
surfaces when you call get() — so a submitted task that throws can fail
silently if you never inspect the Future.
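The difference can be made visible by installing an UncaughtExceptionHandler on the worker thread and counting its invocations. This is a sketch under stated assumptions: the class name SubmitVsExecuteDemo, the single-thread pool, and the summary string are all illustrative.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: compares where a task's exception ends up.
final class SubmitVsExecuteDemo {

    static String run() {
        AtomicInteger handlerCalls = new AtomicInteger();
        CountDownLatch handlerRan = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(1, task -> {
            Thread t = new Thread(task, "demo-worker");
            t.setUncaughtExceptionHandler((thread, ex) -> {
                handlerCalls.incrementAndGet();
                handlerRan.countDown();
            });
            return t;
        });
        try {
            Callable<Void> failing = () -> { throw new IllegalStateException("from submit"); };
            Future<Void> future = pool.submit(failing);
            String viaFuture;
            try {
                future.get();
                viaFuture = "no failure";
            } catch (ExecutionException e) {
                viaFuture = e.getCause().getMessage(); // only visible because we called get()
            }
            int callsAfterSubmit = handlerCalls.get(); // handler was not involved

            pool.execute(() -> { throw new IllegalStateException("from execute"); });
            handlerRan.await(5, TimeUnit.SECONDS);     // handler runs on the dying worker thread
            return viaFuture + " | handler calls: " + callsAfterSubmit + " -> " + handlerCalls.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The submitted task's failure shows up only through the Future, while the executed task's failure reaches the handler and also kills that worker thread (the pool replaces it when needed).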
Runnable and Callable both represent a unit of work, but:
- Runnable: void run(), cannot return a value, and cannot throw checked exceptions.
- Callable<V>: V call() throws Exception, returns a result and may throw checked exceptions.
Runnable r = () -> System.out.println("side effect only");
Callable<Integer> c = () -> {
    if (bad) throw new IOException("checked exception is fine here");
    return compute(); // returns a value
};
Future<Integer> f = pool.submit(c);
Use Callable when the task produces a result or might throw a checked
exception. You can adapt a Runnable to a Callable via
Executors.callable(runnable).
A Future<V> is a handle to a result that may not exist yet — the receipt
you get back from submit. Its key methods:
- get(): blocks until the task completes and returns the result (or use the timed get(timeout, unit) variant).
- isDone(): non-blocking check for completion.
- cancel(mayInterruptIfRunning): attempts to cancel; isCancelled() reports it.
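Future<Integer> f = pool.submit(() -> slowComputation());
if (!f.isDone()) doOtherWork();
try {
    Integer v = f.get(2, TimeUnit.SECONDS); // wait up to 2s
} catch (TimeoutException te) {
    f.cancel(true); // give up, interrupt the task
} catch (InterruptedException ie) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
} catch (ExecutionException ee) {
    // the task itself threw; the original exception is ee.getCause()
}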
The classic limitation: a plain Future has no callbacks — you can only
poll isDone() or block on get(). That gap is what CompletableFuture
fills.
If a task throws, the exception is stored in the Future and re-thrown,
wrapped in an ExecutionException, when you call get(). The original cause
is available via getCause().
Future<Integer> f = pool.submit(() -> { throw new IllegalStateException("boom"); });
try {
    f.get();
} catch (ExecutionException e) {
    Throwable cause = e.getCause(); // the original IllegalStateException
} catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore the interrupt flag
}
Two interview points: get() declares both ExecutionException (task threw)
and InterruptedException (the waiting thread was interrupted), and you must
unwrap getCause() to see what actually went wrong. A task whose result you
never get() can swallow its failure entirely.
invokeAll and invokeAny both submit a collection of Callables at once and block:
- invokeAll runs all tasks and returns a List<Future<T>> once every task has finished (each Future is already complete).
- invokeAny returns the result of the first task to complete successfully, then cancels the rest; great for racing redundant lookups.
List<Callable<Integer>> tasks = List.of(() -> 1, () -> 2, () -> 3);
List<Future<Integer>> all = pool.invokeAll(tasks); // waits for all 3
for (Future<Integer> f : all) use(f.get());
Integer fastest = pool.invokeAny(tasks); // first success, others cancelled
Both have timeout overloads. With invokeAll, any task that fails surfaces
only when you call get() on its Future; invokeAny throws
ExecutionException only if every task fails.
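The failure semantics of invokeAny are worth seeing in action: one failing task does not fail the call as long as another succeeds. A minimal sketch follows; the class name InvokeDemo and the "replica down" scenario are invented for the example.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class: invokeAll gathers every result, invokeAny tolerates partial failure.
final class InvokeDemo {

    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Callable<Integer>> numbers = List.of(() -> 1, () -> 2, () -> 3);
            int sum = 0;
            for (Future<Integer> f : pool.invokeAll(numbers)) {
                sum += f.get(); // already complete, so get() returns immediately
            }

            Callable<String> failing = () -> { throw new IllegalStateException("replica down"); };
            Callable<String> healthy = () -> "ok";
            String any = pool.invokeAny(List.of(failing, healthy)); // failure is skipped

            return "sum=" + sum + " any=" + any;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } catch (ExecutionException e) {
            return "failed: " + e.getCause();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```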
A plain Future can only be polled or blocked on — you can't attach a
continuation or compose multiple async results without blocking a thread.
CompletableFuture (Java 8+) implements CompletionStage, adding a fluent,
non-blocking, callback-driven pipeline.
CompletableFuture
.supplyAsync(() -> fetchUser(id)) // run async, returns a stage
.thenApply(User::name) // transform the result
.thenAccept(System.out::println) // consume it, no blocking
.exceptionally(ex -> { log(ex); return null; }); // handle failure
You can chain transformations, combine independent futures, and handle
errors declaratively — turning callback spaghetti into a readable flow.
supplyAsync/runAsync use the common ForkJoinPool by default; pass your own
Executor for control over which pool runs the work.
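Passing your own Executor is a one-argument change. The sketch below, with an illustrative class name and thread name, confirms that the supplier really runs on the supplied pool rather than on the common pool.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class: runs a CompletableFuture stage on a dedicated pool.
final class CustomExecutorDemo {

    static String run() {
        // Dedicated pool with a recognizable thread name (names are illustrative).
        ExecutorService ioPool = Executors.newFixedThreadPool(2, r -> new Thread(r, "io-worker"));
        try {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), ioPool) // second argument selects the pool
                    .thenApply(name -> "ran on " + name)
                    .join();
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Keeping blocking IO on a dedicated pool avoids starving the shared common pool, which other parts of the application (for example parallel streams) also rely on.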
thenApply, thenCompose, and thenCombine cover the three common composition shapes:
- thenApply(fn): transform the result with a plain function (T -> U). Result is CompletableFuture<U>.
- thenCompose(fn): chain a function that itself returns a future (T -> CompletableFuture<U>); it flattens, avoiding a nested CompletableFuture<CompletableFuture<U>>. This is the async "flatMap".
- thenCombine(other, bifn): wait for two independent futures and merge their results.
cf.thenApply(x -> x + 1); // sync transform
cf.thenCompose(id -> fetchAsync(id)); // chain dependent async call
cf1.thenCombine(cf2, (a, b) -> a + b); // join two parallel results
Rule of thumb: use thenApply for a sync mapping, thenCompose when the
mapping is itself async, and thenCombine to fan two parallel results back
together.
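Here is the same rule of thumb as a small runnable pipeline. fetchName and fetchScore are hypothetical stand-ins for remote calls, and the values are arbitrary.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class: one pipeline using all three composition methods.
final class CompositionDemo {

    // Stand-in for an async lookup that depends on an id.
    static CompletableFuture<String> fetchName(int userId) {
        return CompletableFuture.supplyAsync(() -> "user-" + userId);
    }

    // Stand-in for a second, independent async lookup.
    static CompletableFuture<Integer> fetchScore(int userId) {
        return CompletableFuture.supplyAsync(() -> userId * 10);
    }

    static String run() {
        CompletableFuture<Integer> id = CompletableFuture.completedFuture(7);

        CompletableFuture<Integer> nextId = id.thenApply(x -> x + 1);                      // sync mapping
        CompletableFuture<String> name = nextId.thenCompose(CompositionDemo::fetchName);   // async mapping, flattened
        CompletableFuture<Integer> score = fetchScore(7);                                  // independent work

        return name.thenCombine(score, (n, s) -> n + ":" + s).join();                      // merge both results
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Using thenApply with fetchName instead of thenCompose would compile, but yield a CompletableFuture<CompletableFuture<String>> that has to be unwrapped by hand.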
Three methods catch failures that propagate down the chain:
- exceptionally(fn): runs only on failure, supplying a fallback value (Throwable -> T).
- handle(bifn): runs on both success and failure ((T result, Throwable ex) -> U), so you can recover or transform either way.
- whenComplete(bifn): a side-effect callback on completion that does not alter the result (good for logging/cleanup).
cf.thenApply(this::risky)
.exceptionally(ex -> DEFAULT) // fallback on error only
.handle((res, ex) -> ex != null ? -1 : res) // see both outcomes
.whenComplete((res, ex) -> log(res, ex)); // observe, don't change
Note that exceptions arrive wrapped in CompletionException (use
getCause()), and exceptionally recovers the chain so later stages see the
fallback value rather than the error.
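The wrapping and the recovery can both be checked with a stage that always fails. In this sketch (class name and messages are illustrative) the handle callback reports what it actually receives, and exceptionally feeds a fallback into a later stage.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical demo class: inspects the exception a downstream stage sees, then recovers.
final class ErrorHandlingDemo {

    static String run() {
        CompletableFuture<String> failing = CompletableFuture
                .<String>supplyAsync(() -> { throw new IllegalStateException("boom"); })
                .thenApply(String::trim); // skipped, because the upstream stage failed

        // handle sees the wrapper type; the original exception is its cause.
        String seen = failing
                .handle((res, ex) -> ex == null
                        ? res
                        : ex.getClass().getSimpleName() + " caused by " + ex.getCause().getMessage())
                .join();

        // exceptionally substitutes a value, so the next stage runs normally.
        String recovered = failing
                .exceptionally(ex -> "fallback")
                .thenApply(String::toUpperCase)
                .join();

        return seen + " / " + recovered;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```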
allOf and anyOf aggregate multiple futures:
- allOf(cf...): completes when all given futures complete. It returns CompletableFuture<Void>, so you join, then read each future's result individually.
- anyOf(cf...): completes when the first of them completes, carrying that future's result (as Object).
CompletableFuture<String> a = supplyAsync(() -> callA());
CompletableFuture<String> b = supplyAsync(() -> callB());
CompletableFuture.allOf(a, b).join(); // wait for both
String combined = a.join() + b.join(); // now safe, both done
Object first = CompletableFuture.anyOf(a, b).join(); // first to finish
Because allOf yields Void, the idiom is to join() it as a barrier and then
pull each individual result — frequently done by collecting a list of futures
and allOf-ing the array.
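The list-based idiom might look like the following sketch. The method name fetchAll and the "value-of-" lookup are placeholders for whatever async call you are fanning out.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// Hypothetical demo class: fan out one async call per key, then gather results in order.
final class AllOfDemo {

    static List<String> fetchAll(List<String> keys) {
        // One future per key; supplyAsync stands in for a real remote call.
        List<CompletableFuture<String>> futures = keys.stream()
                .map(key -> CompletableFuture.supplyAsync(() -> "value-of-" + key))
                .collect(Collectors.toList());

        // allOf takes an array and yields Void, so it serves purely as a barrier.
        CompletableFuture<Void> barrier =
                CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));

        // After the barrier, join() on each future returns immediately.
        return barrier
                .thenApply(ignored -> futures.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()))
                .join();
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of("a", "b", "c")));
    }
}
```

Results come back in the order of the input list, not in completion order, because they are read from the original list of futures.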
An ExecutorService keeps its threads alive (often non-daemon) until you stop
it, so you must shut it down or the JVM may never exit:
- shutdown(): graceful. Stops accepting new tasks but lets already submitted tasks finish. Returns immediately.
- shutdownNow(): aggressive. Interrupts running tasks, drains the queue, and returns the list of tasks that never started.
- awaitTermination(timeout, unit): blocks until the pool has terminated or the timeout elapses; returns true if it terminated.
pool.shutdown();
if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
pool.shutdownNow(); // force the stragglers
}
Neither shutdown method blocks on its own — awaitTermination is how you
actually wait. And shutdownNow only requests interruption; tasks that
ignore the interrupt flag keep running.
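What shutdownNow hands back is easiest to see with a single-thread pool whose worker is stuck. In this sketch (class name and summary string are illustrative) the blocked task honors the interrupt, so the pool terminates promptly.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class: shutdownNow interrupts the running task and returns the queued ones.
final class ShutdownNowDemo {

    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        CountDownLatch never = new CountDownLatch(1); // never counted down; only an interrupt ends the wait
        Runnable blocker = () -> {
            try {
                never.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // cooperate with shutdownNow
            }
        };
        pool.execute(blocker);     // taken by the single worker
        pool.execute(() -> { });   // waits in the queue
        pool.execute(() -> { });   // waits in the queue

        List<Runnable> neverStarted = pool.shutdownNow(); // interrupt + drain
        boolean terminated;
        try {
            terminated = pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            terminated = false;
        }
        return "never started=" + neverStarted.size() + " terminated=" + terminated;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Had the blocker swallowed the interrupt and kept waiting, awaitTermination would time out and report false.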
The canonical two-phase pattern (straight from the ExecutorService Javadoc):
ask politely, wait, then force, and restore the interrupt if you're
interrupted while waiting.
void shutdownAndAwait(ExecutorService pool) {
    pool.shutdown(); // stop taking new tasks
    try {
        if (!pool.awaitTermination(60, TimeUnit.SECONDS)) {
            pool.shutdownNow(); // cancel in-flight tasks
            if (!pool.awaitTermination(60, TimeUnit.SECONDS))
                log("pool did not terminate");
        }
    } catch (InterruptedException ie) {
        pool.shutdownNow();
        Thread.currentThread().interrupt(); // preserve interrupt status
    }
}
Wire this into a JVM shutdown hook or your framework's lifecycle. The key habits:
two await phases, escalate from shutdown to shutdownNow, and never swallow
InterruptedException — re-set the flag.
ScheduledExecutorService runs tasks after a delay or repeatedly, replacing the legacy Timer/
TimerTask (which used a single thread and let one failing task kill all others):
- schedule(task, delay, unit): run once after a delay.
- scheduleAtFixedRate(task, initial, period, unit): start a run every period, measured from the start of the previous run (fixed cadence).
- scheduleWithFixedDelay(task, initial, delay, unit): wait delay after each run finishes before the next starts.
var ses = Executors.newScheduledThreadPool(2);
ses.scheduleAtFixedRate(this::poll, 0, 5, TimeUnit.SECONDS); // every 5s
ses.scheduleWithFixedDelay(this::cleanup, 1, 10, TimeUnit.SECONDS);
The crucial difference: with fixed-rate, a run that takes longer than the period delays the following runs, which then start late and back-to-back (the same task never runs concurrently with itself), while fixed-delay guarantees a gap between runs. Also note that an uncaught exception silently cancels all future executions of that scheduled task.
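The silent cancellation is easy to reproduce. The sketch below (class name, period, and failure condition are all illustrative) schedules a task that throws on its third run and then checks that no fourth run ever happens.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: a periodic task stops for good after its first uncaught exception.
final class ScheduledFailureDemo {

    /** Returns how many times the task ran before the scheduler dropped it. */
    static int runsBeforeCancellation() {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService ses = Executors.newScheduledThreadPool(1);
        try {
            ScheduledFuture<?> handle = ses.scheduleAtFixedRate(() -> {
                if (runs.incrementAndGet() == 3) {
                    throw new IllegalStateException("third run fails");
                }
            }, 0, 10, TimeUnit.MILLISECONDS);
            try {
                handle.get(5, TimeUnit.SECONDS); // a periodic task only completes by failing or being cancelled
            } catch (ExecutionException expected) {
                // The failure is reported here and nowhere else.
            }
            Thread.sleep(100); // several periods pass without another run
            return runs.get();
        } catch (InterruptedException | TimeoutException e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
            return -1;
        } finally {
            ses.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("runs before cancellation: " + runsBeforeCancellation());
    }
}
```

In practice, wrap the body of a periodic task in a try/catch that logs the error if the schedule should survive individual failures.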
Virtual threads (Java 21, JEP 444) are lightweight threads managed by the JVM, not 1:1 with OS threads. They're so cheap (a few hundred bytes) that you can have millions, and a blocking call unmounts the virtual thread from its carrier OS thread instead of blocking it.
// one virtual thread per task — no pooling, no sizing
try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
    for (var task : tasks) executor.submit(task);
} // try-with-resources auto-closes (shutdown + await)
This upends the classic advice: for IO-bound work you no longer pool threads
or agonize over pool sizing; you just create one virtual thread per task.
Caveats: never pool virtual threads, avoid pinning (on Java 21, blocking inside
a synchronized block or a native call keeps the carrier thread blocked; JDK 24
removed the synchronized case with JEP 491), and platform-thread pools still
matter for CPU-bound work, where bounding parallelism to the core count is
still right.