Collectors & Grouping Interview Questions & Answers
Java Collectors interview questions — collect and the Collectors factory, toList/toMap/toSet, groupingBy and partitioningBy, downstream collectors, joining, counting and summing, and writing a custom collector.
collect is a mutable reduction terminal operation: it folds the stream's
elements into a mutable result container (a List, Map, StringBuilder,
etc.) by repeatedly accumulating into it. It takes a Collector, and the
java.util.stream.Collectors class is a factory of ready-made ones.
List<String> upper = names.stream()
    .map(String::toUpperCase)
    .collect(Collectors.toList());
Unlike reduce, which combines immutable values, collect mutates a
container in place, which is far more efficient for building collections and
strings. Reach for the Collectors factory first — only write a custom
collector when nothing there fits.
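To make the contrast with reduce concrete, here is a minimal sketch that builds the same string both ways. It uses the three-argument Stream.collect(supplier, accumulator, combiner) overload; the class and method names are illustrative and not part of the original text.

```java
import java.util.List;

class CollectVsReduce {
    // reduce: each combine step yields a new immutable String
    static String viaReduce(List<String> parts) {
        return parts.stream().reduce("", String::concat);
    }

    // collect: each step appends into the same mutable StringBuilder
    static String viaCollect(List<String> parts) {
        return parts.stream()
                .collect(StringBuilder::new, StringBuilder::append, StringBuilder::append)
                .toString();
    }

    public static void main(String[] args) {
        List<String> parts = List.of("a", "b", "c");
        System.out.println(viaReduce(parts));  // abc
        System.out.println(viaCollect(parts)); // abc
    }
}
```

Both give the same answer; the difference is how much intermediate garbage is created along the way.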
A Collector<T, A, R> is defined by four functions (T = input element,
A = mutable accumulation type, R = final result):
| Component | Role |
|---|---|
| supplier | creates a new empty mutable container (A) |
| accumulator | folds one element into the container |
| combiner | merges two partial containers (used in parallel) |
| finisher | transforms the container A into the result R |
// toList conceptually: supplier=ArrayList::new,
// accumulator=List::add, combiner=addAll, finisher=identity
A fifth piece, characteristics, is a set of optimization hints. The combiner
is what lets a parallel stream merge the partial containers built by separate
threads; the finisher is skipped entirely when the container already is the
result (IDENTITY_FINISH).
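The four functions can be exercised by hand to see how they fit together. The sketch below is only an illustration of the contract, not how the stream library is implemented: the helper run and the two-chunk split are assumptions made for the example.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorByHand {
    // Drives a collector manually over two chunks, roughly the way a
    // parallel stream might: one container per chunk, then combine, then finish.
    static <T, A, R> R run(Collector<T, A, R> c, List<T> left, List<T> right) {
        A first = c.supplier().get();
        for (T t : left) c.accumulator().accept(first, t);

        A second = c.supplier().get();
        for (T t : right) c.accumulator().accept(second, t);

        A merged = c.combiner().apply(first, second);
        return c.finisher().apply(merged);
    }

    public static void main(String[] args) {
        String joined = run(Collectors.joining(","), List.of("a", "b"), List.of("c"));
        System.out.println(joined); // a,b,c
    }
}
```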
All three gather elements into a collection, differing in the container:
- toList(): accumulates into a List (an ArrayList in practice).
- toSet(): accumulates into a Set (a HashSet), dropping duplicates, with no order guarantee.
- toCollection(supplier): accumulates into whatever collection you supply, when you need a specific type.
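// a stream can be consumed only once, so each collect starts a fresh stream
List<String> list = names.stream().collect(Collectors.toList());
Set<String> set = names.stream().collect(Collectors.toSet());
TreeSet<String> sorted = names.stream()
    .collect(Collectors.toCollection(TreeSet::new)); // sorted, no dups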
Use toCollection when the default type won't do — e.g. a TreeSet for
ordering or a LinkedList for insertion semantics.
Both produce a List and both accept null elements; the difference that matters is mutability:
| | collect(Collectors.toList()) | stream.toList() (Java 16+) |
|---|---|---|
| Mutability | modifiable in practice (ArrayList) | unmodifiable |
| Allows null | yes | yes |
| Conciseness | verbose | one method |
List<Integer> a = nums.stream().collect(Collectors.toList());
a.add(99); // OK — mutable
List<Integer> b = nums.stream().toList();
b.add(99); // UnsupportedOperationException
Prefer the newer stream().toList() for read-only results; it's shorter and
its immutability prevents accidental mutation. Use Collectors.toList() only
when you genuinely need to modify the result afterward.
Java 10 added toUnmodifiableList, toUnmodifiableSet and
toUnmodifiableMap, which return collections that throw
UnsupportedOperationException on any mutation. They also reject null
elements (throwing NullPointerException).
List<String> ro = names.stream()
    .filter(n -> n.length() > 3)
    .collect(Collectors.toUnmodifiableList());
ro.clear(); // UnsupportedOperationException
These are the collector equivalents of List.of/Set.of/Map.of. Since
Java 16 you can also just use stream().toList() for an unmodifiable List.
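One difference worth knowing between the two unmodifiable options is null handling. The following sketch (class and method names are hypothetical) contrasts Stream.toList(), which tolerates null elements, with Collectors.toUnmodifiableList(), which rejects them.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class NullHandling {
    // Stream.toList() (Java 16+) keeps null elements
    static List<String> viaStreamToList(List<String> in) {
        return in.stream().toList();
    }

    // toUnmodifiableList() throws NullPointerException on a null element
    static boolean unmodifiableCollectorRejectsNull(List<String> in) {
        try {
            in.stream().collect(Collectors.toUnmodifiableList());
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> withNull = Arrays.asList("a", null);
        System.out.println(viaStreamToList(withNull).size());           // 2
        System.out.println(unmodifiableCollectorRejectsNull(withNull)); // true
    }
}
```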
toMap builds a Map from each element using a key mapper and a value
mapper — two functions that derive the key and value from each element.
Map<String, Integer> byName = people.stream()
    .collect(Collectors.toMap(
        Person::name,    // key mapper
        Person::age));   // value mapper
By default the result is a HashMap. The catch interviewers probe is duplicate
keys: if two elements map to the same key, the two-argument toMap throws an
IllegalStateException ("Duplicate key") — you must supply a merge function
to resolve collisions.
The three-argument toMap takes a merge function, (existing, new) -> result,
which is invoked whenever two elements produce the same key. Without it,
duplicate keys throw IllegalStateException.
// two-arg: throws IllegalStateException on duplicate "Bob"
// three-arg: resolve the clash
Map<String, Integer> totals = orders.stream()
    .collect(Collectors.toMap(
        Order::customer,
        Order::amount,
        Integer::sum)); // merge: add amounts for same customer
A fourth argument supplies the map type (e.g. TreeMap::new) for ordering
or a specific implementation. Always provide a merge function when keys aren't
guaranteed unique — it's the single most common toMap bug.
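The duplicate-key failure and the four-argument form can be seen side by side in a short sketch. The Order record and the sample customers are assumptions made for illustration only.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ToMapOverloads {
    // Hypothetical element type for the example
    record Order(String customer, int amount) {}

    // Two-argument toMap: a repeated customer triggers IllegalStateException
    static boolean twoArgThrowsOnDuplicate(List<Order> orders) {
        try {
            orders.stream().collect(Collectors.toMap(Order::customer, Order::amount));
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Four-argument toMap: merge function plus a map factory for sorted keys
    static TreeMap<String, Integer> sortedTotals(List<Order> orders) {
        return orders.stream().collect(Collectors.toMap(
                Order::customer, Order::amount, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("Bob", 5), new Order("Ann", 3), new Order("Bob", 7));
        System.out.println(twoArgThrowsOnDuplicate(orders)); // true
        System.out.println(sortedTotals(orders));            // {Ann=3, Bob=12}
    }
}
```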
groupingBy takes a classifier function and buckets elements into a
Map<K, List<T>> keyed by the classifier's result: every element with the
same key lands in the same list.
Map<Department, List<Employee>> byDept = employees.stream()
    .collect(Collectors.groupingBy(Employee::department));
It's the SQL GROUP BY of streams. The default value container is a List and
the default map is a HashMap. This single-argument form is the gateway — the
real power comes from adding a downstream collector to reshape each group.
The two-argument groupingBy(classifier, downstream) applies a second
collector to each group instead of just collecting elements into a list. This
lets you count, sum, average, or otherwise reduce each bucket.
// count per department
Map<Dept, Long> counts = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept, Collectors.counting()));
// sum of salaries per department
Map<Dept, Integer> totals = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept,
        Collectors.summingInt(Employee::salary)));
Common downstreams: counting, summingInt/Long/Double,
averagingInt/Double, mapping, toSet, joining, maxBy/minBy,
reducing. Downstream collectors are the heart of expressive aggregation.
Collectors.mapping(mapper, downstream) applies a transform to each element
before it reaches a further downstream collector — it adapts a collector to a
different input type. It's how you collect a field of each group member
rather than the whole object.
// names of employees in each department
Map<Dept, List<String>> names = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept,
        Collectors.mapping(Employee::name, Collectors.toList())));
Think of mapping as a map() embedded inside a collector. It pairs naturally
with toList, toSet, or joining to project group members.
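As a small illustration of pairing mapping with joining, the sketch below produces one delimited string of names per department. The Employee record and the "/" delimiter are assumptions chosen for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class MappingDownstream {
    // Hypothetical element type for the example
    record Employee(String name, String dept) {}

    // Project each group member to its name, then join the names per group
    static Map<String, String> namesByDept(List<Employee> emps) {
        return emps.stream().collect(Collectors.groupingBy(
                Employee::dept,
                Collectors.mapping(Employee::name, Collectors.joining("/"))));
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("Ann", "eng"),
                new Employee("Bob", "ops"),
                new Employee("Cy", "eng"));
        System.out.println(namesByDept(emps).get("eng")); // Ann/Cy
    }
}
```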
Combine groupingBy with the counting() downstream collector. counting
returns a Long, so the result is Map<K, Long>.
Map<String, Long> wordFreq = words.stream()
    .collect(Collectors.groupingBy(
        Function.identity(),      // group by the word itself
        Collectors.counting()));  // count occurrences
// {"the"=4, "cat"=2, ...}
Function.identity() is the idiom for grouping elements by themselves — a
frequency map. This is the canonical "count occurrences" stream pattern.
Because the downstream of groupingBy can be another groupingBy, you
nest them to build a multi-level map, much like a SQL GROUP BY on two
columns, except that the result is a map of maps rather than flat rows.
// group by department, then by city within each department
Map<Dept, Map<String, List<Employee>>> nested = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept,
        Collectors.groupingBy(Employee::city)));
You can keep nesting or end with an aggregating downstream
(Map<Dept, Map<String, Long>> via counting()). The outer classifier forms
the first key level; each inner collector handles the next.
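A nested grouping that ends in an aggregating downstream might look like the sketch below, which counts employees per city within each department. The Employee record and sample data are assumed for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGrouping {
    // Hypothetical element type for the example
    record Employee(String name, String dept, String city) {}

    // Outer key: department. Inner key: city. Value: headcount.
    static Map<String, Map<String, Long>> headcount(List<Employee> emps) {
        return emps.stream().collect(Collectors.groupingBy(
                Employee::dept,
                Collectors.groupingBy(Employee::city, Collectors.counting())));
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("Ann", "eng", "Oslo"),
                new Employee("Bob", "eng", "Oslo"),
                new Employee("Cy", "eng", "Rome"),
                new Employee("Di", "ops", "Rome"));
        System.out.println(headcount(emps).get("eng").get("Oslo")); // 2
    }
}
```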
partitioningBy splits a stream into exactly two groups using a
predicate, returning a Map<Boolean, List<T>> with keys true and
false. It's a specialized, optimized groupingBy for the boolean case.
Map<Boolean, List<Integer>> parts = nums.stream()
    .collect(Collectors.partitioningBy(n -> n % 2 == 0));
parts.get(true); // evens
parts.get(false); // odds
Key difference from groupingBy: the map always contains both keys, even
when one partition is empty (groupingBy omits empty groups). It also accepts a
downstream collector: partitioningBy(pred, counting()).
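The "both keys are always present" behaviour is easiest to see with an input that leaves one side empty. This sketch (names are illustrative) runs the same predicate through partitioningBy and groupingBy with a counting downstream.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PartitionVsGroup {
    // partitioningBy: the result has both true and false keys
    static Map<Boolean, Long> partitionCounts(List<Integer> nums) {
        return nums.stream().collect(
                Collectors.partitioningBy(n -> n % 2 == 0, Collectors.counting()));
    }

    // groupingBy: a key appears only if at least one element maps to it
    static Map<Boolean, Long> groupCounts(List<Integer> nums) {
        return nums.stream().collect(
                Collectors.groupingBy(n -> n % 2 == 0, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Integer> odds = List.of(1, 3, 5);
        System.out.println(partitionCounts(odds).get(true));     // 0
        System.out.println(groupCounts(odds).containsKey(true)); // false
    }
}
```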
joining concatenates the stream's CharSequence elements into one
String. It has three forms: no-arg (concatenate), one-arg (delimiter), and
three-arg (delimiter, prefix, suffix).
String csv = names.stream()
    .collect(Collectors.joining(", "));           // "Ann, Bob, Cy"
String list = names.stream()
    .collect(Collectors.joining(", ", "[", "]")); // "[Ann, Bob, Cy]"
It only accepts CharSequence, so map(Object::toString) first if your
elements aren't strings. Internally it accumulates into a mutable buffer
(a StringBuilder or StringJoiner in OpenJDK), making it far more efficient
than reducing with +.
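For non-string elements, the conversion step might look like this minimal sketch (the class and method names are made up for the example):

```java
import java.util.List;
import java.util.stream.Collectors;

class JoinNumbers {
    // Integers are not CharSequences, so convert before joining
    static String bracketed(List<Integer> nums) {
        return nums.stream()
                .map(Object::toString)
                .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println(bracketed(List.of(1, 2, 3))); // [1, 2, 3]
        System.out.println(bracketed(List.of()));        // []
    }
}
```

Note that the prefix and suffix are still emitted for an empty stream.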
summingInt/Long/Double apply a value-extracting function and sum the
results; averagingInt/Long/Double compute the mean. Each takes a
ToIntFunction-style mapper.
int total = orders.stream()
    .collect(Collectors.summingInt(Order::quantity));   // Integer, auto-unboxed to int
double avg = orders.stream()
    .collect(Collectors.averagingInt(Order::quantity)); // Double, auto-unboxed to double
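Note that each summing collector returns the boxed form of its primitive
(summingInt gives Integer, summingLong gives Long, summingDouble gives Double),
while all averaging* variants return Double (a mean is rarely integral). These
shine as groupingBy downstreams for per-group totals and means.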
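Used as groupingBy downstreams, the two families give per-group totals and means. The Order record and the sample quantities below are assumptions for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PerGroupTotals {
    // Hypothetical element type for the example
    record Order(String customer, int quantity) {}

    // Per-customer total: summingInt yields Integer values
    static Map<String, Integer> totalByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                Order::customer, Collectors.summingInt(Order::quantity)));
    }

    // Per-customer mean: averagingInt yields Double values
    static Map<String, Double> meanByCustomer(List<Order> orders) {
        return orders.stream().collect(Collectors.groupingBy(
                Order::customer, Collectors.averagingInt(Order::quantity)));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("Bob", 2), new Order("Bob", 4), new Order("Ann", 5));
        System.out.println(totalByCustomer(orders).get("Bob")); // 6
        System.out.println(meanByCustomer(orders).get("Bob"));  // 3.0
    }
}
```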
summarizingInt/Long/Double compute count, sum, min, max, and average in a
single pass, returning an IntSummaryStatistics (or Long/Double variant)
that exposes all five.
IntSummaryStatistics stats = employees.stream()
    .collect(Collectors.summarizingInt(Employee::salary));
stats.getCount(); // 50
stats.getSum(); // 3_200_000
stats.getMin(); // 40_000
stats.getMax(); // 180_000
stats.getAverage(); // 64000.0
Use it instead of running several collectors when you need multiple statistics —
it traverses the stream once. (stream().mapToInt(...).summaryStatistics() is
the equivalent without collect.)
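The two routes to the same statistics can be compared directly. This sketch uses a plain list of salaries as a stand-in for real employee objects; the class and method names are illustrative.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

class SalaryStats {
    // Collector route: usable as a downstream, e.g. inside groupingBy
    static IntSummaryStatistics viaCollector(List<Integer> salaries) {
        return salaries.stream().collect(Collectors.summarizingInt(Integer::intValue));
    }

    // Primitive-stream route: same numbers without a collector
    static IntSummaryStatistics viaPrimitiveStream(List<Integer> salaries) {
        return salaries.stream().mapToInt(Integer::intValue).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = viaCollector(List.of(40_000, 60_000, 80_000));
        System.out.println(stats.getCount());   // 3
        System.out.println(stats.getSum());     // 180000
        System.out.println(stats.getAverage()); // 60000.0
    }
}
```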
Collectors.reducing is a collector-form of reduction. Standalone it
duplicates Stream.reduce, so its real purpose is being a downstream
collector inside groupingBy/partitioningBy, where you can't drop to
Stream.reduce.
// highest-paid employee per department
Map<Dept, Optional<Employee>> top = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept,
        Collectors.reducing(BinaryOperator.maxBy(
            Comparator.comparingInt(Employee::salary)))));
For top-level reductions prefer Stream.reduce — it's clearer. Reach for
Collectors.reducing (or the dedicated maxBy/minBy) only as a downstream.
collectingAndThen(downstream, finisher) runs a collector, then applies a
finishing transformation to its result. It's how you adapt a collector's
output — most commonly to wrap a collection as unmodifiable or to extract a
value from an Optional.
List<String> immutable = names.stream()
    .collect(Collectors.collectingAndThen(
        Collectors.toList(),
        Collections::unmodifiableList));
// unwrap the maxBy Optional per group
Map<Dept, Employee> top = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept,
        Collectors.collectingAndThen(
            Collectors.maxBy(Comparator.comparingInt(Employee::salary)),
            Optional::get)));
It effectively bolts a custom finisher onto an existing collector without writing one from scratch.
Added in Java 9, both are downstream collectors. filtering solves a real
problem: calling the stream's filter() before grouping silently drops any
group whose members are all filtered out.
- filtering(predicate, downstream): keeps only matching elements per group, but preserves the group key even if the group becomes empty (unlike a pre-filter, which removes the whole bucket).
- flatMapping(mapper, downstream): flattens each element to a stream and collects the results; it is the collector-level flatMap.
Map<Dept, List<Employee>> highEarners = emps.stream()
    .collect(Collectors.groupingBy(Employee::dept,
        Collectors.filtering(e -> e.salary() > 100_000,
            Collectors.toList())));
Use filtering over an upstream filter whenever you need every group key
present, even with no surviving members.
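The sketch below contrasts an upstream filter with the filtering downstream, and adds a flatMapping example. The Employee record, its skills field, and the sample data are assumptions introduced only for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class Java9Downstreams {
    // Hypothetical element type for the example
    record Employee(String name, String dept, int salary, List<String> skills) {}

    // Upstream filter: departments with no high earner vanish from the map
    static Map<String, List<Employee>> preFilter(List<Employee> emps) {
        return emps.stream()
                .filter(e -> e.salary() > 100_000)
                .collect(Collectors.groupingBy(Employee::dept));
    }

    // filtering downstream: every department keeps its key
    static Map<String, List<Employee>> downstreamFilter(List<Employee> emps) {
        return emps.stream().collect(Collectors.groupingBy(Employee::dept,
                Collectors.filtering(e -> e.salary() > 100_000, Collectors.toList())));
    }

    // flatMapping downstream: union of all skills within each department
    static Map<String, Set<String>> skillsByDept(List<Employee> emps) {
        return emps.stream().collect(Collectors.groupingBy(Employee::dept,
                Collectors.flatMapping(e -> e.skills().stream(), Collectors.toSet())));
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("Ann", "eng", 150_000, List.of("java", "sql")),
                new Employee("Bob", "ops", 50_000, List.of("bash")),
                new Employee("Cy", "eng", 90_000, List.of("java", "go")));
        System.out.println(preFilter(emps).containsKey("ops"));     // false
        System.out.println(downstreamFilter(emps).get("ops"));      // []
        System.out.println(skillsByDept(emps).get("eng").size());   // 3
    }
}
```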
teeing (Java 12) feeds each element to two downstream collectors at once,
then merges their two results with a BiFunction. It computes two
aggregates in a single pass.
// average = sum / count, in one traversal
double avg = nums.stream()
    .collect(Collectors.teeing(
        Collectors.summingDouble(n -> n), // result 1: sum
        Collectors.counting(),            // result 2: count
        (sum, count) -> sum / count));    // merge
It's ideal when two statistics depend on the same stream and you want to avoid collecting to a list or streaming twice — e.g. min and max, or sum and count.
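The min-and-max case mentioned above could be sketched as follows. The Range record is a hypothetical holder type, and wrapping the result in Optional is one possible way to handle an empty stream, not the only one.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class MinMax {
    // Hypothetical result holder for the example
    record Range(int min, int max) {}

    // One traversal feeds both minBy and maxBy; the merger pairs the results.
    // An empty input leaves both Optionals empty, so the result is empty too.
    static Optional<Range> range(List<Integer> nums) {
        return nums.stream().collect(Collectors.teeing(
                Collectors.minBy(Comparator.<Integer>naturalOrder()),
                Collectors.maxBy(Comparator.<Integer>naturalOrder()),
                (min, max) -> min.flatMap(lo -> max.map(hi -> new Range(lo, hi)))));
    }

    public static void main(String[] args) {
        System.out.println(range(List.of(3, 1, 4, 1, 5)).get().min()); // 1
        System.out.println(range(List.of(3, 1, 4, 1, 5)).get().max()); // 5
        System.out.println(range(List.of()).isPresent());              // false
    }
}
```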
Characteristics are optimization hints in a collector's characteristics()
set that tell the stream pipeline what shortcuts are safe:
- UNORDERED: the result doesn't depend on encounter order (e.g. toSet), so the pipeline may reorder for speed.
- CONCURRENT: the accumulator can be called on one shared container from multiple threads (e.g. groupingByConcurrent), avoiding merges.
- IDENTITY_FINISH: the finisher is the identity, so the container is the result and the finisher step is skipped.
Collectors.toList(); // IDENTITY_FINISH
Collectors.toSet(); // UNORDERED, IDENTITY_FINISH
You rarely set these directly — they matter mostly when writing a custom collector or reasoning about parallel-stream performance.
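You can inspect the hints on any collector by calling characteristics(). The sketch below prints them for a few built-in collectors; the exact sets for toList and toSet reflect current OpenJDK behaviour rather than a documented guarantee, so treat the printed output as informational. The helper name of is made up for the example.

```java
import java.util.Set;
import java.util.stream.Collector;
import java.util.stream.Collector.Characteristics;
import java.util.stream.Collectors;

class InspectCharacteristics {
    // Returns the hint set a collector advertises
    static Set<Characteristics> of(Collector<?, ?, ?> c) {
        return c.characteristics();
    }

    public static void main(String[] args) {
        System.out.println(of(Collectors.toList()));
        System.out.println(of(Collectors.toSet()));
        System.out.println(of(Collectors.joining()));
        System.out.println(of(
                Collectors.<String, Integer>groupingByConcurrent(String::length)));
    }
}
```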
Use Collector.of(supplier, accumulator, combiner, [finisher], characteristics...),
passing the four functions directly. The combiner is mandatory so the collector
works in parallel.
// a custom collector that joins names into a single uppercase CSV string
Collector<String, StringJoiner, String> upperCsv = Collector.of(
    () -> new StringJoiner(", "),      // supplier
    (j, s) -> j.add(s.toUpperCase()),  // accumulator
    StringJoiner::merge,               // combiner
    StringJoiner::toString);           // finisher
String result = names.stream().collect(upperCsv);
Omit the finisher when the container already is the result (an
IDENTITY_FINISH collector). In practice, prefer composing existing
Collectors (mapping, collectingAndThen, teeing) — write a fully custom
collector only when no combination fits.
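To see both Collector.of overloads in use, the sketch below wraps the uppercase CSV collector in a method and adds a finisher-less collector into an ArrayDeque. The toDeque collector is purely illustrative; in real code Collectors.toCollection(ArrayDeque::new) would do the same job.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.StringJoiner;
import java.util.stream.Collector;

class CustomCollectors {
    // With a finisher: StringJoiner is the container, String is the result
    static Collector<String, StringJoiner, String> upperCsv() {
        return Collector.of(
                () -> new StringJoiner(", "),
                (j, s) -> j.add(s.toUpperCase()),
                StringJoiner::merge,
                StringJoiner::toString);
    }

    // Without a finisher: the container is the result, so the
    // collector is reported as IDENTITY_FINISH
    static <T> Collector<T, ArrayDeque<T>, ArrayDeque<T>> toDeque() {
        return Collector.of(
                ArrayDeque::new,
                ArrayDeque::add,
                (left, right) -> { left.addAll(right); return left; });
    }

    public static void main(String[] args) {
        List<String> names = List.of("ann", "bob", "cy");
        // The combiner lets the same collector run on a parallel stream
        System.out.println(names.parallelStream().collect(upperCsv())); // ANN, BOB, CY
        System.out.println(toDeque().characteristics());
    }
}
```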