Thinking in streams, not loops
The Stream API (introduced in Java 8) lets you describe a computation over a sequence of data
declaratively — what you want, not the loop that produces it. The mental shift that
trips people up is this: a Stream is not a data structure. It holds no elements. It
is a pipeline that pulls elements from a source, runs them through a chain of
operations, and produces a result. Once you internalise that, the rest of the API — the
laziness, the single-use rule, the parallel caveats — stops feeling arbitrary and starts
feeling inevitable.
This guide walks the pipeline end to end: where streams come from, how lazy and eager operations divide the work, and the handful of operations you reach for every day.
Stream vs collection
A collection is about storage: it sits in memory and you iterate it externally
(you write the for loop, you control the cursor). A stream is about processing: it
stores nothing, pulls from its source on demand, and iterates internally (the stream
drives the loop, you just supply the behaviour).
// External iteration: you manage the loop
List<String> out = new ArrayList<>();
for (String s : names) {
    if (s.length() > 4) out.add(s.toUpperCase());
}
// Internal iteration: the stream manages the loop
List<String> out2 = names.stream()
    .filter(s -> s.length() > 4)
    .map(String::toUpperCase)
    .toList();
Two consequences fall out of "stores nothing": a stream is single-use (consumed once), and it never mutates its source. Use a collection to hold data and a stream to transform it.
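As a minimal sketch of the second consequence, the example below runs a pipeline over a small made-up list and then checks that the source is exactly as it was. The class name, method name, and sample data are illustrative assumptions, not part of any library.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch; the class and method names are invented for this example.
final class SourceUntouchedDemo {

    // Returns upper-cased copies of the names longer than four characters.
    static List<String> longNamesUpper(List<String> names) {
        return names.stream()
                .filter(s -> s.length() > 4)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Charlie");
        List<String> result = longNamesUpper(names);
        System.out.println(result); // [ALICE, CHARLIE]
        System.out.println(names);  // [Alice, Bob, Charlie], the source is unchanged
    }
}
```

The pipeline only reads from names; everything it produces lands in a new list.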
Sources: where a stream begins
Every pipeline starts at a source. The most common is Collection.stream(), but streams
can be built from explicit values, arrays, files, character data, and even infinite
generators.
list.stream(); // any Collection
Stream.of("a", "b", "c"); // explicit values
Arrays.stream(new int[]{1, 2, 3}); // an array
IntStream.rangeClosed(1, 5); // 1,2,3,4,5
Stream.iterate(1, n -> n * 2); // INFINITE: 1,2,4,8,...
Stream.generate(Math::random); // INFINITE: repeated supplier
Files.lines(Path.of("data.txt")); // lazy lines, backed by an open file
The infinite sources (iterate, generate) are only usable because of laziness — you
must pair them with a short-circuiting op like limit(n) or they never terminate.
Files.lines holds an OS file handle, so wrap it in try-with-resources to close it.
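The two caveats above can be sketched as follows, assuming Java 11 or later. The helper names (powersOfTwo, countNonBlankLines) and the temporary file are hypothetical and exist only for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch; helper names are invented for this example.
final class SourcesDemo {

    // Bounds an infinite source with limit(n) so the pipeline can finish.
    static List<Integer> powersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    // try-with-resources closes the file handle held by Files.lines.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.isBlank()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(powersOfTwo(5)); // [1, 2, 4, 8, 16]

        Path tmp = Files.createTempFile("stream-demo", ".txt"); // throwaway file
        Files.writeString(tmp, "alpha\n\nbeta\n");
        System.out.println(countNonBlankLines(tmp)); // 2
        Files.delete(tmp);
    }
}
```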
The pipeline: intermediate vs terminal
Every pipeline has exactly one shape: zero or more intermediate operations followed by one terminal operation. The distinction is the single most important idea in the API.
| | Intermediate | Terminal |
|---|---|---|
| Returns | another Stream | a value / side-effect (not a stream) |
| Evaluation | lazy — records, does nothing | eager — triggers the whole run |
| Count per pipeline | any number | exactly one |
| Examples | filter, map, flatMap, sorted, distinct, limit, peek | collect, forEach, reduce, count, findFirst, anyMatch, toArray |
list.stream()
    .filter(s -> s.length() > 3)      // intermediate: lazy, returns a Stream
    .map(String::toUpperCase)         // intermediate: lazy, returns a Stream
    .collect(Collectors.toList());    // terminal: NOW the pipeline runs
Build a pipeline without a terminal operation and nothing executes. The intermediate steps are merely a recorded recipe waiting for a terminal op to ask for results.
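One way to see this for yourself is to count how often a predicate runs. The sketch below uses a hypothetical counter inside the lambda purely to observe evaluation; a side effect like this is not something to copy into production pipelines.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch; the class and method names are invented for this example.
final class LazinessDemo {

    // Returns {predicate calls before the terminal op, predicate calls after it}.
    static int[] predicateCallCounts() {
        AtomicInteger calls = new AtomicInteger(); // observation only
        Stream<String> pipeline = Stream.of("ant", "bee", "cat")
                .filter(s -> {
                    calls.incrementAndGet();
                    return s.contains("a");
                });
        int before = calls.get();               // the recipe is recorded, nothing has run
        pipeline.collect(Collectors.toList());  // the terminal op triggers the run
        int after = calls.get();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] counts = predicateCallCounts();
        System.out.println(counts[0] + " then " + counts[1]); // 0 then 3
    }
}
```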
Laziness and short-circuiting
Because intermediate ops are lazy, the runtime can do two clever things. First, fusion: instead of materialising an intermediate collection after each stage, elements flow vertically — one element passes through all stages before the next one starts, in a single pass. Second, short-circuiting: the pipeline can stop the moment the answer is known, without touching every element. That is what makes infinite sources tractable.
int firstMultipleOf7 = Stream.iterate(1, n -> n + 1)  // infinite
    .filter(n -> n % 7 == 0)
    .findFirst()                                      // stops at 7
    .orElseThrow();
// "filter 1, filter 2 ... filter 7", then done; it never runs forever
The short-circuiting terminal ops are findFirst, findAny, anyMatch, allMatch, and
noneMatch; the short-circuiting intermediate ops are limit and, since Java 9,
takeWhile. allMatch stops at the first element that fails and anyMatch at the first
that passes, so each scans the whole stream only when no deciding element exists.
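The vertical flow and the early exit are easy to observe by logging each stage. This is a small sketch for a sequential stream; the log list is a side effect used only for observation, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative sketch; the class and method names are invented for this example.
final class VerticalFlowDemo {

    // Records every stage invocation in the order it happens.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of("a", "bb", "ccc", "dddd")
                .filter(s -> { log.add("filter:" + s); return s.length() > 1; })
                .map(s -> { log.add("map:" + s); return s.toUpperCase(); })
                .findFirst(); // short-circuits after the first surviving element
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace()); // [filter:a, filter:bb, map:bb]
    }
}
```

Note that "ccc" and "dddd" never appear in the log: once "bb" has passed through both stages, findFirst has its answer and the pipeline stops pulling.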
The everyday operations
A handful of intermediate ops cover the bulk of real work. filter decides whether an
element survives (a Predicate); map decides what it becomes (a Function, one-to-one,
may change type); flatMap maps each element to a stream and flattens the results
into one (one-to-many, merged).
sentences.stream()
    .map(String::trim)                          // 1-to-1: transform each
    .flatMap(s -> Arrays.stream(s.split(" ")))  // 1-to-many: explode into words, flatten
    .filter(w -> !w.isBlank())                  // keep the non-empty ones
    .distinct()                                 // drop duplicates (equals/hashCode)
    .sorted()                                   // natural order
    .skip(2)                                    // discard the first two
    .limit(10)                                  // keep at most ten
    .toList();
Four of these are stateful: distinct, sorted, skip, and limit all carry state between
elements. distinct has to remember every element it has already seen, and sorted is a
full barrier: it buffers the entire stream before emitting anything, so it cannot
short-circuit and never completes on an infinite source. Apply filter before the
stateful ops to shrink the work they have to do.
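Here is a runnable sketch of both points, with made-up helper names and sample sentences. The first method filters before the stateful stages; the second shows that ordering matters on an infinite source, because limit must come before sorted for the pipeline to finish.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative sketch; the class and method names are invented for this example.
final class StatefulOpsDemo {

    // Splits sentences into words, then de-duplicates and sorts them.
    static List<String> distinctSortedWords(List<String> sentences) {
        return sentences.stream()
                .map(String::trim)
                .flatMap(s -> Arrays.stream(s.split("\\s+")))
                .filter(w -> !w.isBlank()) // cheap filter first, so less reaches the stateful ops
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // limit BEFORE sorted: the barrier only ever buffers n elements.
    // With the two swapped, sorted would keep buffering the infinite source and never emit.
    static List<Integer> sortedPrefix(int n) {
        return Stream.iterate(10, x -> x - 1)
                .limit(n)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(distinctSortedWords(List.of(" the quick fox ", "the lazy dog")));
        // [dog, fox, lazy, quick, the]
        System.out.println(sortedPrefix(5)); // [6, 7, 8, 9, 10]
    }
}
```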
reduce: folding to a single value
When you need to collapse a stream into one result — a sum, a product, a concatenation —
reduce repeatedly applies a binary operator. It has three overloads, escalating in power.
// 1. accumulator only -> Optional (the stream might be empty)
Optional<Integer> sum = nums.stream().reduce((a, b) -> a + b);
// 2. identity + accumulator -> plain value (returns identity if empty)
int total = nums.stream().reduce(0, Integer::sum);
// 3. identity + accumulator + combiner -> parallel-safe / type-changing
int charCount = words.stream()
    .reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
The identity must be a genuine no-op for the operation (0 for sum, 1 for product,
"" for concatenation). The combiner merges partial results computed by parallel
sub-streams and is required whenever the result type differs from the element type.
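The sketch below exercises each overload once, including the empty-stream behaviour. The method names and inputs are hypothetical; the third method runs in parallel only to show where the combiner comes into play.

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch; the class and method names are invented for this example.
final class ReduceDemo {

    // Overload 1: no identity, so an empty stream yields Optional.empty().
    static Optional<Integer> max(List<Integer> nums) {
        return nums.stream().reduce(Integer::max);
    }

    // Overload 2: identity 1 is the no-op for multiplication.
    static int product(List<Integer> nums) {
        return nums.stream().reduce(1, (a, b) -> a * b);
    }

    // Overload 3: the result type (Integer) differs from the element type (String),
    // so a combiner is needed to merge partial sums from parallel chunks.
    static int totalLength(List<String> words) {
        return words.parallelStream()
                .reduce(0, (acc, w) -> acc + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(max(List.of()));                    // Optional.empty
        System.out.println(product(List.of(1, 2, 3, 4)));      // 24
        System.out.println(product(List.of()));                // 1
        System.out.println(totalLength(List.of("ab", "cde"))); // 5
    }
}
```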
collect: building a result (briefly)
The most flexible terminal op is collect, which uses a Collector to accumulate elements
into a container. For the common cases there are also direct shortcuts.
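// Each line builds a fresh stream, because a stream can be consumed only once
List<String> list = names.stream().collect(Collectors.toList());  // classic; mutability not guaranteed
List<String> imm  = names.stream().toList();                      // Java 16+, UNMODIFIABLE
Set<String> set   = names.stream().collect(Collectors.toSet());
String[] arr      = names.stream().toArray(String[]::new);        // typed array (not Object[])
toList() is the concise modern choice but returns an unmodifiable list. Collectors.toList()
returns an ArrayList in current OpenJDK builds, but its contract makes no guarantee about
the type or mutability of the result, so use Collectors.toCollection(ArrayList::new) if
you must mutate the list afterwards. The richer recipes (groupingBy, partitioningBy,
joining, downstream collectors) belong to the Collectors topic and get their own page.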
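To make the mutability difference concrete, here is a small sketch assuming Java 16 or later. It uses Collectors.toCollection(ArrayList::new) for the mutable copy, since that names the container explicitly, whereas the documented contract of Collectors.toList() makes no mutability promise. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch; the class and method names are invented for this example.
final class CollectDemo {

    // Stream.toList() (Java 16+) returns an unmodifiable list.
    static List<String> unmodifiableCopy(List<String> src) {
        return src.stream().toList();
    }

    // toCollection names the container explicitly, so the result is a mutable ArrayList.
    static List<String> mutableCopy(List<String> src) {
        return src.stream().collect(Collectors.toCollection(ArrayList::new));
    }

    // Reports whether the given list rejects add().
    static boolean rejectsAdd(List<String> list) {
        try {
            list.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> src = List.of("a", "b");
        System.out.println(rejectsAdd(unmodifiableCopy(src))); // true
        System.out.println(rejectsAdd(mutableCopy(src)));      // false
    }
}
```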
Match and find operations
When you only need a boolean or a single element, short-circuiting terminal ops are the right tool — they stop as soon as the answer is decided.
boolean hasNeg = nums.stream().anyMatch(n -> n < 0); // stops at first negative
boolean allPos = nums.stream().allMatch(n -> n > 0); // stops at first non-positive
Optional<Integer> first = nums.stream().filter(n -> n > 10).findFirst(); // encounter order
Optional<Integer> any = nums.parallelStream().filter(n -> n > 10).findAny(); // any match, no order guarantee
Mind the empty-stream edge cases: on an empty stream allMatch and noneMatch return
true (vacuous truth) while anyMatch returns false. findFirst honours
encounter order; findAny may return any match, which lets a parallel stream skip the
ordering constraint and run faster.
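The empty-stream edge cases are worth running once to build intuition. A minimal sketch with made-up method names:

```java
import java.util.List;
import java.util.Optional;

// Illustrative sketch; the class and method names are invented for this example.
final class MatchFindDemo {

    static boolean allPositive(List<Integer> nums) {
        return nums.stream().allMatch(n -> n > 0);
    }

    static boolean anyNegative(List<Integer> nums) {
        return nums.stream().anyMatch(n -> n < 0);
    }

    static boolean noneZero(List<Integer> nums) {
        return nums.stream().noneMatch(n -> n == 0);
    }

    // First element above the threshold, in encounter order.
    static Optional<Integer> firstAbove(List<Integer> nums, int threshold) {
        return nums.stream().filter(n -> n > threshold).findFirst();
    }

    public static void main(String[] args) {
        List<Integer> empty = List.of();
        System.out.println(allPositive(empty)); // true  (vacuous truth)
        System.out.println(noneZero(empty));    // true  (vacuous truth)
        System.out.println(anyNegative(empty)); // false
        System.out.println(firstAbove(List.of(4, 12, 30), 10)); // Optional[12]
    }
}
```

If an empty input should count as a failure in your domain, check for emptiness separately rather than relying on allMatch.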
Primitive streams
Stream<Integer> boxes every element, which is wasteful for heavy numeric work. The
specialized IntStream, LongStream, and DoubleStream carry raw primitives
and add numeric terminal ops the object stream lacks.
int sum = IntStream.rangeClosed(1, 100).sum(); // 5050, no boxing
double avg = IntStream.of(1, 2, 3).average().orElse(0); // OptionalDouble
IntSummaryStatistics st = people.stream()
    .mapToInt(Person::age)     // Stream<Person> -> IntStream
    .summaryStatistics();      // count, sum, min, max, average in one pass
Cross between worlds with mapToInt/mapToLong/mapToDouble to enter a primitive stream,
and boxed() or mapToObj(...) to return to an object stream. Reach for primitive streams
whenever you do real arithmetic — they are faster and give you sum, average, and
summaryStatistics for free.
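The crossings in both directions can be sketched as below. To stay self-contained the example uses word lengths in place of the Person type from the snippet above; the class and method names are hypothetical.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative sketch; the class and method names are invented for this example.
final class PrimitiveStreamDemo {

    // Object stream -> IntStream: lengths travel as raw ints, with no boxing.
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .summaryStatistics();
    }

    // IntStream -> object stream: boxed() is needed before collecting into a List.
    static List<Integer> squares(int n) {
        return IntStream.rangeClosed(1, n)
                .map(x -> x * x)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        IntSummaryStatistics st = lengthStats(List.of("a", "bbb", "cc"));
        System.out.println(st.getMin() + " " + st.getMax() + " " + st.getSum()); // 1 3 6
        System.out.println(st.getAverage()); // 2.0
        System.out.println(squares(4));      // [1, 4, 9, 16]
    }
}
```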
Single-use streams and parallel pitfalls
A stream is traversed once. After a terminal op runs, the stream is consumed; touching
it again throws IllegalStateException: stream has already been operated upon or closed. If
you need the data twice, re-create the stream from the source, or wrap it in a
Supplier<Stream<T>> that builds a fresh one on demand.
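Both halves of that advice can be sketched in a few lines. The example assumes OpenJDK behaviour, where reuse is detected and rejected with IllegalStateException; the class and method names are made up for illustration.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Illustrative sketch; the class and method names are invented for this example.
final class SingleUseDemo {

    // Returns true if a second terminal op on the same stream instance is rejected.
    static boolean secondUseThrows() {
        Stream<String> once = Stream.of("a", "b", "c");
        once.count(); // the first terminal op consumes the stream
        try {
            once.count(); // second use of the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Each call to get() builds a fresh stream from the source list.
    static long[] countTwice(List<String> source) {
        Supplier<Stream<String>> fresh = source::stream;
        long total = fresh.get().count();
        long longOnes = fresh.get().filter(s -> s.length() > 1).count();
        return new long[] {total, longOnes};
    }

    public static void main(String[] args) {
        System.out.println(secondUseThrows()); // true
        long[] counts = countTwice(List.of("a", "bb", "ccc"));
        System.out.println(counts[0] + " " + counts[1]); // 3 2
    }
}
```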
Parallelism is the other sharp edge. parallelStream() splits the source across the common
ForkJoinPool — but it only pays off for large, CPU-bound work over a cheaply-splittable
source (arrays, ArrayList). The deeper trap is shared mutable state in your lambdas.
// BROKEN: data race — multiple threads mutate one ArrayList
List<Integer> out = new ArrayList<>();
nums.parallelStream().forEach(out::add);
// CORRECT: let the framework do the accumulation safely
List<Integer> safe = nums.parallelStream()
    .collect(Collectors.toList());
Keep stream lambdas pure (no side effects, no external mutation) and express the result
with collect/reduce rather than accumulating from forEach. Stay sequential by default
and only parallelize after measuring a genuine win.
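As a rough check that collect is safe under parallelism, the sketch below compares a parallel run with a sequential one. The method names and the size of 10,000 are arbitrary assumptions; the point is only that the lambdas touch no shared state, so the results match in content and encounter order.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Illustrative sketch; the class and method names are invented for this example.
final class ParallelCollectDemo {

    // collect gives each worker its own container and merges them afterwards,
    // so no lambda touches shared mutable state.
    static List<Integer> squaresParallel(int n) {
        return IntStream.rangeClosed(1, n)
                .parallel()
                .mapToObj(x -> x * x)
                .collect(Collectors.toList());
    }

    static List<Integer> squaresSequential(int n) {
        return IntStream.rangeClosed(1, n)
                .mapToObj(x -> x * x)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Same elements in the same encounter order, regardless of thread scheduling.
        System.out.println(squaresParallel(10_000).equals(squaresSequential(10_000))); // true
    }
}
```

This shows correctness only; whether the parallel version is actually faster is something to measure on your own workload.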
Recap
A Stream is a lazy pipeline, not a data structure — it pulls elements from a
source through intermediate operations (lazy, return a Stream) to a single
terminal operation (eager, triggers the run). Laziness buys you fusion and
short-circuiting, which is what makes infinite sources usable. Master the everyday ops —
filter/map/flatMap/distinct/sorted/limit/skip, fold with reduce, finish with
collect or a match/find op — and prefer primitive streams for numeric work. Respect
the two hard rules: a stream is single-use, and parallel streams demand pure,
stateless lambdas to stay correct.