A Set is a Collection that holds no duplicate elements — adding an
element already present is a no-op and add returns false. There's no index
and (for the general contract) no positional access; you check membership with
contains, which is the operation sets are built to make fast.
Set<String> tags = new HashSet<>();
tags.add("java"); // true — added
tags.add("java"); // false — duplicate, ignored
tags.size(); // 1
tags.contains("java"); // true
"Duplicate" is defined by equals (and hashCode for hash-based sets, or
compareTo/Comparator for sorted sets) — not by reference identity. That
definition of equality is the heart of every Set question.
It depends on the implementation, and this is the single most important Set detail in interviews:
| Set | Duplicate test |
|---|---|
HashSet, LinkedHashSet |
hashCode() to find the bucket, then equals() within it |
TreeSet |
compareTo() (or the supplied Comparator) — equals is ignored |
// Two elements are "equal" in a TreeSet when compare returns 0:
Set<String> ci = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
ci.add("Java");
ci.add("JAVA"); // compare == 0 -> treated as duplicate, not added
ci.size(); // 1
The trap: a TreeSet uses comparison, not equals, so an element can be
dropped as a "duplicate" even though equals says it's different (and vice
versa) — keep your compareTo consistent with equals to avoid surprises.
A HashSet is just a thin wrapper around a HashMap — each element is
stored as a key, and all keys map to the same dummy PRESENT value object.
So everything you know about HashMap (buckets, load factor, treeified bins)
applies directly.
// Conceptually, inside HashSet:
private transient HashMap<E, Object> map;
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT) == null; // true if key was new
}
That's why HashSet gives average O(1) add/contains/remove, has no
ordering, and depends entirely on good hashCode/equals. If hashes collide
badly, performance degrades toward O(n) (or O(log n) once a bucket treeifies).
LinkedHashSet extends HashSet but is backed by a LinkedHashMap, so it
keeps a doubly-linked list threading through the entries. The effect:
iteration follows insertion order while keeping HashSet's O(1) operations.
Set<String> hs = new HashSet<>();
hs.add("c"); hs.add("a"); hs.add("b");
// iteration order: unspecified (e.g. a, b, c)
Set<String> lhs = new LinkedHashSet<>();
lhs.add("c"); lhs.add("a"); lhs.add("b");
// iteration order: c, a, b — exactly as inserted
Cost is a slightly larger memory footprint (the extra prev/next links). Reach for it whenever you need dedup but predictable, reproducible ordering — e.g. removing duplicates from a list without scrambling it.
A TreeSet is backed by a TreeMap, a red-black tree (self-balancing
binary search tree). Elements are kept in sorted order — natural ordering
via Comparable, or a Comparator you pass to the constructor — and core
operations are O(log n) rather than O(1).
Set<Integer> nums = new TreeSet<>();
nums.add(5); nums.add(1); nums.add(3);
System.out.println(nums); // [1, 3, 5] — always sorted
// custom order:
Set<String> byLen = new TreeSet<>(Comparator.comparingInt(String::length));
Because it's a tree, you get range and neighbour queries for free
(first, last, floor, ceiling, headSet…). The trade-off vs HashSet:
slower point lookups but ordered iteration and rich navigation.
Elements must be mutually comparable — either they implement Comparable
(natural ordering) or you supply a Comparator. Without one of those, the very
first add of a non-comparable type throws ClassCastException at runtime.
class Point { int x, y; } // no Comparable, no Comparator
Set<Point> s = new TreeSet<>();
s.add(new Point()); // ClassCastException at runtime
// fix: give it an ordering
Set<Point> ok = new TreeSet<>(Comparator.comparingInt(p -> p.x));
ok.add(new Point()); // fine
Note the failure is not at construction time — an empty TreeSet is happy;
it only blows up when it needs to compare an element it can't order.
The classic side-by-side every interviewer expects:
| Feature | HashSet |
LinkedHashSet |
TreeSet |
|---|---|---|---|
| Backed by | HashMap |
LinkedHashMap |
TreeMap (red-black tree) |
| Ordering | none | insertion order | sorted (natural/comparator) |
add/contains/remove |
O(1) avg | O(1) avg | O(log n) |
null allowed |
one null |
one null |
no (NPE with natural order) |
Needs equals/hashCode |
yes | yes | uses compareTo/Comparator |
| Extra navigation | no | no | yes (first/floor/subSet…) |
// pick by need:
new HashSet<>(); // fastest, order doesn't matter
new LinkedHashSet<>(); // dedup, keep insertion order
new TreeSet<>(); // need sorted iteration or range queries
None of the three is thread-safe. Default to HashSet; upgrade only when
you specifically need order or sorting.
They return range views of a sorted set — not copies. headSet(to) is
everything below to, tailSet(from) is everything from from up, and
subSet(from, to) is the half-open range [from, to). The NavigableSet
overloads let you toggle the inclusive/exclusive bounds.
NavigableSet<Integer> s = new TreeSet<>(List.of(10, 20, 30, 40));
s.headSet(30); // [10, 20] (exclusive end)
s.tailSet(20); // [20, 30, 40] (inclusive start)
s.subSet(20, 40); // [20, 30] (half-open)
s.subSet(20, true, 40, true); // [20, 30, 40]
They are backed by the original set: changes flow both ways, and adding an
element outside the view's bounds throws IllegalArgumentException. Wrap in a
new TreeSet<>(view) if you need an independent snapshot.
Use descendingSet() (a reverse-ordered view) or descendingIterator().
Both walk the tree from largest to smallest without copying or re-sorting.
NavigableSet<Integer> s = new TreeSet<>(List.of(1, 2, 3));
for (int n : s.descendingSet()) {
System.out.print(n + " "); // 3 2 1
}
Iterator<Integer> it = s.descendingIterator();
while (it.hasNext()) { /* 3, then 2, then 1 */ }
Because descendingSet() is a live view, mutations on it affect the original
set and vice versa. It's the idiomatic alternative to constructing a
TreeSet with Comparator.reverseOrder() when you only need reverse
iteration occasionally.
No — with natural ordering, adding null throws NullPointerException,
because the set must call compareTo on the element and null.compareTo(...)
can't happen. (A HashSet/LinkedHashSet, by contrast, allows a single
null.)
Set<String> tree = new TreeSet<>();
tree.add(null); // NullPointerException
Set<String> hash = new HashSet<>();
hash.add(null); // fine — exactly one null allowed
Even a null-tolerant Comparator (e.g. Comparator.nullsFirst(...)) only
helps for non-first elements; the first add(null) on an empty natural-order
TreeSet still throws. Treat TreeSet as null-hostile.
EnumSet is a specialized Set for enum types only. Internally it's a
bit vector — a single long (or array of longs for big enums) where each
bit represents one constant. That makes operations near-instant and the memory
footprint tiny.
enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }
EnumSet<Day> work = EnumSet.range(Day.MON, Day.FRI);
EnumSet<Day> weekend = EnumSet.complementOf(work); // [SAT, SUN]
EnumSet<Day> none = EnumSet.noneOf(Day.class);
EnumSet<Day> all = EnumSet.allOf(Day.class);
Iteration follows the enum's declaration order. It's created via factory
methods (of, range, allOf, noneOf, complementOf) rather than new.
Whenever your set elements are enum constants, EnumSet beats HashSet on
every axis — speed, memory, and clarity.
The standard HashSet/TreeSet/LinkedHashSet are not synchronized.
Options, from cheap to concurrent:
Collections.synchronizedSet(set)— wraps every method in a lock; you must still manually synchronize when iterating.ConcurrentHashMap.newKeySet()— a concurrent hash set backed byConcurrentHashMap; high-throughput, no external locking, weakly consistent iteration.CopyOnWriteArraySet— copies the backing array on every write; great for tiny, read-heavy sets, terrible for write-heavy ones.
Set<String> sync = Collections.synchronizedSet(new HashSet<>());
synchronized (sync) { // iteration needs the lock
for (String s : sync) { /* ... */ }
}
Set<String> concurrent = ConcurrentHashMap.newKeySet(); // preferred default
For most concurrent code, ConcurrentHashMap.newKeySet() is the right
choice — it scales far better than a globally locked wrapper.
CopyOnWriteArraySet (backed by a CopyOnWriteArrayList) makes a fresh copy
of the whole array on every mutation. Reads and iteration are lock-free and
see a stable snapshot, but each add/remove is O(n).
Set<Listener> listeners = new CopyOnWriteArraySet<>();
listeners.add(a); // copies the backing array
for (Listener l : listeners) { // iterates a snapshot — no CME, ever
l.onEvent(); // safe even if another thread mutates
}
It shines for small, read-mostly, rarely-mutated sets — the textbook case
being event-listener registries. Because it scans the array, contains is
O(n), so it's a poor fit for large or write-heavy sets, where
ConcurrentHashMap.newKeySet() wins.
Copy one set and addAll the other — duplicates are dropped automatically by
the Set contract.
Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new HashSet<>(List.of(3, 4, 5));
Set<Integer> union = new HashSet<>(a);
union.addAll(b); // {1, 2, 3, 4, 5}
addAll is the union operator. Always copy first (new HashSet<>(a)) so
you don't mutate the original a — a common bug is calling a.addAll(b) and
silently changing the caller's set.
retainAll keeps only the elements present in both sets.
Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new HashSet<>(List.of(3, 4, 5));
Set<Integer> intersection = new HashSet<>(a);
intersection.retainAll(b); // {3}
retainAll is the intersection operator. For performance, copy and iterate
the smaller set against the larger one — the cost is roughly
O(min(a, b)) lookups, and lookups are O(1) on a HashSet.
removeAll strips out every element that also appears in the other set,
leaving the difference (elements in a but not b).
Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new HashSet<>(List.of(3, 4, 5));
Set<Integer> diff = new HashSet<>(a);
diff.removeAll(b); // {1, 2}
So the trio is: addAll = union, retainAll = intersection,
removeAll = difference. A symmetric difference (in either but not both) is
just union minus intersection, or two removeAlls combined.
A HashSet places each element in a bucket chosen by its hashCode() at
insertion time. If you then mutate a field that hashCode/equals depend
on, the element's hash changes but it stays in the old bucket — so the set
can no longer find it.
Set<Point> set = new HashSet<>();
Point p = new Point(1, 1);
set.add(p);
p.x = 99; // mutated a field used by hashCode
set.contains(p); // false — looks in the new bucket, finds nothing
// p is now a "ghost": present but unreachable; even remove() may miss it
That's why Set (and Map-key) elements should be immutable, or at least
never mutated on their equals/hashCode fields while stored. It's the same
reason String and the wrapper types — immutable — make perfect set elements.
Set.of(...) (Java 9+) returns a small, immutable set. Any mutating call
(add, remove, clear) throws UnsupportedOperationException, and — unlike
a normal add that silently ignores dups — passing a duplicate to the
factory throws IllegalArgumentException.
Set<String> s = Set.of("a", "b", "c");
s.add("d"); // UnsupportedOperationException
Set<String> dup = Set.of("a", "a"); // IllegalArgumentException at creation!
It also rejects null (NPE) and has an unspecified iteration order
that can vary between runs. Use it for compact, never-changing constant sets;
use LinkedHashSet/TreeSet when you need order or mutability.
Each implementation makes a different promise — knowing them prevents a class of "works on my machine" bugs:
| Set | Iteration order |
|---|---|
HashSet |
no guarantee (can even change between JVM runs) |
LinkedHashSet |
insertion order |
TreeSet |
sorted (natural or comparator) |
EnumSet |
enum declaration order |
Set.of(...) |
unspecified, may vary per run |
// never rely on HashSet order:
for (String s : new HashSet<>(List.of("a", "b", "c"))) { /* any order */ }
The rule of thumb: if your output or tests depend on order, don't use
HashSet — choose LinkedHashSet for insertion order or TreeSet for
sorted order.
HashSet.contains hashes the element, jumps straight to one bucket, and checks
only the few items there — average O(1). List.contains has no index into
its data, so it scans from the front comparing each element — O(n).
List<Integer> list = new ArrayList<>(/* 1_000_000 items */);
Set<Integer> set = new HashSet<>(list);
list.contains(999_999); // O(n) — walks ~a million elements
set.contains(999_999); // O(1) — one bucket lookup
This is why a frequent pattern is to dump a list into a HashSet first when
you'll do many membership checks. The rule of thumb: need fast "is it in
there?" — use a Set, not a List.
More Collections interview questions
More ways to practice
The self-quiz is live. Get notified when mock interviews and new question packs drop.