What a Set actually promises
A Set is a Collection with exactly one extra rule: no duplicate elements. There's
no index and no positional access — the operation a set is built to make fast is
membership (contains). Everything interesting about the three standard
implementations flows from a single question: how does this set decide two elements are
the same? Get that straight and the rest — ordering, null handling, complexity — falls
out naturally.
Set<String> tags = new HashSet<>();
tags.add("java"); // true — new element
tags.add("java"); // false — duplicate, silently ignored
tags.size(); // 1
The payoff is performance: a Set answers "is it in there?" far faster than a List,
which has to scan element by element.
Dedup is defined by equals/hashCode — or compareTo
"Duplicate" is not reference identity; it's logical equality, and which equality
depends on the implementation. Hash-based sets (HashSet, LinkedHashSet) use
hashCode() to pick a bucket and equals() to compare within it. A TreeSet
ignores both and uses compareTo() (or a supplied Comparator): two elements are
"equal" precisely when comparison returns 0.
Set<String> hash = new HashSet<>();
hash.add("Java"); hash.add("JAVA"); // equals: different -> size 2
Set<String> tree = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
tree.add("Java"); tree.add("JAVA"); // compare == 0 -> size 1
This split is the most common interview trap. A TreeSet can drop an element as a
"duplicate" even though equals says it differs, so the rule is: keep compareTo
consistent with equals. Otherwise the same objects behave differently depending on
which Set you happen to use.
HashSet — a HashMap wearing a disguise
HashSet is a thin wrapper around a HashMap. Each element is stored as a key, and
every key maps to the same shared dummy value. That's the whole trick — so everything you
know about HashMap (buckets, load factor, bins that treeify under collision pressure)
applies unchanged.
// Conceptually, inside HashSet:
private transient HashMap<E, Object> map;
private static final Object PRESENT = new Object();
public boolean add(E e) {
return map.put(e, PRESENT) == null; // true only if the key was new
}
The consequences: average O(1) add/contains/remove, no ordering guarantee,
and total dependence on a correct hashCode/equals. If hashes collide badly,
performance slides toward O(n) — or O(log n) once an over-full bucket converts to a tree.
This is your default set: reach for anything else only when you need something it can't
give you.
LinkedHashSet — dedup that remembers insertion order
LinkedHashSet extends HashSet but is backed by a LinkedHashMap, which threads a
doubly-linked list through the entries. You keep HashSet's O(1) operations, but
iteration now follows insertion order — deterministic and reproducible.
Set<String> hs = new LinkedHashSet<>();
hs.add("c"); hs.add("a"); hs.add("b");
hs.add("a"); // duplicate ignored, order untouched
System.out.println(hs); // [c, a, b] — exactly as inserted
The only cost is a slightly larger footprint (the prev/next links per entry). Use it
whenever you need dedup without scrambling order — the textbook case is stripping
duplicates from a list while preserving the original sequence, something plain HashSet
would quietly mangle.
TreeSet — sorted, red-black, and navigable
TreeSet is backed by a TreeMap, a red-black tree (a self-balancing binary
search tree). Elements stay in sorted order — natural ordering via Comparable, or a
Comparator you pass to the constructor — and core operations cost O(log n) instead
of O(1). In exchange you get range and neighbour queries for free, because the data is
already ordered.
NavigableSet<Integer> s = new TreeSet<>(List.of(10, 20, 30, 40));
s.first(); // 10 smallest
s.last(); // 40 largest
s.floor(25); // 20 greatest <= 25
s.ceiling(25); // 30 smallest >= 25
s.higher(20); // 30 strictly > 20
s.descendingSet(); // [40, 30, 20, 10] reverse view, no copy
TreeSet implements NavigableSet (which extends SortedSet), so you also get live
headSet/tailSet/subSet range views backed by the original set. Declare your
variable as NavigableSet rather than SortedSet to keep floor/ceiling/higher/
lower in reach — they're the whole reason to prefer a tree over a hash set for
"nearest neighbour" problems.
The TreeSet null and Comparable gotcha
A TreeSet must compare every element it stores, and that creates two failure modes
people hit at runtime, not compile time. First, null: with natural ordering,
add(null) throws NullPointerException because the set tries to call null.compareTo.
Second, non-comparable elements: a type with no Comparable and no Comparator blows
up with ClassCastException on the first real add.
Set<String> tree = new TreeSet<>();
tree.add(null); // NullPointerException (HashSet would allow one null)
class Point { int x, y; }
Set<Point> bad = new TreeSet<>();
bad.add(new Point()); // ClassCastException — Point isn't Comparable
Set<Point> ok = new TreeSet<>(Comparator.comparingInt(p -> p.x));
ok.add(new Point()); // fine — an ordering was supplied
Note both failures are deferred: an empty TreeSet is perfectly happy. It only throws
when it must compare. Treat TreeSet as null-hostile and always give it an ordering
for custom types.
Choosing between the three
Here's the side-by-side every interviewer wants to see:
| Feature | HashSet | LinkedHashSet | TreeSet |
|---|---|---|---|
| Backed by | HashMap | LinkedHashMap | TreeMap (red-black tree) |
| Iteration order | none | insertion order | sorted |
add/contains/remove | O(1) avg | O(1) avg | O(log n) |
null allowed | one | one | no (natural order) |
| Equality test | equals+hashCode | equals+hashCode | compareTo/Comparator |
| Range/navigation | no | no | yes |
new HashSet<>(); // fastest, order irrelevant — the default
new LinkedHashSet<>(); // dedup but keep insertion order
new TreeSet<>(); // need sorted iteration or range queries
None of the three is thread-safe. Default to HashSet and upgrade only when you
specifically need predictable order, sorting, or navigation.
EnumSet — the bit-vector specialist
When your elements are enum constants, none of the above is the right answer:
EnumSet is. Internally it's a bit vector — usually a single long where each bit
represents one constant — so operations are near-instant and memory is tiny. You build it
with factory methods, not new, and iteration follows the enum's declaration order.
enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }
EnumSet<Day> work = EnumSet.range(Day.MON, Day.FRI); // [MON..FRI]
EnumSet<Day> weekend = EnumSet.complementOf(work); // [SAT, SUN]
EnumSet<Day> none = EnumSet.noneOf(Day.class);
EnumSet<Day> all = EnumSet.allOf(Day.class);
For enum keys, EnumSet beats HashSet on every axis — speed, memory, and readability —
so make it your reflex whenever the element type is an enum.
Thread-safe and concurrent sets
The standard implementations are unsynchronized. There are three escalating options.
Collections.synchronizedSet(...) wraps every method in a lock but still requires you to
manually synchronize during iteration. ConcurrentHashMap.newKeySet() gives a true
concurrent hash set with lock-free reads and weakly consistent iteration — the right
default for concurrent code. CopyOnWriteArraySet copies its whole backing array on every
write, so it's perfect for small, read-mostly sets (think listener registries) and
terrible for write-heavy ones.
Set<String> sync = Collections.synchronizedSet(new HashSet<>());
synchronized (sync) { // iteration STILL needs the lock
for (String x : sync) { /* ... */ }
}
Set<String> concurrent = ConcurrentHashMap.newKeySet(); // preferred default
Set<Listener> listeners = new CopyOnWriteArraySet<>(); // read-mostly, tiny
The key insight: a globally locked wrapper serializes every access, while
ConcurrentHashMap.newKeySet() scales across threads. Pick the wrapper only for quick,
low-contention cases.
Set algebra — union, intersection, difference
The three classic set operations map directly onto bulk Collection methods: addAll
is union, retainAll is intersection, removeAll is difference. The one discipline to
remember is to copy first, so you don't silently mutate the caller's set.
Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new HashSet<>(List.of(3, 4, 5));
Set<Integer> union = new HashSet<>(a); union.addAll(b); // {1,2,3,4,5}
Set<Integer> inter = new HashSet<>(a); inter.retainAll(b); // {3}
Set<Integer> diff = new HashSet<>(a); diff.removeAll(b); // {1,2}
For intersection, iterate the smaller set against the larger — the cost is roughly O(min(a, b)) O(1) lookups. A symmetric difference (in either but not both) is just union minus intersection.
The mutable-element pitfall
A HashSet places each element in the bucket chosen by its hashCode() at insertion
time. If you later mutate a field that hashCode/equals depend on, the element's hash
changes but it stays in its old bucket — and the set can no longer find it. The same logic
breaks TreeSet if you mutate a field used by compareTo.
Set<Point> set = new HashSet<>();
Point p = new Point(1, 1);
set.add(p);
p.x = 99; // mutated a field used by hashCode
set.contains(p); // false — looks in the new bucket, finds nothing
// p is now a "ghost": present but unreachable; even remove() may miss it
This is why set elements (and map keys) should be immutable, or at least never mutated
on their equality fields while stored. It's exactly why String and the wrapper types —
immutable by design — make flawless set elements.
Recap
A Set is defined by one rule, no duplicates, and the whole topic hinges on how
duplication is judged: equals/hashCode for the hash-based sets, compareTo for
the tree. HashSet (HashMap-backed, O(1), unordered) is your default; LinkedHashSet
adds insertion order for the price of a few links; TreeSet (red-black tree, O(log n))
gives sorted order plus NavigableSet navigation, but is null-hostile and demands a
Comparator for custom types. Reach for EnumSet with enum keys and
ConcurrentHashMap.newKeySet() under concurrency. Compose sets with addAll/
retainAll/removeAll, and keep elements immutable so they never go missing.