Java Set Explained — HashSet vs LinkedHashSet vs TreeSet

What a Set actually promises

A Set is a Collection with exactly one extra rule: no duplicate elements. There's no index and no positional access — the operation a set is built to make fast is membership (contains). Everything interesting about the three standard implementations flows from a single question: how does this set decide two elements are the same? Get that straight and the rest — ordering, null handling, complexity — falls out naturally.

Set<String> tags = new HashSet<>();
tags.add("java");      // true  — new element
tags.add("java");      // false — duplicate, silently ignored
tags.size();           // 1

The payoff is performance: a Set answers "is it in there?" far faster than a List, which has to scan element by element.

Dedup is defined by equals/hashCode — or compareTo

"Duplicate" is not reference identity; it's logical equality, and which equality depends on the implementation. Hash-based sets (HashSet, LinkedHashSet) use hashCode() to pick a bucket and equals() to compare within it. A TreeSet ignores both and uses compareTo() (or a supplied Comparator): two elements are "equal" precisely when comparison returns 0.

Set<String> hash = new HashSet<>();
hash.add("Java"); hash.add("JAVA");   // equals: different -> size 2

Set<String> tree = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
tree.add("Java"); tree.add("JAVA");   // compare == 0 -> size 1

This split is the most common interview trap. A TreeSet can drop an element as a "duplicate" even though equals says it differs, so the rule is: keep compareTo consistent with equals. Otherwise the same objects behave differently depending on which Set you happen to use.

HashSet — a HashMap wearing a disguise

HashSet is a thin wrapper around a HashMap. Each element is stored as a key, and every key maps to the same shared dummy value. That's the whole trick — so everything you know about HashMap (buckets, load factor, bins that treeify under collision pressure) applies unchanged.

// Conceptually, inside HashSet:
private transient HashMap<E, Object> map;
private static final Object PRESENT = new Object();

public boolean add(E e) {
    return map.put(e, PRESENT) == null;  // true only if the key was new
}

The consequences: average O(1) add/contains/remove, no ordering guarantee, and total dependence on a correct hashCode/equals. If hashes collide badly, performance slides toward O(n) — or O(log n) once an over-full bucket converts to a tree. This is your default set: reach for anything else only when you need something it can't give you.

LinkedHashSet — dedup that remembers insertion order

LinkedHashSet extends HashSet but is backed by a LinkedHashMap, which threads a doubly-linked list through the entries. You keep HashSet's O(1) operations, but iteration now follows insertion order — deterministic and reproducible.

Set<String> hs = new LinkedHashSet<>();
hs.add("c"); hs.add("a"); hs.add("b");
hs.add("a");                        // duplicate ignored, order untouched
System.out.println(hs);            // [c, a, b] — exactly as inserted

The only cost is a slightly larger footprint (the prev/next links per entry). Use it whenever you need dedup without scrambling order — the textbook case is stripping duplicates from a list while preserving the original sequence, something plain HashSet would quietly mangle.

TreeSet — sorted, red-black, and navigable

TreeSet is backed by a TreeMap, a red-black tree (a self-balancing binary search tree). Elements stay in sorted order — natural ordering via Comparable, or a Comparator you pass to the constructor — and core operations cost O(log n) instead of O(1). In exchange you get range and neighbour queries for free, because the data is already ordered.

NavigableSet<Integer> s = new TreeSet<>(List.of(10, 20, 30, 40));
s.first();          // 10        smallest
s.last();           // 40        largest
s.floor(25);        // 20        greatest <= 25
s.ceiling(25);      // 30        smallest >= 25
s.higher(20);       // 30        strictly > 20
s.descendingSet();  // [40, 30, 20, 10]  reverse view, no copy

TreeSet implements NavigableSet (which extends SortedSet), so you also get live headSet/tailSet/subSet range views backed by the original set. Declare your variable as NavigableSet rather than SortedSet to keep floor/ceiling/higher/ lower in reach — they're the whole reason to prefer a tree over a hash set for "nearest neighbour" problems.

The TreeSet null and Comparable gotcha

A TreeSet must compare every element it stores, and that creates two failure modes people hit at runtime, not compile time. First, null: with natural ordering, add(null) throws NullPointerException because the set tries to call null.compareTo. Second, non-comparable elements: a type with no Comparable and no Comparator blows up with ClassCastException on the first real add.

Set<String> tree = new TreeSet<>();
tree.add(null);                    // NullPointerException (HashSet would allow one null)

class Point { int x, y; }
Set<Point> bad = new TreeSet<>();
bad.add(new Point());              // ClassCastException — Point isn't Comparable

Set<Point> ok = new TreeSet<>(Comparator.comparingInt(p -> p.x));
ok.add(new Point());               // fine — an ordering was supplied

Note both failures are deferred: an empty TreeSet is perfectly happy. It only throws when it must compare. Treat TreeSet as null-hostile and always give it an ordering for custom types.

Choosing between the three

Here's the side-by-side every interviewer wants to see:

Feature	`HashSet`	`LinkedHashSet`	`TreeSet`
Backed by	`HashMap`	`LinkedHashMap`	`TreeMap` (red-black tree)
Iteration order	none	insertion order	sorted
`add`/`contains`/`remove`	O(1) avg	O(1) avg	O(log n)
`null` allowed	one	one	no (natural order)
Equality test	`equals`+`hashCode`	`equals`+`hashCode`	`compareTo`/`Comparator`
Range/navigation	no	no	yes

new HashSet<>();        // fastest, order irrelevant — the default
new LinkedHashSet<>();  // dedup but keep insertion order
new TreeSet<>();        // need sorted iteration or range queries

None of the three is thread-safe. Default to HashSet and upgrade only when you specifically need predictable order, sorting, or navigation.

EnumSet — the bit-vector specialist

When your elements are enum constants, none of the above is the right answer: EnumSet is. Internally it's a bit vector — usually a single long where each bit represents one constant — so operations are near-instant and memory is tiny. You build it with factory methods, not new, and iteration follows the enum's declaration order.

enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

EnumSet<Day> work    = EnumSet.range(Day.MON, Day.FRI);   // [MON..FRI]
EnumSet<Day> weekend = EnumSet.complementOf(work);        // [SAT, SUN]
EnumSet<Day> none    = EnumSet.noneOf(Day.class);
EnumSet<Day> all     = EnumSet.allOf(Day.class);

For enum keys, EnumSet beats HashSet on every axis — speed, memory, and readability — so make it your reflex whenever the element type is an enum.

Thread-safe and concurrent sets

The standard implementations are unsynchronized. There are three escalating options. Collections.synchronizedSet(...) wraps every method in a lock but still requires you to manually synchronize during iteration. ConcurrentHashMap.newKeySet() gives a true concurrent hash set with lock-free reads and weakly consistent iteration — the right default for concurrent code. CopyOnWriteArraySet copies its whole backing array on every write, so it's perfect for small, read-mostly sets (think listener registries) and terrible for write-heavy ones.

Set<String> sync = Collections.synchronizedSet(new HashSet<>());
synchronized (sync) {                       // iteration STILL needs the lock
    for (String x : sync) { /* ... */ }
}

Set<String> concurrent = ConcurrentHashMap.newKeySet();   // preferred default
Set<Listener> listeners = new CopyOnWriteArraySet<>();    // read-mostly, tiny

The key insight: a globally locked wrapper serializes every access, while ConcurrentHashMap.newKeySet() scales across threads. Pick the wrapper only for quick, low-contention cases.

Set algebra — union, intersection, difference

The three classic set operations map directly onto bulk Collection methods: addAll is union, retainAll is intersection, removeAll is difference. The one discipline to remember is to copy first, so you don't silently mutate the caller's set.

Set<Integer> a = new HashSet<>(List.of(1, 2, 3));
Set<Integer> b = new HashSet<>(List.of(3, 4, 5));

Set<Integer> union = new HashSet<>(a); union.addAll(b);          // {1,2,3,4,5}
Set<Integer> inter = new HashSet<>(a); inter.retainAll(b);       // {3}
Set<Integer> diff  = new HashSet<>(a); diff.removeAll(b);        // {1,2}

For intersection, iterate the smaller set against the larger — the cost is roughly O(min(a, b)) O(1) lookups. A symmetric difference (in either but not both) is just union minus intersection.

The mutable-element pitfall

A HashSet places each element in the bucket chosen by its hashCode() at insertion time. If you later mutate a field that hashCode/equals depend on, the element's hash changes but it stays in its old bucket — and the set can no longer find it. The same logic breaks TreeSet if you mutate a field used by compareTo.

Set<Point> set = new HashSet<>();
Point p = new Point(1, 1);
set.add(p);
p.x = 99;                  // mutated a field used by hashCode

set.contains(p);           // false — looks in the new bucket, finds nothing
// p is now a "ghost": present but unreachable; even remove() may miss it

This is why set elements (and map keys) should be immutable, or at least never mutated on their equality fields while stored. It's exactly why String and the wrapper types — immutable by design — make flawless set elements.

Recap

A Set is defined by one rule, no duplicates, and the whole topic hinges on how duplication is judged: equals/hashCode for the hash-based sets, compareTo for the tree. HashSet (HashMap-backed, O(1), unordered) is your default; LinkedHashSet adds insertion order for the price of a few links; TreeSet (red-black tree, O(log n)) gives sorted order plus NavigableSet navigation, but is null-hostile and demands a Comparator for custom types. Reach for EnumSet with enum keys and ConcurrentHashMap.newKeySet() under concurrency. Compose sets with addAll/ retainAll/removeAll, and keep elements immutable so they never go missing.

More ways to practice