Why HashMap internals come up so often
HashMap is the data structure interviewers reach for when they want to test whether you
understand a system rather than just an API. On the surface it is a key-value store with
average O(1) get and put. Underneath it is a careful balancing act between speed,
memory, and worst-case behavior — bit tricks for indexing, a resize policy tuned to a
magic constant, and a fallback tree for pathological inputs. This guide walks the whole
machine end to end so you can explain not just what HashMap does but why each
decision exists.
The bucket array and the Node
A HashMap is built around a single field: an array of buckets, Node<K,V>[] table. Each
slot holds the head of a chain of entries that landed on the same index. A Node
packs four things — the key's already-computed hash, the key, the value, and a
next pointer that links it to the following entry in the same bucket.
static class Node<K,V> {
final int hash; // cached spread hash, so resize never recomputes it
final K key;
V value;
Node<K,V> next; // singly linked chain within one bucket
}
Caching the hash in the node is a small but important optimization: lookups compare the
cheap int hash before ever calling equals, and resizing can relocate entries without
re-hashing keys. Everything else in HashMap is machinery for choosing which bucket a
node lives in and keeping the chains short.
Hashing and the spread function
The bucket index is derived from the key's hashCode(), but HashMap never uses it raw.
Because indexing keeps only the low bits of the hash (more on that below), two keys
that differ only in their high bits would collide constantly. To defend against weak
hashCode implementations, HashMap applies a spread (or perturbation) function
that mixes the high bits down into the low ones.
static final int hash(Object key) {
int h;
// XOR the hash with its own top 16 bits — one shift, one XOR
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
This costs a single shift and XOR yet meaningfully improves distribution for hashes whose
entropy lives up high. Notice the spread also defines null's hash as 0, which is why a
null key always lands in bucket 0 — no special branch needed later.
Turning a hash into a bucket index
With a spread hash in hand, HashMap finds the bucket with index = (n - 1) & hash,
where n is the table length. This is the same result as hash % n but far cheaper,
because a bitwise AND is a single CPU instruction while modulo is not.
int n = 16; // table length
int idx = (n - 1) & hash; // 15 & hash -> keeps the low 4 bits
The trick only works because n is a power of two. When n is a power of two, n - 1 is
a clean run of all-ones in the low bits (0b1111 for 16), so the AND keeps a contiguous
slice of the hash and every bucket is reachable. This single line is the reason for the
power-of-two invariant that shows up everywhere else.
Why capacity is always a power of two
HashMap will never let you create a table whose capacity is not a power of two. If you
ask for 10, you get 16; ask for 20, you get 32. The constructor runs tableSizeFor to
round your request up to the next power of two before allocating anything.
new HashMap<>(10); // actual capacity becomes 16
new HashMap<>(20); // becomes 32
The motivation is purely the (n - 1) & hash indexing. A power-of-two size guarantees the
mask is gap-free, so entries spread evenly and the AND substitutes cleanly for modulo. A
non-power-of-two size would leave holes in the mask, making some buckets unreachable and
clustering entries into the rest — exactly what a hash table is supposed to avoid.
Load factor, threshold, and resizing
A HashMap starts at capacity 16 with a load factor of 0.75. The load factor is
the fullness fraction at which the table grows: the threshold is capacity × loadFactor,
so a default map resizes once it holds more than 16 × 0.75 = 12 entries. When size
crosses the threshold, HashMap doubles the capacity and rehashes every entry into
the bigger table.
new HashMap<>(); // capacity 16, threshold 12
new HashMap<>(32, 0.5f); // capacity 32, threshold 16
// the Java 8 split: doubling means each entry either stays at index i
// or moves to i + oldCapacity, decided by one new high bit of its hash
if ((node.hash & oldCapacity) == 0) { /* stays in the "low" bucket */ }
else { /* moves to the "high" bucket */ }
0.75 is a deliberate time/space trade-off — lower factors waste memory but shorten
chains, higher factors save memory but lengthen them. Doubling preserves the power-of-two
invariant, and because only one new bit decides an entry's fate, Java 8 splits each old
chain into two ordered sub-chains without recomputing indexes. A resize is still O(n),
which is why presizing large maps matters.
Collisions and treeification
Distinct keys can map to the same bucket even with perfect hashCodes, simply because many
hashes fold into a small index space. HashMap handles these collisions by chaining:
originally a bucket is a singly linked list. Java 8 added a safety valve — when one bucket's
chain grows too long, it is converted into a red-black tree so lookups within it stay
logarithmic instead of linear.
bucket[5] -> Node(k1) -> Node(k2) -> Node(k3) // linked list
// after treeification:
bucket[5] -> (balanced red-black tree of TreeNodes)
Two thresholds govern this. A chain treeifies at TREEIFY_THRESHOLD = 8, but only if
the table is already at least MIN_TREEIFY_CAPACITY = 64; below that, a long chain
usually means the table is just too small, so HashMap resizes instead. A tree reverts
to a list once it shrinks to UNTREEIFY_THRESHOLD = 6. The gap between 8 and 6 is
deliberate hysteresis that prevents flip-flopping on repeated add/remove.
The put and get flow
put and get share the same lookup half. Both spread the key's hash, compute
(n - 1) & hash, and then walk the chain. For each node they first compare the cached
hash (a cheap int check that short-circuits most mismatches) and only then call
equals to confirm the key.
map.put("a", 1); // empty bucket -> drop node in
map.put("a", 2); // same key found by hash + equals -> value replaced, returns 1
map.put("b", 3); // collision -> append to chain (treeify if chain hits 8 and table >= 64)
map.get("a"); // hash -> index -> walk chain -> return matching node's value
map.get("zzz"); // no match -> null
put adds a node when the chain has no matching key and then bumps size, triggering a
resize if the threshold is crossed. get is just the lookup with no mutation. In a list
bucket both are O(chain length); in a treeified bucket they are O(log n). The match
step is why equals is non-negotiable: landing in the right bucket does not mean you
found the right key.
The hashCode/equals contract
HashMap leans directly on the hashCode/equals contract: it uses hashCode to
find the bucket and equals to find the key inside it. The contract says that if
a.equals(b), then a.hashCode() == b.hashCode() must hold (the reverse need not — unequal
objects may share a hash). Break it by overriding only one method and a key you stored
becomes unfindable.
class Point {
final int x, y;
@Override public boolean equals(Object o) {
return o instanceof Point p && x == p.x && y == p.y;
}
@Override public int hashCode() { return Objects.hash(x, y); } // always pair these
}
A worse failure is a hashCode that returns a constant — return 1; for every key sends
all entries into one bucket, collapsing the map to a single giant chain and making every
operation O(n). Treeification softens this to O(log n), but only when the keys are
also Comparable; otherwise the tree falls back to identity-hash ordering. The real cure
is always a well-distributed hashCode. Related: keys should be immutable, because a
key's hash is computed at insertion — mutate a field that affects hashCode afterward and
the entry becomes a ghost, present in the table but lost in the wrong bucket.
Null handling and fail-fast iterators
HashMap permits one null key (special-cased to hash 0, so it sits in bucket 0) and
any number of null values. The catch is that get returns null both for an absent
key and a key explicitly mapped to null — use containsKey to distinguish them.
Iterators are fail-fast: they track a modCount and throw
ConcurrentModificationException if the map is structurally modified outside the iterator
during iteration.
Map<String,String> m = new HashMap<>();
m.put(null, "x"); // legal — single null key
m.put("a", null); // legal — null value
m.containsKey("a"); // true, even though the value is null
// safe structural mutation while iterating:
Iterator<String> it = m.keySet().iterator();
while (it.hasNext()) { if (it.next().startsWith("x")) it.remove(); }
Fail-fast is a best-effort bug detector, not a guarantee — never write logic that
depends on the exception firing. To mutate while iterating, use iterator.remove(),
collect-then-remove, or removeIf.
Thread safety and ConcurrentHashMap
HashMap has no synchronization, so concurrent writes can interleave and corrupt the
table, lose updates, or report a wrong size. Pre-Java-8 JDKs were also vulnerable to an
infamous infinite loop on resize — two threads resizing at once could splice a chain
into a cycle, and a later get would spin at 100% CPU forever. Java 8's order-preserving
split removed that specific cycle, but HashMap is still not thread-safe. For shared
maps, reach for ConcurrentHashMap.
ConcurrentHashMap<String,Long> counts = new ConcurrentHashMap<>();
counts.merge("hit", 1L, Long::sum); // atomic increment, no external lock
Modern ConcurrentHashMap uses the same Node[] table as HashMap but adds per-bin
locking plus CAS: an empty bin is filled with a lock-free compare-and-swap, while a
non-empty bin is updated under a synchronized block on its first node — so contention is
limited to the keys colliding in that one bin. Reads are non-blocking (volatile fields),
resizes are cooperative across threads, and its iterators are weakly consistent (they
never throw ConcurrentModificationException). This is why it scales far past the legacy
Hashtable, which wraps every method in synchronized and serializes all access.
HashMap vs LinkedHashMap vs TreeMap
All three implement Map; they differ in iteration order and complexity. Pick the cheapest
one that satisfies your ordering needs.
| Map | Ordering | get/put | Backing structure |
|---|---|---|---|
HashMap | none (arbitrary) | O(1) avg | hash table |
LinkedHashMap | insertion or access order | O(1) avg | hash table + linked list |
TreeMap | sorted by key | O(log n) | red-black tree |
new HashMap<>(); // fastest, no order guarantee
new LinkedHashMap<>(); // predictable iteration; access-order mode powers LRU caches
new TreeMap<>(); // keys come out sorted; needs Comparable or a Comparator
Default to HashMap. Choose LinkedHashMap when you need stable iteration order or an
LRU cache (access-order mode plus removeEldestEntry), and TreeMap when you need
sorted keys or range queries like firstKey, headMap, and subMap. The same pattern
extends to sets: HashSet, LinkedHashSet, and TreeSet are thin wrappers over their
matching maps.
Recap
HashMap stores entries in a power-of-two bucket array of Nodes, indexes them with
the bit trick (n - 1) & hash after a cheap spread function mixes the high bits
down, and keeps chains short by resizing at capacity × 0.75. Collisions chain into
linked lists that treeify into red-black trees past 8 entries (with a table of 64+),
bounding the worst case at O(log n). The whole thing rests on a sound
hashCode/equals contract and immutable keys — break either and lookups silently
fail or degrade to O(n). HashMap allows one null key and many null values but is
not thread-safe; use ConcurrentHashMap with its per-bin locking and CAS for shared
access, and pick LinkedHashMap or TreeMap when you need ordering. Understand these
moving parts and you can answer almost any HashMap question from first principles.