How Java HashMap Works Internally — Buckets, Hashing, Resizing & Treeification

Why HashMap internals come up so often

HashMap is the data structure interviewers reach for when they want to test whether you understand a system rather than just an API. On the surface it is a key-value store with average O(1) get and put. Underneath it is a careful balancing act between speed, memory, and worst-case behavior — bit tricks for indexing, a resize policy tuned to a magic constant, and a fallback tree for pathological inputs. This guide walks the whole machine end to end so you can explain not just what HashMap does but why each decision exists.

The bucket array and the Node

A HashMap is built around a single field: an array of buckets, Node<K,V>[] table. Each slot holds the head of a chain of entries that landed on the same index. A Node packs four things — the key's already-computed hash, the key, the value, and a next pointer that links it to the following entry in the same bucket.

static class Node<K,V> {
  final int hash;   // cached spread hash, so resize never recomputes it
  final K key;
  V value;
  Node<K,V> next;   // singly linked chain within one bucket
}

Caching the hash in the node is a small but important optimization: lookups compare the cheap int hash before ever calling equals, and resizing can relocate entries without re-hashing keys. Everything else in HashMap is machinery for choosing which bucket a node lives in and keeping the chains short.

Hashing and the spread function

The bucket index is derived from the key's hashCode(), but HashMap never uses it raw. Because indexing keeps only the low bits of the hash (more on that below), two keys that differ only in their high bits would collide constantly. To defend against weak hashCode implementations, HashMap applies a spread (or perturbation) function that mixes the high bits down into the low ones.

static final int hash(Object key) {
  int h;
  // XOR the hash with its own top 16 bits — one shift, one XOR
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

This costs a single shift and XOR yet meaningfully improves distribution for hashes whose entropy lives up high. Notice the spread also defines null's hash as 0, which is why a null key always lands in bucket 0 — no special branch needed later.

Turning a hash into a bucket index

With a spread hash in hand, HashMap finds the bucket with index = (n - 1) & hash, where n is the table length. This is the same result as hash % n but far cheaper, because a bitwise AND is a single CPU instruction while modulo is not.

int n = 16;                // table length
int idx = (n - 1) & hash;  // 15 & hash  ->  keeps the low 4 bits

The trick only works because n is a power of two. When n is a power of two, n - 1 is a clean run of all-ones in the low bits (0b1111 for 16), so the AND keeps a contiguous slice of the hash and every bucket is reachable. This single line is the reason for the power-of-two invariant that shows up everywhere else.

Why capacity is always a power of two

HashMap will never let you create a table whose capacity is not a power of two. If you ask for 10, you get 16; ask for 20, you get 32. The constructor runs tableSizeFor to round your request up to the next power of two before allocating anything.

new HashMap<>(10);  // actual capacity becomes 16
new HashMap<>(20);  // becomes 32

The motivation is purely the (n - 1) & hash indexing. A power-of-two size guarantees the mask is gap-free, so entries spread evenly and the AND substitutes cleanly for modulo. A non-power-of-two size would leave holes in the mask, making some buckets unreachable and clustering entries into the rest — exactly what a hash table is supposed to avoid.

Load factor, threshold, and resizing

A HashMap starts at capacity 16 with a load factor of 0.75. The load factor is the fullness fraction at which the table grows: the threshold is capacity × loadFactor, so a default map resizes once it holds more than 16 × 0.75 = 12 entries. When size crosses the threshold, HashMap doubles the capacity and rehashes every entry into the bigger table.

new HashMap<>();          // capacity 16, threshold 12
new HashMap<>(32, 0.5f);  // capacity 32, threshold 16

// the Java 8 split: doubling means each entry either stays at index i
// or moves to i + oldCapacity, decided by one new high bit of its hash
if ((node.hash & oldCapacity) == 0) { /* stays in the "low" bucket */ }
else                                { /* moves to the "high" bucket */ }

0.75 is a deliberate time/space trade-off — lower factors waste memory but shorten chains, higher factors save memory but lengthen them. Doubling preserves the power-of-two invariant, and because only one new bit decides an entry's fate, Java 8 splits each old chain into two ordered sub-chains without recomputing indexes. A resize is still O(n), which is why presizing large maps matters.

Collisions and treeification

Distinct keys can map to the same bucket even with perfect hashCodes, simply because many hashes fold into a small index space. HashMap handles these collisions by chaining: originally a bucket is a singly linked list. Java 8 added a safety valve — when one bucket's chain grows too long, it is converted into a red-black tree so lookups within it stay logarithmic instead of linear.

bucket[5] -> Node(k1) -> Node(k2) -> Node(k3)   // linked list

// after treeification:
bucket[5] -> (balanced red-black tree of TreeNodes)

Two thresholds govern this. A chain treeifies at TREEIFY_THRESHOLD = 8, but only if the table is already at least MIN_TREEIFY_CAPACITY = 64; below that, a long chain usually means the table is just too small, so HashMap resizes instead. A tree reverts to a list once it shrinks to UNTREEIFY_THRESHOLD = 6. The gap between 8 and 6 is deliberate hysteresis that prevents flip-flopping on repeated add/remove.

The put and get flow

put and get share the same lookup half. Both spread the key's hash, compute (n - 1) & hash, and then walk the chain. For each node they first compare the cached hash (a cheap int check that short-circuits most mismatches) and only then call equals to confirm the key.

map.put("a", 1);  // empty bucket -> drop node in
map.put("a", 2);  // same key found by hash + equals -> value replaced, returns 1
map.put("b", 3);  // collision -> append to chain (treeify if chain hits 8 and table >= 64)

map.get("a");     // hash -> index -> walk chain -> return matching node's value
map.get("zzz");   // no match -> null

put adds a node when the chain has no matching key and then bumps size, triggering a resize if the threshold is crossed. get is just the lookup with no mutation. In a list bucket both are O(chain length); in a treeified bucket they are O(log n). The match step is why equals is non-negotiable: landing in the right bucket does not mean you found the right key.

The hashCode/equals contract

HashMap leans directly on the hashCode/equals contract: it uses hashCode to find the bucket and equals to find the key inside it. The contract says that if a.equals(b), then a.hashCode() == b.hashCode() must hold (the reverse need not — unequal objects may share a hash). Break it by overriding only one method and a key you stored becomes unfindable.

class Point {
  final int x, y;
  @Override public boolean equals(Object o) {
    return o instanceof Point p && x == p.x && y == p.y;
  }
  @Override public int hashCode() { return Objects.hash(x, y); }  // always pair these
}

A worse failure is a hashCode that returns a constant — return 1; for every key sends all entries into one bucket, collapsing the map to a single giant chain and making every operation O(n). Treeification softens this to O(log n), but only when the keys are also Comparable; otherwise the tree falls back to identity-hash ordering. The real cure is always a well-distributed hashCode. Related: keys should be immutable, because a key's hash is computed at insertion — mutate a field that affects hashCode afterward and the entry becomes a ghost, present in the table but lost in the wrong bucket.

Null handling and fail-fast iterators

HashMap permits one null key (special-cased to hash 0, so it sits in bucket 0) and any number of null values. The catch is that get returns null both for an absent key and a key explicitly mapped to null — use containsKey to distinguish them. Iterators are fail-fast: they track a modCount and throw ConcurrentModificationException if the map is structurally modified outside the iterator during iteration.

Map<String,String> m = new HashMap<>();
m.put(null, "x");   // legal — single null key
m.put("a", null);   // legal — null value
m.containsKey("a"); // true, even though the value is null

// safe structural mutation while iterating:
Iterator<String> it = m.keySet().iterator();
while (it.hasNext()) { if (it.next().startsWith("x")) it.remove(); }

Fail-fast is a best-effort bug detector, not a guarantee — never write logic that depends on the exception firing. To mutate while iterating, use iterator.remove(), collect-then-remove, or removeIf.

Thread safety and ConcurrentHashMap

HashMap has no synchronization, so concurrent writes can interleave and corrupt the table, lose updates, or report a wrong size. Pre-Java-8 JDKs were also vulnerable to an infamous infinite loop on resize — two threads resizing at once could splice a chain into a cycle, and a later get would spin at 100% CPU forever. Java 8's order-preserving split removed that specific cycle, but HashMap is still not thread-safe. For shared maps, reach for ConcurrentHashMap.

ConcurrentHashMap<String,Long> counts = new ConcurrentHashMap<>();
counts.merge("hit", 1L, Long::sum);   // atomic increment, no external lock

Modern ConcurrentHashMap uses the same Node[] table as HashMap but adds per-bin locking plus CAS: an empty bin is filled with a lock-free compare-and-swap, while a non-empty bin is updated under a synchronized block on its first node — so contention is limited to the keys colliding in that one bin. Reads are non-blocking (volatile fields), resizes are cooperative across threads, and its iterators are weakly consistent (they never throw ConcurrentModificationException). This is why it scales far past the legacy Hashtable, which wraps every method in synchronized and serializes all access.

HashMap vs LinkedHashMap vs TreeMap

All three implement Map; they differ in iteration order and complexity. Pick the cheapest one that satisfies your ordering needs.

Map	Ordering	get/put	Backing structure
`HashMap`	none (arbitrary)	O(1) avg	hash table
`LinkedHashMap`	insertion or access order	O(1) avg	hash table + linked list
`TreeMap`	sorted by key	O(log n)	red-black tree

new HashMap<>();        // fastest, no order guarantee
new LinkedHashMap<>();  // predictable iteration; access-order mode powers LRU caches
new TreeMap<>();        // keys come out sorted; needs Comparable or a Comparator

Default to HashMap. Choose LinkedHashMap when you need stable iteration order or an LRU cache (access-order mode plus removeEldestEntry), and TreeMap when you need sorted keys or range queries like firstKey, headMap, and subMap. The same pattern extends to sets: HashSet, LinkedHashSet, and TreeSet are thin wrappers over their matching maps.

Recap

HashMap stores entries in a power-of-two bucket array of Nodes, indexes them with the bit trick (n - 1) & hash after a cheap spread function mixes the high bits down, and keeps chains short by resizing at capacity × 0.75. Collisions chain into linked lists that treeify into red-black trees past 8 entries (with a table of 64+), bounding the worst case at O(log n). The whole thing rests on a sound hashCode/equals contract and immutable keys — break either and lookups silently fail or degrade to O(n). HashMap allows one null key and many null values but is not thread-safe; use ConcurrentHashMap with its per-bin locking and CAS for shared access, and pick LinkedHashMap or TreeMap when you need ordering. Understand these moving parts and you can answer almost any HashMap question from first principles.

More ways to practice