Hash Table Internals: Open Addressing and Swiss Tables

probe behavior

Table Pressure

load factor0.80

localityhigh

collision costmedium

resize spikeO(n)

slot selection

Probe Path

1hash key

Turn the key into a bucket candidate and a compact fingerprint.

2probe group

Check nearby slots first so the walk stays inside the same hot region.

3compare keys

Only compare full keys once metadata says a slot is plausible.

4resize

Rebuild into a larger table when probe chains start stretching out.

The basic idea

A hash table maps keys to values using a . The hash function converts the key into an array index. If two keys hash to the same index (a ), you need a strategy to handle it.

The two main collision strategies are chaining, where each slot holds a linked list of entries, and , where the table searches for another slot in the same array. Modern high-performance hash tables tend to use open addressing because the probe sequence stays close in memory.

Open addressing with

The simplest open addressing scheme. When you get a collision, check the next slot. Then the next. Keep going until you find an empty slot.

struct HashMap<K, V> {
    buckets: Vec<Option<(K, V)>>,
    len: usize,
}

impl<K: Hash + Eq, V> HashMap<K, V> {
    fn insert(&mut self, key: K, value: V) {
        let hash = self.hash(&key);
        let mut idx = hash % self.buckets.len();

        loop {
            match &self.buckets[idx] {
                None => {
                    self.buckets[idx] = Some((key, value));
                    self.len += 1;
                    return;
                }
                Some((k, _)) if k == &key => {
                    self.buckets[idx] = Some((key, value));
                    return;
                }
                _ => {
                    idx = (idx + 1) % self.buckets.len();
                }
            }
        }
    }
}

Linear probing has a useful property: the probe sequence is sequential in memory. When you check slot N and it is occupied, slot N+1 is likely already in the same . Chaining usually turns the same lookup into pointer chasing.

The downside: clustering. Once a run of occupied slots forms, it attracts more entries (they all probe into the same cluster). This makes lookup times degrade as the table fills up.

slot selection

Probe Path

1hash key

Turn the key into a bucket candidate and a compact fingerprint.

2probe group

Check nearby slots first so the walk stays inside the same hot region.

3compare keys

Only compare full keys once metadata says a slot is plausible.

4resize

Rebuild into a larger table when probe chains start stretching out.

The load factor is entries / capacity. As it increases, collisions become more frequent and performance degrades.

Load factor	Avg probes (linear)	Avg probes (Robin Hood)
50%	1.5	1.5
70%	2.2	1.8
80%	3.0	2.0
90%	5.5	2.6
95%	10.5	3.2

Most hash tables resize when the load factor exceeds 70-80%. Resizing allocates a larger array and re-inserts every entry. The resize itself is O(n). across the insertions that led to it, the average insertion cost remains O(1).

probe behavior

Table Pressure

load factor0.80

localityhigh

collision costmedium

resize spikeO(n)

Robin Hood hashing

Robin Hood hashing is a refinement of linear probing. During insertion, if the new entry has traveled farther from its ideal slot than the entry currently occupying the slot, they swap places. The goal is to keep probe distances more even.

fn insert(&mut self, key: K, value: V) {
    let hash = self.hash(&key);
    let mut idx = hash % self.capacity;
    let mut entry = (key, value);
    let mut dist = 0;

    loop {
        match &self.buckets[idx] {
            None => {
                self.buckets[idx] = Some(entry);
                return;
            }
            Some(existing) => {
                let existing_dist = self.probe_distance(idx, existing);
                if dist > existing_dist {
                    let displaced = self.buckets[idx].take().unwrap();
                    self.buckets[idx] = Some(entry);
                    entry = displaced;
                    dist = existing_dist;
                }
            }
        }
        idx = (idx + 1) % self.capacity;
        dist += 1;
    }
}

This keeps the maximum probe distance low. In a standard linear probing table, one unlucky entry might be 50 slots away from its ideal position. With Robin Hood hashing, long-distance entries move closer and the overall variance drops.

Rust's standard HashMap used Robin Hood hashing until it switched to a Swiss Table implementation in 2019.

Swiss Tables (hashbrown)

Google's Swiss Table design, used in absl::flat_hash_map and Rust's hashbrown/std::HashMap, groups slots into batches of 16 and uses instructions to check the group at once.

Each group has a 16-byte metadata array where each byte stores the top 7 bits of the hash (called H2) for the entry in that slot, or a special marker for empty/deleted slots.

// Simplified Swiss Table probe
Group group = load_group(ctrl + bucket);
BitMask match = group.match(h2);  // SIMD comparison

for (int i : match) {
    if (keys[bucket + i] == key) {
        return &values[bucket + i];
    }
}

if (group.has_empty()) {
    return nullptr;  // key not in table
}
// continue to next group

The group.match(h2) operation uses SSE2 _mm_cmpeq_epi8 to compare the H2 byte against all 16 control bytes in a single instruction. This means checking 16 slots costs about the same as checking one slot in a traditional hash table.

Why this matters in practice

For small hash tables (under 100 entries), the choice of hash table implementation rarely matters. The total memory fits in L1 cache regardless, and any reasonable implementation will be fast.

For large tables (millions of entries) or high-throughput scenarios (millions of lookups per second), the implementation matters a lot:

Cache-friendly layout (open addressing) beats pointer-chasing (chaining) by 2-5x
SIMD-accelerated probing (Swiss Tables) beats scalar probing by 1.5-3x on modern CPUs
A good hash function (wyhash, ahash) can be 5x faster than a mediocre one (SipHash for non-adversarial inputs)

Rust's HashMap already gives you a Swiss Table with a solid default hash function. In C++, absl::flat_hash_map is often a better starting point than std::unordered_map.

Table Pressure

Probe Path

The basic idea

Open addressing with linear probing

Probe Path

Load factor

Table Pressure

Robin Hood hashing

Swiss Tables (hashbrown)

Why this matters in practice

Open addressing with