The problem with component storage
Every ECS framework needs a storage layout that keeps iteration fast and entity changes cheap.
A simple array indexed by entity ID gives fast iteration and wastes memory on empty slots. A hash map keeps memory compact and makes iteration harder on the cache. Sparse sets sit between those two choices.
How a sparse set works
A sparse set is two arrays working together:
- Dense array: packed, contiguous, holds the actual data. Every slot is occupied. This is what you iterate over.
- Sparse array: indexed by entity ID, holds the index into the dense array. Most slots are empty.
struct SparseSet<T> {
dense: Vec<Entity>,
data: Vec<T>,
sparse: Vec<Option<usize>>,
}
impl<T> SparseSet<T> {
fn insert(&mut self, entity: Entity, value: T) {
let dense_idx = self.dense.len();
self.dense.push(entity);
self.data.push(value);
if entity.id() >= self.sparse.len() {
self.sparse.resize(entity.id() + 1, None);
}
self.sparse[entity.id()] = Some(dense_idx);
}
fn remove(&mut self, entity: Entity) -> Option<T> {
let idx = self.sparse[entity.id()]?;
self.sparse[entity.id()] = None;
let last = self.dense.len() - 1;
self.dense.swap(last, idx);
self.data.swap(last, idx);
if idx < last {
self.sparse[self.dense[idx].id()] = Some(idx);
}
self.dense.pop();
Some(self.data.pop().unwrap())
}
fn get(&self, entity: Entity) -> Option<&T> {
let idx = *self.sparse.get(entity.id())?.as_ref()?;
Some(&self.data[idx])
}
fn iter(&self) -> impl Iterator<Item = (Entity, &T)> {
self.dense.iter().copied().zip(self.data.iter())
}
}The useful part is that iter() walks the dense array linearly. Every element is adjacent in memory, which gives the CPU prefetcher a simple pattern to follow.
The swap-remove trick
When you remove an entity, shifting everything down would be O(n). Sparse sets swap the removed element with the last element and pop. This is O(1), with the tradeoff that the dense array no longer stays sorted by entity ID.
| Operation | Time Complexity | Cache Behavior |
|---|---|---|
| Insert | O(1) amortized | Single write to dense end |
| Remove | O(1) | Swap + pop, two cache lines max |
| Lookup | O(1) | One sparse read + one dense read |
| Iterate | O(n) | Sequential, prefetch-friendly |
| Contains | O(1) | Single sparse array read |
Where sparse sets fall short
The sparse array itself can get large. If your entity IDs go up to 1,000,000, you need a sparse array with 1,000,000 slots even if you only have 100 entities with this component. Some implementations use paged sparse arrays to mitigate this, splitting the ID space into pages and only allocating pages that contain entities.
Intersecting two sparse sets requires checking membership in one set while iterating the other. Usually you iterate the smaller set and probe the larger one. The work is still O(n) in the smaller set's size, with random access into the larger sparse array.
Compared to archetypes
Archetype-based ECS (like in Flecs or Bevy's latest versions) groups entities by their component combination. If 10,000 entities all have Position + Velocity + Sprite, they share one archetype table with three tightly packed arrays.
The advantage: iterating over (Position, Velocity) is a single linear scan of two arrays that are always the same length. No membership checks needed.
The disadvantage: adding or removing a component moves the entity to a different archetype table. This involves copying all component data, which gets expensive if it happens frequently.
Sparse sets are comfortable with frequent component changes. Archetypes are strong at bulk iteration across multiple components. Most real ECS frameworks use a hybrid approach.
When this matters
For game engines running at 60 FPS with 50,000+ entities, the difference between cache-friendly iteration and pointer chasing can be 10-50x in throughput. Sparse sets matter most when you are building an ECS framework or working with massive entity counts.
The key insight is that sparse sets are a tradeoff machine: they sacrifice memory (the sparse array) and ordering (swap-remove) to get the combination of O(1) mutations and cache-friendly iteration. Every ECS storage strategy makes different tradeoffs. Knowing what you're trading helps you pick the right one.
