Consistent Hashing
A hashing algorithm that distributes cache keys across nodes so that adding or removing a node only remaps a small fraction of keys. Used in CDN load balancing and distributed caching to minimize cache misses during scaling events.
Full Explanation
Regular hash-based distribution (key mod N) breaks badly when N changes. Add one server and nearly every key maps to a different node, so your entire cache is suddenly cold. Consistent hashing fixes this by arranging nodes on a virtual ring. Each key hashes to a point on the ring and maps to the first node clockwise from that point. Adding or removing a node only affects the keys in the ring segments that node owns.
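The mod-N failure mode is easy to demonstrate. Here is a minimal sketch (the key names and node counts are invented for illustration) that counts how many keys change owners when a cluster grows from 3 to 4 nodes:

```python
import hashlib

def node_for(key, n):
    # naive placement: hash the key, then take it modulo the node count
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % n

keys = [f"/asset/{i}.js" for i in range(10_000)]
moved = sum(node_for(k, 3) != node_for(k, 4) for k in keys)
print(f"remapped by going from 3 to 4 nodes: {moved / len(keys):.0%}")
```

With a uniform hash, roughly three out of four keys land on a different node after the change (a key keeps its node only when its hash mod 3 equals its hash mod 4), which is exactly the cold-cache scenario consistent hashing avoids.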
For CDNs, this matters during scaling events. When traffic spikes and you add edge nodes, consistent hashing ensures that most cached content stays on its current node. Only a fraction (roughly 1/N where N is the number of nodes) needs to be remapped. Without it, every scale event would be a cache-busting event.
Virtual nodes (vnodes) improve the algorithm by giving each physical node multiple positions on the ring, creating a more even distribution. Most production systems use 100–200 vnodes per physical node. Implementations like ketama (used by memcached clients) and Maglev (Google's load balancer) are widely deployed in CDN infrastructure.
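The balancing effect of vnodes can be sketched directly. This toy ring (node names and key patterns are made up for the demo) compares one position per node against 150:

```python
import hashlib
from bisect import bisect_right
from collections import Counter

def build_ring(nodes, vnodes):
    # each physical node contributes `vnodes` points on the ring
    return sorted(
        (hashlib.md5(f"{n}:{i}".encode()).hexdigest(), n)
        for n in nodes for i in range(vnodes)
    )

def lookup(ring, key):
    # first ring point clockwise from the key's hash (linear scan of
    # hashes each call; fine for a demo, cache the list in real code)
    h = hashlib.md5(key.encode()).hexdigest()
    hashes = [p for p, _ in ring]
    return ring[bisect_right(hashes, h) % len(ring)][1]

nodes = ["edge-1", "edge-2", "edge-3"]
keys = [f"/asset/{i}.css" for i in range(9_000)]
for vn in (1, 150):
    ring = build_ring(nodes, vn)
    counts = Counter(lookup(ring, k) for k in keys)
    print(vn, dict(counts))
```

With a single point per node, each node's share is whatever arc the hash happens to produce, so load can be badly skewed; with 150 points per node, each node's total share averages out to roughly a third of the keys.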
Examples
Consistent hashing in practice (Python example):
import hashlib
from bisect import bisect_right

class ConsistentHash:
    def __init__(self, nodes, vnodes=150):
        self.ring = []       # sorted list of vnode hashes
        self.node_map = {}   # vnode hash -> physical node
        for node in nodes:
            for i in range(vnodes):
                # MD5 is used for uniform key spread here, not security
                key = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
                self.ring.append(key)
                self.node_map[key] = node
        self.ring.sort()

    def get_node(self, cache_key):
        h = hashlib.md5(cache_key.encode()).hexdigest()
        # first vnode clockwise from the key's position, wrapping at the end
        idx = bisect_right(self.ring, h) % len(self.ring)
        return self.node_map[self.ring[idx]]

# Adding a 4th node only remaps ~25% of keys
ch = ConsistentHash(["edge-1", "edge-2", "edge-3"])
print(ch.get_node("/video/trailer.mp4"))  # e.g. edge-2 (stable for a given node set)
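The ~25% remapping claim can be checked empirically. The sketch below restates the same class (repeated so the snippet runs on its own) and compares key ownership before and after adding a hypothetical edge-4:

```python
import hashlib
from bisect import bisect_right

class ConsistentHash:
    def __init__(self, nodes, vnodes=150):
        self.ring = []       # sorted list of vnode hashes
        self.node_map = {}   # vnode hash -> physical node
        for node in nodes:
            for i in range(vnodes):
                key = hashlib.md5(f"{node}:{i}".encode()).hexdigest()
                self.ring.append(key)
                self.node_map[key] = node
        self.ring.sort()

    def get_node(self, cache_key):
        h = hashlib.md5(cache_key.encode()).hexdigest()
        idx = bisect_right(self.ring, h) % len(self.ring)
        return self.node_map[self.ring[idx]]

before = ConsistentHash(["edge-1", "edge-2", "edge-3"])
after = ConsistentHash(["edge-1", "edge-2", "edge-3", "edge-4"])
keys = [f"/video/{i}.mp4" for i in range(10_000)]
moved = sum(before.get_node(k) != after.get_node(k) for k in keys)
print(f"remapped by adding edge-4: {moved / len(keys):.0%}")
```

Every moved key lands on the new node: the existing vnodes never shift, so only the segments that edge-4's vnodes split off change owners, which is why the moved fraction hovers around 1/4.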
Nginx upstream consistent hashing:
upstream cdn_cache {
    hash $request_uri consistent;   # ketama-based consistent hashing on the URI
    server cache-1.internal:8080;
    server cache-2.internal:8080;
    server cache-3.internal:8080;
    # Adding cache-4 only remaps ~25% of requests
}