Multi-probe consistent hashing (2015)

#Consistent Hashing

September 6, 2025

Karger et al’s Consistent hashing is widely used for load balancing in content delivery networks and key-value store databases. Consistent hashing maps both keys and nodes to values in the unit interval, and assigns each key to its closest node.

Since nodes in the cluster can frequently change, the algorithm generates multiple hash values for each node to help balance the load. Let \(n\) be the number of nodes. The load of a node is the number of keys stored on it, and the peak-to-average load is defined as the ratio of the maximum load to the average load across all nodes. Achieving a peak-to-average load of \(1+\epsilon\) with high probability (\(1-\frac{1}{n^{\Omega(1)}}\)) requires \(\Theta(\frac{\ln n}{\epsilon^2})\) hash values per node. This form of consistent hashing requires more than \(O(n)\) space to achieve good load balancing.

Multi-probe consistent hashing instead generates \(K\) hash values for each key, rather than multiple hash values for the nodes. A key is then assigned to its closest successor node among the candidate hash positions. For \(2 \le K \ll \frac{\sqrt{n}}{\ln n}\), the peak-to-average load is \(\frac{K}{K-1}\frac{1}{n}+o(\frac{1}{n})\) with high probability (\(1-\frac{1}{n^{\Omega(1)}}\)). While the original algorithm assigns each key to its closest node, Multi-probe consistent hashing assigns the key to its closest successor node.