Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

#Consistent Hashing

March 18, 2025

Consistent hashing is used for load balancing in cache systems at the scale of a Content Delivery Network (CDN), as discussed. The paper Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web proposed the hashing algorithm to decrease or eliminate the occurrence of hot spots in the network.

A Random Tree is a d-ary tree where consistent hashing maps each node to cache servers. Assuming the root node’s cache servers have all the pages, parent cache servers provide copies to their child nodes. When a request for a page comes, a client chooses a leaf at random. A leaf cache server that receives the request returns the page if the server stores it. If the server does not store it, it recursively queries to its parent cache servers for the page. When a cache miss occurs, the cache server increments a counter for the requested page. If the counter exceeds a threshold, the cache server stores the page received from a parent cache server in addition to passing it to a child cache server.

Changes in the set of active cache servers are inevitable. Additionally, clients and cache servers cannot communicate with all other cache servers. To maintain the properties of the Random Tree and avoid unnecessary remapping, mapping definitions that minimize remappings due to chages in active servers are needed.

Consistent hashing in the Random Tree equally distributes cache servers among nodes and reduces the number of cache servers that need to be moved in response to changes in the set of active servers. Consistent hashing uses a hash function to assign real numbers within the unit interval ([0, 1]) to cache servers and nodes. It maps a node to the cache server whose value is nearest to that of the node’s. When a cache server joins, nodes from neighboring servers move to it if their values are closer to the new server than to their current server.

When a cache server leaves, nodes with real values greater than that of the removed server move to the successor, while those with smaller values move to the predecessor. If the server with largest real value leaves, the server with the smallest value becomes the successor.

More precisely, the cache servers have some real numbers. When a server joins or leaves, only nodes on the successor and predecessor are moved. By assigning multiple real values to a cache server, consistent hashing can more evenly distribute the nodes from the leaving server to other servers, and it moves nodes to a new server from them. A request consists of a client, a page, a path from a leaf to the root in the Random Tree, and the sequence of cache servers along the path. The client determines the cache servers by applying consistent hashing to the path.

The paper describes an implementation of consistent hashing using a binary tree. Other implementations of consistent hashing, such as A Fast, Minimal Memory, Consistent Hash Algorithm and Multi-probe Consistent Hashing, are also available.