This week, I am reading Amazon's Dynamo paper along with its corresponding chapter in the Grokking ASD book. So far, I've read up on how Dynamo uses a consistent hashing ring with v-nodes for data partitioning, which we'll dive deeper into in a minute. Apart from covering the system design principles and technical details, I also add some historical tidbits surrounding these technologies.
Data Partitioning
The act of dividing a system's data into separate pieces (also called partitions or shards) so that each piece can be stored, accessed, and managed individually is called partitioning. It allows a system to be more performant, scalable, and available.
The Distributed Hash Table Problem
At its heart, the problem is simple to state but complex to solve:
How do you store and retrieve data in a network of many servers (a "distributed system") as if you were interacting with a single, massive dictionary or hash table?
To do this, any DHT solution must solve three fundamental challenges:
1. The Mapping Problem: Where Does the Data Go?
This is the most basic requirement. The system needs a reliable, repeatable rule to determine exactly which server is responsible for a specific piece of data. If you want to store "user_profile_123", the system must have a way to map that key to a specific server (e.g., Server #582). Crucially, every computer in the network must be able to independently calculate this same mapping.
Input: A data key (like a filename or a user ID).
Output: The specific server responsible for storing that key.
Solution: The Naïve Approach
A common first thought is to use a simple math formula. You could take the user ID, hash it to get a number, and then use the modulo (%) operator.
server_index = hash(user_id) % number_of_servers
If you have 3 servers, hash(user_id) % 3 will give you 0, 1, or 2. Perfect.
Here's the catastrophic problem: What happens when you need to add a fourth server (Server 4)? Your formula becomes hash(user_id) % 4.
Suddenly, the result of the formula changes for almost every single user ID. A user who was on Server 1 might now belong on Server 3. A user from Server 2 might now belong on Server 1. This leads to a massive, system-wide data reshuffle where nearly all your data has to be moved.
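To see the scale of the damage, here is a small Python sketch (my own illustration, with made-up user IDs and a stable MD5-based hash) that counts how many keys change servers when a fourth server is added under the modulo scheme:

```python
# A rough sketch of the modulo reshuffle problem (illustrative, not Dynamo code).
# We map 10,000 made-up user IDs to servers with hash(key) % N, then count how
# many keys land on a different server after one server is added.
import hashlib

def server_index(user_id: str, num_servers: int) -> int:
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % num_servers

keys = [f"user_profile_{i}" for i in range(10_000)]
moved = sum(1 for k in keys if server_index(k, 3) != server_index(k, 4))
print(f"{moved / len(keys):.0%} of keys moved going from 3 to 4 servers")
# Typically prints around 75%: nearly everything has to be reshuffled.
```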
2. The Dynamism Problem: How Do You Handle Change?
This is the critical part where naive solutions fail. In any large-scale system, servers are not static.
Nodes Join: New servers are added to increase capacity.
Nodes Leave (or Fail): Servers crash, are taken down for maintenance, or are decommissioned.
A robust solution must handle these changes gracefully without bringing the whole system to a halt. The key goal is minimizing disruption. When one server leaves, you shouldn't have to reshuffle all the data on every other server.
Challenge: How do you re-assign data from a failed server or incorporate a new server while moving the absolute minimum amount of data possible?
3. The Routing/Discovery Problem: How Do You Find the Data?
In a decentralized system with thousands of nodes, a single computer doesn't know about every other computer. So, if you calculate that "user_profile_123" belongs on Server #582, how do you find the IP address and connect to that specific server?
The DHT needs a protocol for nodes to discover each other and route requests efficiently. A client should be able to connect to any node in the DHT and ask, "Where can I find the data for this key?" That node should then be able to either provide the answer directly or forward the request to another node that is "closer" to the answer.
Challenge: How does a node find the correct destination for a key with only partial knowledge of the entire network?
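As a toy illustration of routing with only partial knowledge (this is my own simplified sketch, not Chord or Dynamo's actual protocol), imagine each node knows only its own ring position and its clockwise successor, and simply forwards a lookup until it reaches the owner; real DHTs such as Chord add finger tables to cut the hop count to O(log N):

```python
# Toy routing sketch: each node knows only its own position and its successor.
# A lookup is forwarded clockwise until it reaches the node that owns the key.
from dataclasses import dataclass
from typing import Optional

def in_arc(x: int, start: int, end: int) -> bool:
    """True if x lies in the clockwise arc (start, end] of the ring."""
    if start < end:
        return start < x <= end
    return x > start or x <= end  # the arc wraps past the top of the ring

@dataclass
class Node:
    position: int
    successor: Optional["Node"] = None

    def lookup(self, key_hash: int, hops: int = 0):
        # If the key falls between us and our successor (walking clockwise),
        # the successor is the first node at or after the key: it owns it.
        if in_arc(key_hash, self.position, self.successor.position):
            return self.successor, hops
        # Otherwise, forward the request one node clockwise.
        return self.successor.lookup(key_hash, hops + 1)

# Wire up a toy ring of four nodes on a 0-100 hash space.
nodes = [Node(p) for p in (25, 50, 75, 95)]
for i, node in enumerate(nodes):
    node.successor = nodes[(i + 1) % len(nodes)]

owner, hops = nodes[1].lookup(15)  # ask the node at 50 about a key hashing to 15
print(owner.position, hops)        # -> the node at 25, after 2 forwarding hops
```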
Imagine you have a website with millions of user profiles, and you want to store them across several servers (say, 3 servers) to handle the load. The simplest question is: "If I have a user's ID, how do I decide which server holds their data?"
Consistent Hashing Solution: The Ring
Instead of a mathematical formula, we use a conceptual ring.
Step 1: Create the Ring Imagine a circle representing all possible hash values (e.g., from 0 to 100 or a much larger number in reality).
Step 2: Place Servers on the Ring You take each server's name or IP address (e.g., "Server 1") and hash it to get its position on the ring. So, Server 1 lands at position 25, Server 2 at 50, Server 3 at 75, and Server 4 at 100.
Step 3: Place Data on the Ring Now, to figure out where a piece of data (like user_id_ABC) goes, you hash that data key. Let's say hash("user_id_ABC") results in the number 15. You find position 15 on the ring and then walk clockwise until you find the first server. In this case, you hit Server 1 (at position 25). So, Server 1 is responsible for all data whose hash falls between 1 and 25 (everything past Server 4 at 100, wrapping around the ring).
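Here is a minimal consistent-hashing ring in Python (an illustrative sketch with hypothetical server names, not Dynamo's implementation): server names are hashed onto a large integer ring, and a key is assigned to the first server at or after its hash, wrapping around.

```python
# A minimal consistent-hashing ring (an illustrative sketch, not production code).
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string to a position on the ring (a large integer space)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, servers):
        # The ring is a sorted list of (position, server) pairs.
        self._ring = sorted((ring_hash(s), s) for s in servers)

    def add(self, server: str) -> None:
        bisect.insort(self._ring, (ring_hash(server), server))

    def remove(self, server: str) -> None:
        self._ring.remove((ring_hash(server), server))

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's position to the first server."""
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect_left(positions, ring_hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["Server-1", "Server-2", "Server-3", "Server-4"])
print(ring.lookup("user_id_ABC"))  # deterministically one of the four servers
```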
Why This Is So Much Better
Now, let's look at what happens when the number of servers changes.
Scenario 1: A Server is Removed
Imagine Server 2 (range 26-50) fails and is removed from the ring.
What happens to its data? All the data that was mapped to Server 2 (keys that hashed to values 26-50) now needs a new home. According to the rule ("walk clockwise to find the next server"), all of that data will now naturally map to Server 3.
What is the impact? Only Server 3 is affected. It has to take over the data from the failed Server 2. Servers 1 and 4 are completely undisturbed. Their data ranges haven't changed at all. There is no massive, system-wide reshuffle.
Scenario 2: A Server is Added
Imagine we add a new Server 5 that hashes to position 40.
It gets placed on the ring between Server 1 (at position 25) and Server 3 (at position 75). Remember that Server 2 left the ring in Scenario 1, so Server 3 currently owns the whole range 26-75.
What happens to the data? Server 3 was responsible for the range 26-75. But now, Server 5 sits at position 40. According to the "walk clockwise" rule, any key that hashes between 26 and 40 will now belong to Server 5. This data was previously owned by Server 3.
What is the impact? The new Server 5 takes a portion of the data only from Server 3. Again, Servers 1 and 4 are completely unaffected.
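Reusing the ConsistentHashRing sketch from above with some made-up keys, we can check both scenarios empirically: removing or adding one server only remaps the keys on the affected arc of the ring.

```python
# Continuing the ConsistentHashRing sketch above: measure how many keys change
# owner when a server is removed (Scenario 1) and when a new one joins (Scenario 2).
keys = [f"user_{i}" for i in range(10_000)]

ring = ConsistentHashRing(["Server-1", "Server-2", "Server-3", "Server-4"])
before = {k: ring.lookup(k) for k in keys}

ring.remove("Server-2")  # Scenario 1: a server fails and leaves the ring
after_removal = {k: ring.lookup(k) for k in keys}
moved = sum(before[k] != after_removal[k] for k in keys)
print(f"removal moved {moved / len(keys):.0%} of keys")
# Only the keys that sat on Server-2's arc move; everything else stays put.

ring.add("Server-5")  # Scenario 2: a new server joins the ring
after_addition = {k: ring.lookup(k) for k in keys}
moved = sum(after_removal[k] != after_addition[k] for k in keys)
print(f"addition moved {moved / len(keys):.0%} of keys")
# Only the keys that now fall on Server-5's arc move, taken from its successor.
```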
In summary, consistent hashing ensures that when a server is added or removed, only its immediate neighbor(s) on the ring are affected, minimizing data movement and maximizing system stability. This is why it's a foundational concept for distributed databases, caches, and content delivery networks. Consistent hashing isn't the entire solution to the DHT problem, but it's the building block that solves one of its most difficult parts. Systems like Amazon's Dynamo and Apache Cassandra use consistent hashing as the core mechanism to manage the constant churn of servers in their massive clusters.
Alternatives
There are other solutions to the DHT problem. Rendezvous hashing, also known as Highest Random Weight (HRW) hashing, solves a more general version of the problem: given a set of n sites (servers or proxies, say), how can any set of clients, given an object O, agree on a k-subset of sites to assign to O? The standard version of the problem uses k = 1.
Here's the core idea:
Everyone gets a score: For a given piece of data (like a file), every single server in your cluster computes a numerical "score" or "weight" by hashing the data key combined with its own unique ID.
Highest score wins: The server that produces the highest score is the one that gets to store the data.
That's it. All clients and servers use the same hash function, so they all agree on which server "wins" for any given piece of data.
Its key advantage is that when a server is removed, only the data it was storing is affected. For each of those pieces of data, a new "election" happens among the remaining servers, and the one with the next-highest score takes over. No other data on any other server needs to be moved.
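A minimal sketch of rendezvous hashing, assuming hypothetical server names and an MD5-based score, looks like this:

```python
# Rendezvous (HRW) hashing sketch: every server scores the key, highest score wins.
import hashlib

def score(key: str, server: str) -> int:
    """Hash the key combined with the server's ID; higher means more preferred."""
    return int(hashlib.md5(f"{key}:{server}".encode()).hexdigest(), 16)

def owner(key: str, servers: list) -> str:
    return max(servers, key=lambda s: score(key, s))

servers = ["Server-1", "Server-2", "Server-3"]
print(owner("user_profile_123", servers))
# If the winning server is removed, only its keys are re-scored among the rest;
# every other key keeps its current winner, so no other data moves.
# For the k-subset version, take the k highest-scoring servers instead of the max.
```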
Consistent Hashing with Virtual Nodes
This is a crucial concept that makes consistent hashing practical for real-world, large-scale systems. Let's start with the problems of basic consistent hashing, and then show how virtual nodes solve them perfectly.
The Problems with Basic Consistent Hashing
Uneven Load Distribution: When you place servers on the ring by hashing their names, you might get unlucky. One server could land in a position where the gap to its successor is huge, making it responsible for a massive amount of data, while another server gets a tiny slice. This creates "hotspots."
Single Successor Bottleneck: When a server fails, its entire data load is transferred to one single neighboring server. This can instantly overwhelm the neighbor, potentially causing a cascade of failures.
Doesn't Handle Different Server Powers: A brand-new, powerful server is treated exactly the same as an old, weak one. There's no built-in way to assign more load to the stronger machine.
Solution: Virtual Nodes (Vnodes)
Instead of placing one single token on the ring for each physical server, the vnode strategy says: "Give each physical server many small, virtual tokens and scatter them all over the ring."
Here’s how it works:
A single physical machine, say Server-A, doesn't just get one spot on the ring. It gets assigned, for example, 128 virtual nodes (vnodes).
These vnodes are named something like Server-A:1, Server-A:2, ..., Server-A:128.
Each of these 128 unique vnode names is hashed, creating 128 different points on the ring that all belong to the physical Server-A.
The same is done for Server-B, Server-C, etc.
The ring is no longer occupied by a few physical servers, but by hundreds or thousands of these virtual nodes.
How Vnodes Solve the Problems
Solves Uneven Load: With hundreds of points scattered across the ring for each server, the law of large numbers smooths everything out. The data keys are now distributed much more evenly and randomly among the vnodes, resulting in a very balanced load on the underlying physical servers. Hotspots are effectively eliminated.
Solves the Successor Bottleneck: This is the most brilliant part. When Server-A fails, its 128 vnodes disappear from the ring. But since these vnodes are scattered everywhere, the load for each of them is transferred to a different successor. The responsibility that was held by the single Server-A is now distributed across potentially dozens of other physical servers in the cluster. The impact of the failure is diffused, not concentrated.
Solves for Different Server Powers: This becomes trivial. Do you have a new server that's twice as powerful as the old ones? Simple: assign it twice as many virtual nodes (e.g., 256 vnodes instead of 128). It will naturally and proportionally be responsible for twice as much data, without any complex logic.
In short, virtual nodes are an abstraction layer on top of consistent hashing that makes it far more balanced, resilient, and flexible for managing real-world clusters of machines.
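To see the balancing effect in code, here is a small vnode sketch (illustrative only; the server names, the 50,000 test keys, and the choice of 128 vnodes are my own arbitrary assumptions). It compares the per-server key counts with one token per server versus 128 vnodes per server:

```python
# Vnode sketch: each physical server gets many tokens on the ring. We compare
# the load spread with 1 token per server vs. 128 vnodes per server.
import bisect
import hashlib
from collections import Counter

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

def build_ring(servers, vnodes_per_server):
    ring = []
    for server in servers:
        for i in range(1, vnodes_per_server + 1):
            # "Server-A:1", "Server-A:2", ... each hashes to its own ring position,
            # but all of them belong to the same physical server.
            ring.append((ring_hash(f"{server}:{i}"), server))
    return sorted(ring)

servers = [f"Server-{c}" for c in "ABCD"]
keys = [f"user_{i}" for i in range(50_000)]

for vnodes in (1, 128):
    ring = build_ring(servers, vnodes)
    positions = [pos for pos, _ in ring]
    load = Counter(
        ring[bisect.bisect_left(positions, ring_hash(k)) % len(ring)][1]
        for k in keys
    )
    print(f"{vnodes} vnode(s) per server -> {dict(load)}")
```

With a single token per server the counts are usually lopsided; with 128 vnodes per server each physical machine ends up close to an even quarter of the keys, and a machine that should carry double the load can simply be given double the vnodes.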
Summary
Data partitioning is a core concept in large-scale distributed systems, with Amazon's Dynamo providing a key case study. The fundamental challenge is the Distributed Hash Table (DHT) problem, which breaks down into three parts: mapping data to the correct server, handling the constant addition and removal of servers (dynamism), and discovering where data is located across the network.
While a simple modulo hashing method is fragile, the far more robust Consistent Hashing algorithm provides a solution: by arranging servers and data on a conceptual "ring", it drastically minimizes data reshuffling when the number of servers changes. After briefly mentioning Rendezvous Hashing as an alternative, we dive deep into the main enhancement used by Dynamo: virtual nodes (vnodes). This technique solves the practical weaknesses of basic consistent hashing: uneven load distribution, the "single successor" bottleneck during failures, and the inability to account for servers with different capacities.