This is how Apache Cassandra works: https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/arc....
Do virtual nodes actually help?
From what I can see in the UI, nodes are placed semi randomly on the ring (probably a hash itself determines the node placement) so don’t you still have the same problem that the hashing of virtual nodes onto the ring can still cause imbalances?
To me it seems like you should be trying to put new nodes on the ring within the largest gaps on the ring.
Add enough random nodes and eventually it will be even, so it helps.
Why not just make the nodes even from the start? Place new nodes in the largest existing gap between any pair of nodes.
Worth mentioning that virtual nodes also ensure that the order of servers is random. Which helps when a server is removed - the keys that need to be moved will be spread across all other servers. If we were to evenly chop up the hash ring, server B will always be after server A. And when we remove server A, all keys residing on it will need to be moved exclusively to server B.
In a distributed system all clients would need to agree on the largest existing gap under various messaging anomalies. If you have a mechanism to reach that agreement then you can probably use a simpler deterministic load balancing algo. Consistent hashing gives you eventually consistent routing without (synchronous) agreement.
Whatever balanced configuration you make, it will become imbalanced if you add or remove a node.
Curious, I just learned about hash rings last week when setting up the ironic openstack service [0].
[0] https://docs.openstack.org/ironic/latest/_modules/ironic/com...
If you're looking for an application of these, the DynamoDB paper is a really great read: https://www.allthingsdistributed.com/files/amazon-dynamo-sos...
Love this.
Interactive Consistent HashRing Visualization