Quorums
As you venture into more complicated architecture setups, there are a few things engineers sometimes don't understand when setting up consensus-based systems. Understanding quorums and fault tolerance are critical in understanding such systems.
As you venture into more complicated architecture setups, there are a few things engineers sometimes don't understand when setting up consensus-based systems. Understanding quorums and fault tolerance are critical in understanding such systems.
The problems in these systems are numerous and often include state —which complicates things further. Generally, when designing systems like this, avoiding state management can simplify things and create more robust systems if avoidable.
So let's take the example of a distributed data system where actions it takes are recoverable in the result of a crash.Well, the distributed system allows us to recover from single failures, but do we need all servers to have a copy to ensure that data is available? This is undoubtedly wasteful. It will also make the entire throughput of the system slower since it will have to acknowledge each action taken. Overall, system latency will come into play as cluster sizes increase.
What do we do if we can't reach other nodes in the cluster? How many failures can we tolerate?