[2024 dev2next] distributed consensus algorithms

Speakers: Mykyta Protsenko, Alex Borysov

For more, see the table of contents


Scenario: five people need to meet weekly: two in Ukraine, one on the US east coast, and two on the US west coast.

Consensus properties

  • Fault tolerance – not reliant on one person to see the result
  • Safety – only one value is chosen
  • Liveness – consensus is reached in a finite amount of time; can take multiple iterations

2 phase commit

  • One person asks; each person says yes only if they can attend. In the happy path, everyone says yes.
  • Commit, and now it's official.
  • Simple, but has downsides. When the group is big, there are a lot of acks. If even one person says no, the transaction is aborted and you start over. Also, you're always waiting for the slowest node to reply.
  • If the coordinator loses internet, everything is blocked
  • Fails on fault tolerance and liveness (see the sketch after this list)
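
Here's a rough sketch of the flow in Python (my own illustration of the meeting analogy; the `Participant` class and `two_phase_commit` function are made up, not from the talk):

```python
# Illustrative two-phase commit: one coordinator, five participants.

class Participant:
    def __init__(self, name, can_attend):
        self.name = name
        self.can_attend = can_attend  # callable: proposal -> bool

    def prepare(self, proposal):
        # Phase 1 (voting): say yes only if this person can attend.
        return self.can_attend(proposal)

    def commit(self, proposal):
        print(f"{self.name}: {proposal} is official")

    def abort(self, proposal):
        print(f"{self.name}: {proposal} aborted, start over")


def two_phase_commit(proposal, participants):
    # The coordinator waits for every ack, so it runs at the speed of
    # the slowest participant, and everything blocks if it goes offline.
    votes = [p.prepare(proposal) for p in participants]
    if all(votes):
        for p in participants:
            p.commit(proposal)  # Phase 2: commit, now official
        return True
    for p in participants:
        p.abort(proposal)       # one "no" aborts the whole transaction
    return False


people = [Participant(n, lambda t: True) for n in
          ["ua1", "ua2", "east", "west1", "west2"]]
two_phase_commit("Tuesday 16:00 UTC", people)
```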

Paxos Protocol

  • Everyone has a ballot with a unique ballot number
  • Propose a time with the next ballot
  • Participants submit a LastVote message (null if they have never voted) as a promise that they will vote. The vote itself can be for or against.
  • Once a majority promise to vote, the proposer sends the actual BeginBallot message
  • Then people actually vote
  • Consensus is a majority of votes. The proposer sends a message announcing that consensus was reached.
  • Each participant must track the last ballot tried, the promise to vote, and the actual vote
  • Choosing the proposer and choosing the ballot don't have to be done by the same person
  • A participant can only vote if it has confirmed a promise to vote in the cluster
  • Learners can observe to be notified when consensus is reached
  • Fault tolerant because a majority is enough
  • Safety because it is majority-based: any two majorities overlap in at least one node
  • Doesn't ensure liveness: two proposers can keep preempting each other with ever-higher ballots
  • Can elect member as leader who can be the only one to propose
  • 2 round trips for consensus (a rough sketch of both round trips follows this list)
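
A rough single-decree Paxos sketch, simplified to in-process calls with no networking or retries (the `Acceptor` and `propose` names are mine, not the speakers'):

```python
# Illustrative single-decree Paxos: two round trips (prepare/promise,
# then begin-ballot/vote), majority-based throughout.

class Acceptor:
    def __init__(self):
        self.promised = 0      # highest ballot we promised to honor
        self.voted_ballot = 0  # ballot of our last actual vote
        self.voted_value = None

    def prepare(self, ballot):
        # Round trip 1: promise to vote, reporting our last vote
        # (None if we have never voted).
        if ballot > self.promised:
            self.promised = ballot
            return (self.voted_ballot, self.voted_value)
        return None  # already promised a higher ballot

    def begin_ballot(self, ballot, value):
        # Round trip 2: actually vote, unless a higher ballot was promised.
        if ballot >= self.promised:
            self.promised = self.voted_ballot = ballot
            self.voted_value = value
            return True
        return False


def propose(acceptors, ballot, value):
    majority = len(acceptors) // 2 + 1
    promises = [p for p in (a.prepare(ballot) for a in acceptors) if p]
    if len(promises) < majority:
        return None  # no majority promised; retry with a higher ballot
    # Safety: adopt the value of the highest-numbered previous vote, if any.
    prior_ballot, prior_value = max(promises, key=lambda p: p[0])
    if prior_value is not None:
        value = prior_value
    votes = sum(a.begin_ballot(ballot, value) for a in acceptors)
    return value if votes >= majority else None


cluster = [Acceptor() for _ in range(5)]
print(propose(cluster, ballot=1, value="Tuesday 16:00 UTC"))
```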

Cassandra

  • Uses Paxos
  • Need to know the order of data – linearizable consistency
  • Don't mix transaction types, e.g., use IF EXISTS / IF NOT EXISTS consistently
  • Lightweight transactions are faster than two-phase commit
  • They incur a performance penalty by design because of the extra Paxos interactions (example after this list)
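
A hedged example of a lightweight transaction from Python, using the DataStax cassandra-driver (the contact point, keyspace, and table are made up for illustration):

```python
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")

# IF NOT EXISTS makes this a lightweight transaction: Cassandra runs
# Paxos rounds among the replicas, which is the by-design penalty
# versus a plain write. Use IF EXISTS / IF NOT EXISTS consistently.
result = session.execute(
    "INSERT INTO meetings (id, slot) VALUES (%s, %s) IF NOT EXISTS",
    ("weekly-sync", "Tuesday 16:00 UTC"),
)
print("applied:", result.was_applied)  # whether the conditional write won
```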

Raft

  • Two message types: vote requests and append entries (also used as heartbeats)
  • Leader based. All other nodes are followers
  • No re-elections; the leader stays leader until it disappears
  • Like Paxos, uses increasing numbers (terms)
  • Every node starts as a follower. In term 1, followers notice there is no leader. One or more volunteer and increase the term number. The others vote on a leader. There is only one vote per term, so a node can't vote twice.
  • Once a leader is elected, followers send requests to the leader, who propagates them
  • Log replication – an entry can be appended (not yet final) or committed
  • Uses a commit index as a tracker of what data was committed, which lets you see the state
  • All followers have a heartbeat tracker. If the leader disappears, the one who hasn't heard from the leader for the longest time becomes a candidate to be the new leader. If a node was merely away and requests leadership, it gets rejected because there already is a leader.
  • If you're outside the cluster and want to know the status, ask the leader
  • Fault tolerance – yes, a leader or follower can drop
  • Safety – guarantees one choice. Also only commits data from the current term.
  • Liveness – in practice yes, but in theory no (a toy election sketch follows this list)
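
A toy sketch of the leader election described above, assuming in-process nodes (no networking; the heartbeat timeout winner is modeled as a random choice, and the names are mine):

```python
import random

class Node:
    def __init__(self, name):
        self.name = name
        self.term = 1
        self.voted_for = None  # at most one vote per term

    def request_vote(self, term, candidate):
        if term > self.term:            # a newer term resets our vote
            self.term = term
            self.voted_for = None
        if term == self.term and self.voted_for is None:
            self.voted_for = candidate  # only one vote for this term
            return True
        return False                    # stale or duplicate request rejected


def run_election(nodes):
    # The follower whose heartbeat timeout fires first volunteers;
    # randomized timeouts are why liveness holds "in practice".
    candidate = random.choice(nodes)
    candidate.term += 1
    candidate.voted_for = candidate.name  # votes for itself
    votes = 1 + sum(n.request_vote(candidate.term, candidate.name)
                    for n in nodes if n is not candidate)
    return candidate if votes > len(nodes) // 2 else None


nodes = [Node(n) for n in ["ua1", "ua2", "east", "west1", "west2"]]
leader = run_election(nodes)
print(leader.name if leader else "no leader this term; try again")
```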

MongoDB

  • Uses Raft
  • If a slow member is the leader, there is a write bottleneck.
  • Can horizontally scale by replica set. Can hash keys so the majority of requests aren't all on one replica (sketch after this list).
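
A sketch of spreading requests with a hashed shard key via pymongo (the connection string, database, and key names are made up; this assumes a sharded cluster is already running):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

# Hashing the shard key spreads writes across shards (each shard is a
# replica set), so the majority of requests don't land on one replica set.
client.admin.command("enableSharding", "meetings")
client.admin.command("shardCollection", "meetings.events",
                     key={"user_id": "hashed"})
```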

Accord

  • New algorithm; not widely available
  • Leader-based protocols create a bottleneck
  • Fast and slow paths
  • If the fast path can get a majority, the slower nodes can be told later, even asynchronously
  • Some node must be part of every fast-path majority so it can share state with the others when they come back online
  • The fast path should be 3/4 of the nodes to guarantee someone has the latest state
  • The slow path remains a simple majority
  • ACID
  • Uses a reorder buffer to put transactions back in order based on time differentials (a quorum-size sketch follows this list)
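
A hedged sketch of the quorum arithmetic as I understood it from the talk (my interpretation of the 3/4 figure, not Accord's actual implementation):

```python
import math

def quorums(n):
    slow = n // 2 + 1            # slow path: simple majority
    fast = math.ceil(3 * n / 4)  # fast path: ~3/4 of the nodes
    return fast, slow

for n in (5, 7, 9):
    fast, slow = quorums(n)
    # Two fast-path quorums overlap in at least 2*fast - n nodes, which
    # is more than half, so some node in every fast-path majority has
    # the latest state and can catch the others up later, even async.
    print(f"n={n}: fast path {fast}, slow path {slow}, "
          f"fast-path overlap >= {2 * fast - n}")
```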

My take

I knew what two-phase commit was. Everything else was new to me. Excellent start to the morning! The five people voting made it easier to follow. The reasons for them disappearing (Ukrainian soldier, Californian losing power) also helped me pay attention. (I left a few minutes early to answer a phone call.)
