Raft and Paxos: Distributed Consensus Algorithms

Introduction

Greetings, fellow tech enthusiasts! Today, we’re setting out on an exciting journey into the world of distributed systems, where we’ll uncover the magic behind two prominent consensus algorithms: Raft and Paxos.

In distributed systems, achieving consensus among multiple nodes is essential for data replication, fault tolerance, and system reliability.

Both Raft and Paxos are designed to tackle this challenging task, but they do so in distinct ways. Let’s learn how - and why - these algorithms work!

Understanding Consensus Algorithms

Consensus algorithms play a pivotal role in distributed systems, ensuring that all nodes in a cluster agree on a single, consistent state despite failures and network partitions.

By establishing a shared understanding of the system’s state, consensus algorithms enable fault tolerance and data replication, making distributed systems resilient and reliable.

Introducing Paxos

Paxos ↗️, one of the pioneering consensus algorithms, was devised by Leslie Lamport in the late 1980s, though the paper describing it was not published until 1998.

In its classic form, Paxos has no fixed leader: any node may act as a proposer, suggesting and coordinating changes to the system’s state. Practical deployments (Multi-Paxos) usually designate a distinguished proposer to avoid competing proposals.

At its heart, Paxos is built on a few fundamental concepts:

  • Proposal: In Paxos, a node initiates a proposal, which contains a value or operation that it wants to reach consensus on.

    Each proposal carries a unique number, which nodes use to order competing proposals.

  • Acceptor: Acceptors are nodes that receive and store proposals. They play a crucial role in deciding which proposals get accepted and in what order.

  • Learner: Learners are nodes that gather the accepted proposals from acceptors and ensure that consensus is achieved.

    They provide the final output to the system.
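
To make these roles concrete, here’s a minimal sketch of the messages they exchange, written in Go. The type and field names are illustrative only; real implementations typically make proposal numbers globally unique, for example by pairing a round counter with the proposer’s node ID.

```go
package paxos

// Prepare is sent by a proposer in Phase 1. N is the proposal number.
type Prepare struct {
	N int
}

// Promise is an acceptor's Phase 1 reply. If the acceptor has already
// accepted a value, it reports that value and its number so the
// proposer can adopt it.
type Promise struct {
	N             int
	AcceptedN     int // 0 if nothing has been accepted yet
	AcceptedValue string
}

// Accept is sent by a proposer in Phase 2 to propose a value.
type Accept struct {
	N     int
	Value string
}

// Accepted is sent by acceptors to the proposer and the learners
// once a proposal is accepted.
type Accepted struct {
	N     int
	Value string
}
```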

Here’s a simplified walkthrough of how Paxos achieves consensus:

Phase 1: Prepare (or Prepare & Promise)

In this phase, the node acting as a proposer sends a prepare message to a quorum of acceptors. The prepare message includes the proposal number.

Each acceptor checks whether it has already promised a proposal with a higher number. If not, it promises to ignore any proposal numbered lower than this one and responds with a promise, including the highest-numbered proposal it has already accepted, if any.
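
Here’s a minimal sketch of the acceptor’s side of this phase in Go, assuming simple integer proposal numbers; the names are mine, not from any particular implementation.

```go
package paxos

// Acceptor holds the durable state Paxos requires: the highest
// proposal number promised so far, plus any value already accepted.
type Acceptor struct {
	promisedN     int
	acceptedN     int // 0 if nothing has been accepted yet
	acceptedValue string
}

// HandlePrepare applies the Phase 1 rule: promise only proposals
// numbered higher than any promised before, and report any
// previously accepted value back to the proposer.
func (a *Acceptor) HandlePrepare(n int) (ok bool, acceptedN int, acceptedValue string) {
	if n <= a.promisedN {
		return false, 0, "" // already promised an equal or higher number
	}
	a.promisedN = n
	return true, a.acceptedN, a.acceptedValue
}
```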

Phase 2: Accept (or Accept & Accepted)

If a proposer receives promises from a quorum of acceptors, it sends an accept message with its proposal to those acceptors. If any promise reported a previously accepted value, the proposer must propose the value from the highest-numbered of those promises instead of its own.

Upon receiving an accept message, an acceptor accepts the proposal unless it has already promised a higher-numbered one. If it accepts, it records the value and sends an accepted message.
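
Continuing the sketch, the acceptor’s Phase 2 logic might look like this (a simplification; a real acceptor must persist this state to stable storage before replying):

```go
package paxos

// Acceptor state as in the Phase 1 sketch, repeated so this snippet
// stands on its own.
type Acceptor struct {
	promisedN     int
	acceptedN     int
	acceptedValue string
}

// HandleAccept applies the Phase 2 rule: accept the proposal unless a
// higher-numbered prepare has been promised in the meantime.
func (a *Acceptor) HandleAccept(n int, value string) bool {
	if n < a.promisedN {
		return false // promised a higher number since the prepare phase
	}
	a.promisedN = n
	a.acceptedN = n
	a.acceptedValue = value
	return true // now broadcast an accepted message to the learners
}
```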

Phase 3: Learn

Learner nodes collect the accepted proposals from acceptors. Once they observe that a quorum of acceptors has accepted the same proposal, consensus is achieved.

The learners provide the agreed-upon value or operation as the final output.
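
A learner can then be little more than a tally. The sketch below uses hypothetical names, and a real learner would also deduplicate messages so the same acceptor isn’t counted twice.

```go
package paxos

// Learner tallies accepted messages per proposal number; once a
// quorum of acceptors reports the same proposal, its value is chosen.
type Learner struct {
	quorum int            // majority of acceptors, e.g. 2 out of 3
	counts map[int]int    // proposal number -> acceptors seen
	values map[int]string // proposal number -> proposed value
}

func NewLearner(quorum int) *Learner {
	return &Learner{quorum: quorum, counts: map[int]int{}, values: map[int]string{}}
}

// Learn records one accepted message and reports whether consensus
// has been reached on that proposal's value.
func (l *Learner) Learn(n int, value string) (chosen bool, v string) {
	l.counts[n]++
	l.values[n] = value
	if l.counts[n] >= l.quorum {
		return true, value
	}
	return false, ""
}
```

With three acceptors, for instance, NewLearner(2) reaches a decision as soon as any two report the same proposal.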


While this is a simplified overview of Paxos, implementing it correctly and efficiently in a real-world distributed system can be challenging.

Over the years, several variants ↗️ and optimizations of Paxos have emerged to address various practical issues.

The Raft Consensus Algorithm

Raft ↗️, developed by Diego Ongaro and John Ousterhout, offers a more intuitive and understandable approach to consensus.

It features a leader election process and emphasizes simplicity, making it easier to comprehend and implement compared to Paxos.

Raft’s design revolves around several key components:

  • Leader: Raft operates in a leader-follower model. One node is elected as the leader, responsible for managing the replication of log entries across all nodes.

  • Follower: Nodes in the system that are not leaders are followers. They replicate log entries sent by the leader.

    This is analogous to the primary and replica nodes in a primary-replica database architecture.

  • Candidate: When there is a need to elect a new leader (e.g., if the current leader fails), followers transition to candidate status and participate in leader election.

  • Log: Each node maintains a log, which stores a sequence of commands or entries. Log entries represent changes to the system’s state.
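
Put into a sketch, a node’s state might look like the following Go snippet; the field names loosely follow the Raft paper but are trimmed to what this walkthrough needs.

```go
package raft

// State is the role a node currently plays.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// LogEntry pairs a client command with the term in which the leader
// received it; terms let nodes detect stale or conflicting entries.
type LogEntry struct {
	Term    int
	Command string
}

// Node carries the per-node state used in the walkthrough below.
type Node struct {
	state       State
	currentTerm int
	votedFor    int // candidate ID voted for this term; -1 if none
	log         []LogEntry
	commitIndex int // highest log index known to be committed
}
```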

Raft achieves consensus through a series of phases and roles. Here’s a step-by-step overview of how Raft works:

Leader Election

Raft begins with leader election. Initially, all nodes are in follower state.

If a follower does not receive a heartbeat message from the leader within a specified time (the election timeout), it increments its term, transitions to candidate status, and requests votes from the other nodes.

The first candidate to collect votes from a majority of nodes becomes the leader for the new term.
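
Here’s a sketch of how a node might decide whether to grant its vote, capturing the one-vote-per-term rule (the safety section below adds one more restriction on top); the names are illustrative, not from a specific library.

```go
package raft

// VoteRequest is sent by a candidate to every other node.
type VoteRequest struct {
	Term        int
	CandidateID int
}

// Voter is the minimal state needed to decide on a vote.
type Voter struct {
	currentTerm int
	votedFor    int // -1 if no vote has been cast in currentTerm
}

// HandleVoteRequest grants at most one vote per term, and only to
// candidates whose term is at least as recent as the voter's.
func (v *Voter) HandleVoteRequest(req VoteRequest) bool {
	if req.Term < v.currentTerm {
		return false // stale candidate from an old term
	}
	if req.Term > v.currentTerm {
		v.currentTerm = req.Term // adopt the newer term
		v.votedFor = -1
	}
	if v.votedFor == -1 || v.votedFor == req.CandidateID {
		v.votedFor = req.CandidateID
		return true
	}
	return false // already voted for someone else this term
}
```

Because each node grants at most one vote per term, at most one candidate can collect a majority, so two leaders can never be elected in the same term.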

Log Replication

Once a leader is established, it accepts client requests and appends them as log entries. These log entries are then replicated to all followers.

Followers acknowledge each log entry’s receipt, and a log entry is considered committed once it is stored on a majority of nodes.
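
One way to picture the commit rule is as a median over how far each node’s log has been replicated. This sketch assumes the leader tracks a matchIndex for every node, itself included.

```go
package raft

import "sort"

// commitIndex returns the highest log index stored on a majority of
// nodes, given matchIndex[i] = highest index replicated on node i.
// Raft additionally requires that the entry at this index belong to
// the leader's current term before it is committed.
func commitIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...) // copy before sorting
	sort.Ints(sorted)
	// After sorting ascending, the element at position (n-1)/2 is
	// guaranteed to be stored on a majority of the n nodes.
	return sorted[(len(sorted)-1)/2]
}
```

For a five-node cluster with matchIndex = [5, 4, 4, 2, 1], this returns 4: the entry at index 4 is stored on three of the five nodes and can be committed.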

Safety and Consistency

Raft ensures safety and consistency through two rules: a leader never overwrites or deletes entries in its own log, and a node can only win an election if its log is at least as up-to-date as the logs of a majority of the cluster.

Together, these rules prevent committed entries from being lost and keep inconsistencies from emerging in the system.
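
That election restriction boils down to a small comparison, sketched below: the log whose last entry has the higher term is more up-to-date, and ties are broken by log length.

```go
package raft

// logIsUpToDate reports whether a candidate's log is at least as
// up-to-date as the voter's: the log whose last entry has the higher
// term wins, and if the terms match, the longer log wins.
func logIsUpToDate(candLastTerm, candLastIndex, voterLastTerm, voterLastIndex int) bool {
	if candLastTerm != voterLastTerm {
		return candLastTerm > voterLastTerm
	}
	return candLastIndex >= voterLastIndex
}
```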

Handling Failures

Raft handles failures gracefully. If the leader crashes or becomes unreachable, followers detect the absence of heartbeat messages and initiate a new leader election.

This ensures that a new leader quickly takes charge in the event of leader failure.
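
A detail worth noting: election timeouts are randomized, so that after a crash one follower usually times out first and wins before the others start competing. The Raft paper suggests a range like 150 to 300 ms, as in this sketch.

```go
package raft

import (
	"math/rand"
	"time"
)

// electionTimeout returns a randomized timeout. Randomization means
// that after a leader crash one follower usually times out first,
// wins the election, and starts sending heartbeats before the others
// become candidates, which keeps split votes rare.
func electionTimeout() time.Duration {
	const minMs, maxMs = 150, 300 // range suggested in the Raft paper
	return time.Duration(minMs+rand.Intn(maxMs-minMs)) * time.Millisecond
}
```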


While Raft simplifies many aspects of distributed consensus, it’s not without its complexities.

Implementing Raft in a real-world system involves addressing various details and optimizations.

One good example is MongoDB, whose replica set ↗️ election protocol is based on Raft.

Conclusion

Distributed consensus algorithms are a fascinating topic, and we’ve only scratched the surface here.

If you’re interested in learning more, I’d highly recommend checking out the Raft website ↗️ and Paxos website ↗️.

I hope you enjoyed this exploration of Raft and Paxos. Until next time, happy coding!
