Leader Election and Term Numbers: Preventing Split-Brain
When a swarm of agents needs one voice to speak for all, Arbiter runs secure elections. But a network partition can leave two agents each believing it is the leader, a split-brain scenario that corrupts state.
Arbiter prevents this with term numbers: strictly monotonic counters, recorded on-chain, that identify which leadership claim is current.
The Split-Brain Problem
Consider this scenario:
- Agent A is leader, communicating with most of the swarm
- A network partition separates Agent A from Agent B and the rest of the swarm
- Agent B can't see Agent A's heartbeats
- Agent B triggers an election and wins (it can see the other agents)
- Now both Agent A and Agent B think they're leader
Result: two leaders making conflicting decisions. State corruption. Chaos.
How Arbiter Prevents Split-Brain
Arbiter implements Raft-style leader election with term numbers:
- Any swarm member can trigger an election if the current leader's heartbeat times out
- Agents vote for candidates using EIP-712 signed messages
- First candidate to reach quorum becomes leader
- Leadership is recorded on-chain with a term number
- Leaders must periodically checkpoint on-chain to prove liveness
- Term numbers are strictly monotonic—no agent can claim an old term
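The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Arbiter's implementation: `TermRegistry` is a hypothetical in-memory stand-in for the on-chain record, `tally` is an assumed helper, and quorum is assumed to be a strict majority of the swarm.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TermRegistry:
    """Hypothetical in-memory stand-in for the on-chain leadership record."""
    term: int = 0
    leader: Optional[str] = None

    def record_leader(self, term: int, leader: str) -> None:
        # Strict monotonicity: a new record must carry a higher term
        # than the one already stored, so an old term cannot be reclaimed.
        if term <= self.term:
            raise ValueError(f"stale term {term}; current term is {self.term}")
        self.term = term
        self.leader = leader


def tally(votes: Dict[str, str], swarm_size: int) -> Optional[str]:
    """Return the first candidate to reach quorum, or None.

    `votes` maps voter -> candidate. Quorum is assumed to be a strict
    majority of the swarm; the source does not state the actual threshold.
    """
    quorum = swarm_size // 2 + 1
    counts: Dict[str, int] = {}
    for candidate in votes.values():
        counts[candidate] = counts.get(candidate, 0) + 1
        if counts[candidate] >= quorum:
            return candidate
    return None


registry = TermRegistry()
winner = tally({"b": "B", "c": "B", "d": "B"}, swarm_size=5)
if winner is not None:
    registry.record_leader(registry.term + 1, winner)
```

Because the registry rejects any term that is not strictly greater than the stored one, a second attempt to record term 1 raises `ValueError` rather than overwriting the leader.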
Term Numbers Are Authoritative
The term number is critical. It's recorded on-chain via the ArbiterFinality contract. This makes it the source of truth.
If network partitions cause two agents to both believe they're leader:
- Both agents read the current term number from the chain
- The agent whose term matches the on-chain term is the legitimate leader
- The agent holding a lower term must immediately step down
On-chain state resolves the dispute. No ambiguity.
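As a rough sketch of that rule, each agent can compare the term it was elected in against the term read from the chain. The `Agent` class and `reconcile` method are hypothetical names chosen for illustration, and reading the term from the ArbiterFinality contract is replaced here by a plain integer.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    term: int               # term in which this agent believes it was elected
    is_leader: bool = False

    def reconcile(self, onchain_term: int) -> None:
        """Step down if the chain records a newer term than this agent's."""
        if self.is_leader and self.term < onchain_term:
            self.is_leader = False
        # Whether or not it was leader, the agent adopts the newest term it has seen.
        self.term = max(self.term, onchain_term)


# Both agents believe they are leader after a partition.
agent_a = Agent("A", term=1, is_leader=True)
agent_b = Agent("B", term=2, is_leader=True)

onchain_term = 2  # assumed value read from the chain
for agent in (agent_a, agent_b):
    agent.reconcile(onchain_term)

print(agent_a.is_leader, agent_b.is_leader)  # False True
```

The comparison is purely local: neither agent needs to reach the other, only the chain.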
Example Flow
Term 1: Agent A is leader, term 1 recorded on-chain
Network partition: Agent A separated from rest of swarm
Term 2: Agent B triggers election, wins, term 2 recorded on-chain
Partition heals: Agent A sees term 2 on-chain
Agent A steps down: Term 1 < Term 2, so Agent A is no longer leader
Split-brain prevented. On-chain term number is authoritative.
Periodic Checkpoints
Leaders must periodically checkpoint on-chain to prove liveness:
- If a leader fails to checkpoint, other agents can trigger an election
- This prevents "zombie leaders" that are unresponsive but still think they're leader
- Checkpoints include the current term number, proving the leader is still active
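The liveness rule can be pictured as a simple age check on the latest checkpoint. The sketch below assumes a hypothetical `Checkpoint` record and a `max_age` timeout in seconds; the real checkpoint format and timeout are not specified in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    term: int          # term number included in the checkpoint
    leader: str
    timestamp: float   # seconds, e.g. block time (assumed unit)


def leader_is_live(last: Checkpoint, current_term: int,
                   now: float, max_age: float) -> bool:
    """A checkpoint proves liveness only if it carries the current term
    and is no older than `max_age` seconds."""
    return last.term == current_term and (now - last.timestamp) <= max_age


def may_trigger_election(last: Checkpoint, current_term: int,
                         now: float, max_age: float) -> bool:
    """Other agents may call an election once the leader stops checkpointing."""
    return not leader_is_live(last, current_term, now, max_age)


last_checkpoint = Checkpoint(term=2, leader="B", timestamp=100.0)
fresh = leader_is_live(last_checkpoint, current_term=2, now=120.0, max_age=30.0)
stale = may_trigger_election(last_checkpoint, current_term=2, now=140.0, max_age=30.0)
```

Here `fresh` is true because the checkpoint is 20 seconds old, and `stale` is true because at 40 seconds the leader has missed its window.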
EIP-712 Signed Votes
Agents vote for leaders using EIP-712 typed signatures:
- Votes are cryptographically signed
- Votes can be verified on-chain if disputes arise
- Votes are tied to specific term numbers
- A vote signed for the wrong term is detected and rejected
This prevents vote manipulation and ensures votes are for the correct election.
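To make the term binding concrete, here is a sketch of what an EIP-712 typed-data payload for a vote might look like. The outer layout (`types`, `primaryType`, `domain`, `message`) follows the EIP-712 standard, but the `Vote` fields, the domain name "Arbiter", and the version string are assumptions made for illustration, not Arbiter's actual schema. Signing and signature recovery are deliberately left out.

```python
from typing import Any, Dict


def build_vote_typed_data(term: int, candidate: str, voter: str,
                          chain_id: int, verifying_contract: str) -> Dict[str, Any]:
    """Build an EIP-712 typed-data dict for a vote (hypothetical schema)."""
    return {
        "types": {
            "EIP712Domain": [
                {"name": "name", "type": "string"},
                {"name": "version", "type": "string"},
                {"name": "chainId", "type": "uint256"},
                {"name": "verifyingContract", "type": "address"},
            ],
            # Assumed vote structure: the term is part of the signed message,
            # so a signature for one term cannot be replayed in another.
            "Vote": [
                {"name": "term", "type": "uint256"},
                {"name": "candidate", "type": "address"},
                {"name": "voter", "type": "address"},
            ],
        },
        "primaryType": "Vote",
        "domain": {
            "name": "Arbiter",   # assumed domain name
            "version": "1",      # assumed version
            "chainId": chain_id,
            "verifyingContract": verifying_contract,
        },
        "message": {"term": term, "candidate": candidate, "voter": voter},
    }


def vote_is_for_term(typed_data: Dict[str, Any], election_term: int) -> bool:
    """Reject votes whose signed term differs from the election being tallied."""
    return (typed_data["primaryType"] == "Vote"
            and typed_data["message"]["term"] == election_term)


vote = build_vote_typed_data(
    term=2,
    candidate="0x00000000000000000000000000000000000000bb",  # placeholder address
    voter="0x00000000000000000000000000000000000000cc",      # placeholder address
    chain_id=1,
    verifying_contract="0x00000000000000000000000000000000000000aa",  # placeholder
)
accepted = vote_is_for_term(vote, election_term=2)
rejected = not vote_is_for_term(vote, election_term=3)
```

In practice the payload would be signed with a wallet or an EIP-712 capable library, and the verifier would recover the signer before applying the term check.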
Why This Matters
Term numbers enable:
- Split-brain prevention: On-chain state resolves disputes
- Leader liveness: Checkpoints prove leaders are active
- Verifiable elections: Votes are signed and can be verified
- Automatic recovery: Failed leaders are replaced automatically
Without term numbers, split-brain scenarios are inevitable. With them, distributed agent systems can safely elect leaders.
Part of the EchoRift infrastructure series.