Fencing Tokens: Protecting Against Stale Locks

Distributed locks are essential for coordinating access to shared resources. But there's a dangerous edge case that most lock implementations miss: what happens when an agent thinks it has a lock, but the lock has actually expired?

Arbiter solves this with fencing tokens—globally monotonic counters that prevent stale locks from corrupting shared state.

The Stale Lock Problem

Consider this scenario:

1. Agent A acquires a lock to modify a shared resource.
2. Agent A experiences a long garbage collection pause or network partition.
3. The lock expires while Agent A is paused.
4. Agent B acquires the lock and modifies the resource.
5. Agent A "wakes up" and thinks it still has the lock.
6. Agent A modifies the resource using stale information.

Result: corrupted state. Agent A's modifications are based on outdated data, overwriting Agent B's legitimate changes.

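This isn't theoretical. It happens in production systems that experience GC pauses, network partitions, or clock skew. A lease-based lock alone doesn't protect against it, because the lock service has no way to stop a paused client from writing after its lease has expired.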

How Fencing Tokens Work

Each lock Arbiter grants carries a fencing token: a value from a globally monotonic counter, so every new grant receives a higher token than any grant before it. The process:

1. The agent requests a lock with a maximum duration.
2. If the lock is free (or expired), the agent receives the lock and a fencing token.
3. Before any protected operation, the agent must present its fencing token.
4. Downstream services reject any token lower than the highest they've seen.
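
The issuing side of this process can be sketched as a small in-memory lock service. This is a minimal illustration under stated assumptions, not Arbiter's implementation: `LockService`, `Lease`, and the explicit `now` parameter are hypothetical names chosen to keep the example deterministic.

```typescript
// Hypothetical in-memory lock service; names are illustrative, not Arbiter's API.
interface Lease {
  holder: string;
  fencingToken: number;
  expiresAt: number; // milliseconds, on the lock service's clock
}

class LockService {
  private nextToken = 1; // one counter shared by all resources
  private leases = new Map<string, Lease>();

  // Returns a lease if the lock is free or expired, otherwise null.
  acquire(resourceId: string, holder: string, maxDurationMs: number, now: number): Lease | null {
    const current = this.leases.get(resourceId);
    if (current !== undefined && current.expiresAt > now) {
      return null; // still held by a live lease
    }
    const lease: Lease = {
      holder,
      fencingToken: this.nextToken++, // every grant gets a strictly larger token
      expiresAt: now + maxDurationMs,
    };
    this.leases.set(resourceId, lease);
    return lease;
  }
}

const service = new LockService();
// Agent A is granted the lock at t=0 with a 1000 ms lease.
const leaseA = service.acquire("shared-database", "agent-a", 1000, 0);
// Agent B is refused at t=500 because A's lease is still live.
const blocked = service.acquire("shared-database", "agent-b", 1000, 500);
// Agent B is granted the lock at t=1500, after A's lease has expired.
const leaseB = service.acquire("shared-database", "agent-b", 1000, 1500);
```

The token comes from a counter that never resets, so a grant issued after an expiry always outranks the expired one. Nothing here notifies Agent A that its lease has lapsed, which is why the check has to happen downstream.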

The key insight: even if Agent A thinks it has a lock, if its fencing token is stale (lower than what the resource has seen), the resource rejects the operation.

Example Flow

Time T1: Agent A acquires lock, receives fencing token 42

Time T2: Agent A pauses (GC, network partition)

Time T3: Lock expires, Agent B acquires lock, receives fencing token 43

Time T4: Agent B modifies resource, resource records highest token: 43

Time T5: Agent A "wakes up", tries to modify resource with token 42

Result: The resource rejects token 42 because it has already seen 43, so Agent A's stale operation fails.
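
The timeline above can be replayed against a resource that tracks the highest token it has accepted. This is a sketch only: `FencedResource` and its methods are hypothetical names, and a real resource would need to make the comparison and the write a single atomic step.

```typescript
// Hypothetical resource-side guard; class and method names are illustrative.
class FencedResource {
  private highestSeen = 0;
  private value = "";

  // Applies the write only if the token is not lower than the highest seen so far.
  write(fencingToken: number, newValue: string): boolean {
    if (fencingToken < this.highestSeen) {
      return false; // stale holder: reject without touching state
    }
    this.highestSeen = fencingToken;
    this.value = newValue;
    return true;
  }

  read(): string {
    return this.value;
  }
}

const resource = new FencedResource();
// T1-T2: Agent A holds token 42 but pauses before writing.
// T3-T4: Agent B acquires token 43 and writes.
const bAccepted = resource.write(43, "written by B");
// T5: Agent A wakes up and presents token 42.
const aAccepted = resource.write(42, "written by A");
// bAccepted is true, aAccepted is false, and read() still returns "written by B".
```

The rejection depends only on the comparison inside the resource, so it holds even when Agent A's own view of the lock is wrong.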

On-Chain Fencing Token Registry

Arbiter maintains the fencing token registry on-chain via the ArbiterFinality contract on Base. This ensures:

Global monotonicity: Tokens always increase, never decrease
Verifiable state: Any service can query the current highest token
Tamper-resistant: On-chain state can only change through the contract's own logic
Consistent view: All services see the same token sequence

When an agent acquires a lock, the fencing token is recorded on-chain. When the agent releases the lock, the token remains in the registry as the highest seen value. This creates an immutable history that prevents stale tokens from being accepted.

Integration Pattern

To use fencing tokens with your resources:

typescript
// Acquire lock with fencing token
const lock = await arbiter.lock.acquire({
  swarmId: 'my-swarm',
  resourceId: 'shared-database',
  maxDuration: 300, // 5 minutes
});

// Store the fencing token
const fencingToken = lock.fencingToken;

// Before modifying the resource, check that the token is still current.
// This is an early exit only: the lock can still expire between this
// check and the write, so the resource must enforce the token itself.
const isValid = await arbiter.lock.validate({
  swarmId: 'my-swarm',
  resourceId: 'shared-database',
  fencingToken,
});

if (!isValid) {
  // Token is stale, don't proceed
  throw new Error('Fencing token invalid');
}

// Modify the resource, passing the fencing token along with the change
await modifyResource({
  fencingToken,
  changes: { /* ... */ },
});

// Resource-side check: if token < highestSeen, reject the write
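
The acquire, validate, and write sequence can be wrapped in one helper so the token is never dropped along the way. The sketch below is hypothetical: `LockClient` mirrors only the two calls used in the snippet above, with assumed parameter and return types rather than Arbiter's published typings, and `withFencedLock` and `stubLock` are illustrative names.

```typescript
// Assumed shape of the two calls used above; not Arbiter's published typings.
interface LockClient {
  acquire(req: { swarmId: string; resourceId: string; maxDuration: number }): Promise<{ fencingToken: number }>;
  validate(req: { swarmId: string; resourceId: string; fencingToken: number }): Promise<boolean>;
}

// Hypothetical helper: acquire, pre-check, then hand the token to the write.
async function withFencedLock<T>(
  lock: LockClient,
  swarmId: string,
  resourceId: string,
  maxDuration: number,
  write: (fencingToken: number) => Promise<T>,
): Promise<T> {
  const { fencingToken } = await lock.acquire({ swarmId, resourceId, maxDuration });
  const stillValid = await lock.validate({ swarmId, resourceId, fencingToken });
  if (!stillValid) {
    throw new Error(`Fencing token ${fencingToken} is stale`);
  }
  // The write carries the token so the resource can make the final decision.
  return write(fencingToken);
}

// Stub client for local experiments: always grants token 7 and accepts only that token.
const stubLock: LockClient = {
  acquire: async () => ({ fencingToken: 7 }),
  validate: async (req) => req.fencingToken === 7,
};
```

A caller would pass its own write function, for example `withFencedLock(stubLock, 'my-swarm', 'shared-database', 300, (token) => modifyResource({ fencingToken: token, changes }))`, where `modifyResource` and `changes` are the application's own.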

Why This Matters

Fencing tokens are essential for:

Financial operations: Preventing double-spending or duplicate transactions
State mutations: Ensuring modifications are based on current state
Resource coordination: Preventing conflicts when multiple agents access shared resources
Audit trails: Providing verifiable proof of lock ownership

Without fencing tokens, distributed locks provide a false sense of security. With them, locks become truly safe for autonomous agent systems.

Part of the EchoRift infrastructure series. Learn more about Arbiter.