Data Anomalies in Distributed Systems: Split Brain, Clock Skew, Stale Reads, and More
A taxonomy of the anomalies that distributed systems produce beyond SQL transaction isolation β and the engineering patterns that prevent them
On this page
Reader feedback
Was this article useful?
Rate it if it helped, then continue with the next deep dive when you are ready.
Executive TLDR
- TLDR: Distributed systems produce anomalies not because the code is buggy β but because physics makes perfect consistency impossible across network boundaries.
- Split brain, stale reads, clock skew, causality violations, and cascading failures are the canonical failure modes.
- Each has a distinct root cause and a targeted set of prevention patterns.
- This post is the index: it maps every anomaly to its cause and cure, and links to the focused deep dives for the three main families.
Core mental model
Read this as a system of state, constraints, and failure boundaries.
A taxonomy of the anomalies that distributed systems produce beyond SQL transaction isolation β and the engineering patterns that prevent them
Key systems visualization
The articleβs conceptual path
01
π Why Distributing Data Breaks Rules Physics Never Mentioned
02
βοΈ Split Brain: When Two Nodes Both Believe They Are the Leader
03
π§ Stale Reads and Clock Skew: The Two Anomalies That Break Your Sense of Time
04
ποΈ Causality Violations and Vector Clocks: When the Reply Arrives Before the Post
05
π Network Partition Anomalies: What CAP Looks Like in Production
> TLDR: Distributed systems produce anomalies not because the code is buggy β but because physics makes perfect consistency impossible across network boundaries. Split brain, stale reads, clock skew, causality violations, and cascading failures are the canonical failure modes. Each has a distinct root cause and a targeted set of prevention patterns. This post is the index: it maps every anomaly to its cause and cure, and links to the focused deep-dives for the three main families.
A single-node relational database is a marvel of predictability. ACID properties hold because every transaction passes through one scheduler, one memory space, one disk. Distribute that same database across five nodes in two datacenters and the rules change entirely. You now have five independent clocks that drift apart. You have a network that can drop, delay, reorder, and duplicate messages. You have five failure domains that can fail independently β or fail in correlated ways that your monitoring cannot distinguish from healthy operation.
The anomalies covered in this post do not arise from bugs. They arise from the physics of distribution: the impossibility of instant communication, the impossibility of synchronized clocks, and the impossibility of knowing whether a silent node is dead or merely slow.
This post is the index. It provides the taxonomy, the anomaly map, the prevention cheatsheet, and links to three focused deep-dives β one for each major anomaly family covered here.
π Distributed Anomalies vs. SQL Transaction Anomalies
The SQL transaction anomaly posts in this series (dirty read, phantom read, write skew, lost update) cover anomalies that arise from concurrent transactions within a single database. They are solved by choosing the right isolation level. The anomalies in this post are different in origin, detection, and cure.
| Anomaly Type | Where It Arises | Primary Cause | Prevention Domain |
| Dirty read, phantom read, write skew | Single database, concurrent transactions | Insufficient isolation level | SQL isolation levels (READ COMMITTED, SERIALIZABLE) |
| Split brain | Distributed replica set | Network partition β two leaders | Quorum writes, Raft terms, fencing tokens |
| Stale reads | Primaryβreplica replication | Async replication lag | Consistency models (read-your-writes, LSN tokens) |
| Clock skew / causality violations | Any distributed system | Clock drift + LWW conflict resolution | HLC, vector clocks, TrueTime |
| Cascading failures | Microservice dependency chains | Retry amplification after node failure | Circuit breakers, bulkheads, jitter |
Understanding which family an anomaly belongs to immediately narrows the solution space. Visibility anomalies are fixed by routing and consistency tokens. Ordering anomalies are fixed by logical clocks. Availability anomalies are fixed by epoch tracking and fencing. Integrity anomalies are fixed by merge semantics or conflict-free data structures.
π The Core Principle: Consistency and Availability Cannot Coexist Under Partition
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read receives the latest write), Availability (every request receives a response), and Partition tolerance (the system operates despite network failures). Since network partitions are unavoidable in production, every distributed system must choose between consistency and availability when a partition occurs.
Systems that prefer consistency (Raft-based clusters, Zookeeper, Spanner) refuse writes when they cannot form a quorum. Systems that prefer availability (Cassandra, DynamoDB, Riak) accept writes from any node and reconcile conflicts later. Every anomaly in this taxonomy traces directly back to where a system sits on this spectrum and which guarantee it relaxes under failure.
βοΈ The Full Anomaly Taxonomy
flowchart TD
A["Anomaly Detected in Production"] --> B{"Classify by family"}
B -->|"Visibility"| C["Stale or torn data returned to reader"]
B -->|"Ordering"| D["Events appear in wrong causal order"]
B -->|"Availability"| E["Node falsely healthy or falsely dead"]
B -->|"Integrity"| F["Two nodes hold conflicting values"]
C --> C1{"Stale read or torn?"}
C1 -->|"Stale"| C2["Bounded staleness / LSN token / Quorum read"]
C1 -->|"Torn"| C3["WAL atomic replay / Commit marker gating"]
D --> D1{"Clock drift or async delivery?"}
D1 -->|"Clock drift"| D2["HLC / TrueTime / Avoid LWW"]
D1 -->|"Async delivery"| D3["Vector clocks / Causal middleware"]
E --> E1{"Split brain or zombie?"}
E1 -->|"Split brain"| E2["Quorum writes / Raft terms / STONITH"]
E1 -->|"Zombie leader"| E3["Fencing tokens / Epoch rejection"]
F --> F1{"Merge or overwrite?"}
F1 -->|"Overwrite acceptable"| F2["LWW with NTP discipline or HLC"]
F1 -->|"Must preserve all"| F3["CRDTs / Application merge"]
The flowchart is the triage tool. When a production anomaly surfaces, classify it into one of the four families first. The classification step narrows the solution space from twelve options to two or three before you consult documentation.
π Anomaly Observable Impact in Production
| Anomaly | Observable Impact | Monitoring Signal | First Response |
| Split brain | Conflicting data; duplicate writes acknowledged | Two primary alerts; write conflict counters | Fence stale leader; inspect replication logs |
| Stale reads | Reads return superseded values with no error | replication_lag; read-after-write failures | Route reads to primary; use quorum reads |
| Clock skew | LWW silently discards newer writes | Clock offset alerts; ntpstat drift > 100 ms | Switch to HLC or server-side timestamps |
| Cascading failure | Error rate climbs across cluster in minutes | p99 latency spike; connection pool exhaustion | Open circuit breakers; enable load shedding |
π§ Deep Dive: How Network Boundaries Break ACID
ACID properties are straightforward in a single-node database where every operation passes through one scheduler, one memory space, and one disk. Distributing the same data across five nodes turns each property into a distributed protocol with new failure modes.
Atomicity across nodes requires two-phase commit, which blocks indefinitely if the coordinator fails mid-transaction. Consistency requires consensus (Raft or Paxos), adding multiple network round-trips per write. Isolation under concurrent distributed transactions requires distributed locking or optimistic concurrency control. Durability requires writes to persist on multiple nodes before acknowledging, adding replication overhead to the critical path.
Each anomaly in this series is a consequence of one of these distributed ACID protocols either failing under partition conditions or being deliberately traded for lower latency and higher availability.
Internals: How Each Anomaly Class Traces Back to a Specific ACID Failure
Each anomaly family maps to a specific distributed ACID breakdown:
- Split brain β Atomicity failure: Two coordinators both accept writes because the cluster cannot atomically agree on primary ownership across a network partition.
- Clock skew β Consistency failure: Last-Write-Wins conflict resolution compares wall-clock timestamps, but clocks on different nodes diverge by 10β500 ms under NTP drift. The causally newer write loses.
- Stale reads β Durability/Isolation failure: Asynchronous replication writes to the primary and returns success before replicas apply the change. Readers on replicas observe pre-write state during the lag window.
- Cascading failures β Atomicity/Availability trade-off: Circuit breakers and bulkheads represent deliberate choices to sacrifice atomicity of a distributed transaction in exchange for isolating failure blast radius.
Performance Analysis: The Latency Cost of Each Anomaly Prevention Mechanism
Every prevention mechanism adds latency. Understanding the cost-to-safety trade-off is required for system design decisions:
| Prevention Mechanism | Latency Added | Throughput Impact |
| Quorum reads (R + W > N) | +1β3 network RTT per read | ~30β50% reduction vs. local read |
| Synchronous replication (2PC) | +1β2 RTT per write commit | Scales linearly with replica count |
| Raft consensus write | +2 RTT (leader + majority ACK) | Leader bottleneck under high write rate |
| Vector clocks | Negligible (metadata size grows) | O(n) metadata per node in cluster |
| Circuit breaker state check | Sub-millisecond (in-memory) | No throughput impact at rest |
| Full-jitter retry backoff | Adds delay under failure | Reduces retry storm by 90%+ vs. naive retry |
π Which Distributed Systems Are Exposed to Which Anomalies
| System | Primary Anomaly Risk | Default Consistency | Recommended Fix |
| PostgreSQL with async replicas | Stale reads | Eventual (async WAL) | synchronous_commit = remote_apply or LSN token routing |
| Cassandra | Clock skew LWW corruption | Eventual (multi-leader, tunable) | Server-side timestamps; HLC |
| MongoDB (w=1) | Split brain on failover | Eventual | Upgrade to w=majority |
| Redis Sentinel (quorum=1) | Split brain in 3-node setup | Eventual | Set quorum to 2 |
| Microservice chains | Cascading failures via retry storm | Application-level | Circuit breakers; bulkheads; jitter |
βοΈ Prevention Strategy Cheatsheet
| Anomaly | Root Cause | Detection Signal | Prevention Pattern | Key Systems |
| Split brain | Network partition isolates leader; peers elect new leader | Two nodes claim leadership for same shard | Quorum writes, fencing tokens, STONITH, Raft term numbers | MongoDB, etcd, Redis Sentinel |
| Stale reads | Async replication lag; reads from lagging replica | pg_stat_replication.write_lag; Seconds_Behind_Master | Read-your-writes (LSN), bounded staleness, quorum reads | PostgreSQL, MySQL, Cassandra |
| Clock skew | NTP imprecision; VM clock pauses | Clock offset metrics; ntpstat; chrony | Hybrid Logical Clocks (HLC), TrueTime (Spanner), avoid wall-clock LWW | Cassandra, Spanner, CockroachDB |
| Causality violation | Async message delivery via different network paths | Out-of-order events; reply seen before post | Vector clocks, causal consistency middleware, total order broadcast | Riak, DynamoDB Streams, Kafka |
| Conflicting writes | Multi-leader or leaderless concurrent writes | Version conflicts on read; anti-entropy divergence | LWW (lossy), application merge, CRDTs | Cassandra, DynamoDB, Riak |
| Zombie leader | GC pause exceeds election timeout; old leader resumes | Multiple leaders in monitoring; duplicate writes | Fencing tokens (monotonic epoch), STONITH, Raft term rejection | ZooKeeper, etcd, Redis Sentinel |
| Cascading failure | Single node failure increases load on survivors β chain reaction | Latency and error rate climbing across all replicas | Circuit breakers, bulkheads, load shedding, jittered backoff | Resilience4j, Istio, Hystrix |
| Thundering herd | Synchronized client retries after common failure event | Retry spike in traffic metrics immediately after outage | Full-jitter exponential backoff, staggered restarts | All HTTP clients, gRPC |
| Cache stampede | Hot cache key expires; all waiters simultaneously hit DB | DB query spike on cache expiry | Probabilistic early expiration, request coalescing (dog-pile lock) | Redis, Memcached, Varnish |
π Deep-Dive Posts: The Three Major Families
The rest of this series covers each anomaly family in its own focused post. Each post provides: detailed root-cause analysis, Mermaid diagrams showing the failure sequence and prevention mechanisms, real-world incident case studies, trade-off tables, and a decision guide for choosing prevention strategies.
π΄ Split Brain β Two Leaders, Conflicting Writes, Permanent Data Loss
When a network partition isolates a replica set's leader from its peers, the peers trigger a new election and elect a new leader. Both the old and new leader are now simultaneously accepting writes to the same dataset. When the partition heals, the old leader's writes are rolled back. Clients that received success acknowledgments find their data gone.
Covered in: Split Brain Explained: When Two Nodes Both Think They Are Leader
Topics: Raft term numbers and quorum math, STONITH out-of-band fencing, zombie leader + GC pause scenario, etcd and Redis Sentinel configuration, MongoDB 2012 two-primary bug, Elasticsearch minimum master nodes misconfiguration.
π‘ Clock Skew and Causality Violations β When Timestamps Lie
Physical clocks on distributed machines drift apart continuously. NTP corrects the drift imprecisely, leaving residual errors of tens to hundreds of milliseconds. Last-Write-Wins conflict resolution using wall-clock timestamps silently discards the correct write when any clock has drifted. Causality violations β a reply arriving before the message that triggered it β can occur even with perfect clocks, purely from asynchronous message delivery.
Covered in: Clock Skew and Causality Violations: Why Distributed Clocks Lie
Topics: NTP drift characteristics, Lamport timestamps, vector clocks (causal ordering + concurrent event detection), Hybrid Logical Clocks (HLC), Google Spanner TrueTime commit-wait, the Cassandra delete-before-write anomaly, CockroachDB HLC configuration.
π Stale Reads and Cascading Failures β Lag Windows and Collapse Chains
Stale reads occur when an asynchronous replica has not yet applied the primary's latest write and a read is routed to it during that window. The read returns the superseded value β with no error and no indication that anything is wrong. Cascading failures are a separate but related pattern: a single node failure redistributes load to survivors, increasing their latency, causing clients to retry, doubling the effective traffic, and triggering further failures in a self-amplifying chain.
Covered in: Stale Reads and Cascading Failures in Distributed Systems
Topics: Read-your-writes, monotonic reads, bounded staleness, quorum reads (R + W > N), WAL-based replication lag mechanics, circuit breaker state machine, bulkhead partitioning, load shedding, thundering herd + full-jitter backoff, cache stampede + probabilistic early expiration.
π§ Decision Guide: Which Anomaly Prevention Applies to Your Use Case
The right prevention strategy depends on what your system cannot afford to get wrong:
- Cannot afford stale data (inventory, payments, seat reservations): Use strong consistency with quorum reads, synchronous replication, or read-your-writes tokens. Accept higher latency on reads and writes.
- Cannot afford split brain (any replicated primary): Use Raft-based consensus with majority quorum. Configure fencing tokens and STONITH. Never use
w=1orquorum=1in sentinel setups. - Cannot afford LWW timestamp corruption (Cassandra, Dynamo-style): Switch to server-side timestamps, HLC, or vector clocks. Never use client-supplied wall-clock timestamps for conflict resolution.
- Cannot afford cascading failures (microservice chains): Add circuit breakers at each service boundary. Use full-jitter exponential backoff. Implement bulkheads to prevent cross-service pool exhaustion.
π§ͺ Practical Anomaly-First Design Review
Before choosing a database or distributed architecture, walk through each anomaly class explicitly:
- Split brain: What is the write quorum size? Can two nodes simultaneously believe they are primary without the cluster detecting it?
- Stale reads: What is the maximum acceptable replication lag? Does the application need read-your-writes consistency for any user-facing operation?
- Clock skew: Are wall-clock timestamps used for conflict resolution anywhere? What is the NTP drift budget across all application servers?
- Cascading failures: Is there a retry strategy defined? Do all downstream dependencies have circuit breakers with appropriate thresholds?
Addressing these four questions before selecting infrastructure eliminates the majority of the operational surprises covered in this series.
π Key Takeaways
- Distributed anomalies arise from physics β network partitions, clock drift, and asynchronous replication are unavoidable properties of distributed systems, not bugs.
- Every anomaly belongs to one of four families: visibility (stale or torn data), ordering (causality violations), availability (split brain, zombie leader), or integrity (conflicting writes).
- Each anomaly has a specific monitoring signal and a targeted prevention pattern. Classifying the anomaly type before searching for a fix narrows the solution space from twelve options to two or three.
- The CAP theorem makes the trade-off explicit: systems optimized for availability under partition are exposed to visibility and integrity anomalies; systems optimized for consistency are exposed to availability anomalies during partitions.
Quiet AI help

Written by
Abstract Algorithms
@abstractalgorithms
Related deep dives
Continue reading



