Event Sourcing Pattern: Auditability, Replay, and Evolution of Domain State

Persist domain facts as immutable events and rebuild state predictably under change.

Architecture Patterns for Production Systems

Abstract Algorithms

Mar 13, 2026 · 15 min read

Intermediate

For developers with some experience. Builds on fundamentals.

AI-assisted content. This post may have been written or enhanced with AI tools. Please verify critical information independently.

TLDR: Event sourcing pays off when regulatory audit history and replay are first-class requirements, but it demands strict schema evolution, a snapshot strategy, and a framework that owns the aggregate lifecycle. On the JVM, Spring Boot + Axon Framework is one of the most direct routes to a production-grade setup.

📖 Why Storing Events Instead of State Changes Everything

In January 2017, a GitLab engineer ran rm -rf on the data directory of the wrong production database server. Several backup mechanisms turned out not to be working, and the newest usable copy was a snapshot taken about six hours earlier. Roughly six hours of database data (projects, issues, merge requests, comments, and user accounts created in that window) was lost permanently. An append-only event log, replicated independently of that database, could have allowed replay right up to the moment of deletion. That one architectural choice, appending events instead of overwriting state, is the difference between "we can restore to any second" and "we lost six hours and cannot get them back."

Most CRUD systems store only the current state of a record. A subscription row has a status column; when billing suspends the account, you overwrite ACTIVE with SUSPENDED. The update succeeds, but the why, the when, and the sequence of transitions that led there are gone.

Event sourcing flips the model. Instead of storing the latest snapshot of truth, you store every domain event that caused a state change as an append-only log. Current state is derived on demand by replaying those events in sequence. The log is the audit trail — not a derived artefact built on top of it.

Aspect	Traditional CRUD	Event Sourcing
What is stored	Current row state	Ordered sequence of immutable events
Audit history	Requires separate audit table	Built-in — the event log is the record
Temporal queries	Difficult without CDC or snapshots	Replay the stream to any past position
Concurrent writes	Last-write-wins risk without care	Optimistic concurrency on stream version
Schema evolution	`ALTER TABLE` migrations	Event upcasting at read time
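The "optimistic concurrency on stream version" row can be made concrete without any framework. The sketch below is a minimal in-memory illustration, assuming a stream's version is simply its event count; `InMemoryEventStore` and `ConcurrencyConflictException` are hypothetical names, not Axon or EventStoreDB APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Thrown when a writer's expected stream version no longer matches the store.
class ConcurrencyConflictException extends RuntimeException {
    ConcurrencyConflictException(String streamId, long expected, long actual) {
        super("Stream " + streamId + ": expected version " + expected + " but was " + actual);
    }
}

// Hypothetical in-memory event store: one append-only list per stream ID.
class InMemoryEventStore {
    private final Map<String, List<Object>> streams = new HashMap<>();

    // Current version = number of events already in the stream (0 for a new stream).
    synchronized long version(String streamId) {
        return streams.getOrDefault(streamId, List.of()).size();
    }

    // Appends only if nobody else has written since the caller read expectedVersion.
    synchronized long append(String streamId, long expectedVersion, List<?> newEvents) {
        List<Object> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        if (stream.size() != expectedVersion) {
            throw new ConcurrencyConflictException(streamId, expectedVersion, stream.size());
        }
        stream.addAll(newEvents);
        return stream.size();
    }

    // Returns an immutable copy so callers cannot rewrite history.
    synchronized List<Object> read(String streamId) {
        return List.copyOf(streams.getOrDefault(streamId, List.of()));
    }
}
```

A writer reads the version, decides, and appends with that expected version. If another writer appended in between, the append fails instead of silently overwriting, and the caller reloads the stream and retries.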

You gain an append-only fact log (tamper-evident once you add hash chaining or write-once storage), time-travel queries, and decoupled read models. You give up simple SELECT * queries and accept the operational cost of snapshot management and schema versioning.

🔍 The Four Building Blocks of an Event-Sourced System

Every production event-sourced system has four roles:

Command — an intent to change state; validated against current aggregate state before writing.
Aggregate — the consistency boundary; enforces invariants, emits events, and advances its internal state machine.
Event Store — the append-only log; events are immutable, each aggregate instance owns a stream by ID.
Projection — a read model rebuilt from the event stream; projections are disposable and always rebuildable.
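To see how the four roles fit together, here is a deliberately framework-free sketch in plain Java. Every type in it (`SuspendCommand`, `Subscription`, `EventLog`, `StatusProjection`) is hypothetical; Axon provides these roles through annotations instead, and a real event store would also check stream versions on append.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Command: an intent to change state.
record SuspendCommand(String subscriptionId, String reason) {}

// Events: immutable facts.
interface SubscriptionEvent {}
record ActivatedEvent(String subscriptionId, Instant at) implements SubscriptionEvent {}
record SuspendedEvent(String subscriptionId, String reason, Instant at) implements SubscriptionEvent {}

// Aggregate: rebuilt from its own events, validates commands, emits new events.
class Subscription {
    private boolean active;

    static Subscription replay(List<SubscriptionEvent> history) {
        Subscription s = new Subscription();
        history.forEach(s::apply);
        return s;
    }

    // Invariant check happens here; no state is mutated.
    List<SubscriptionEvent> decide(SuspendCommand cmd) {
        if (!active) throw new IllegalStateException("Only ACTIVE subscriptions can be suspended");
        return List.of(new SuspendedEvent(cmd.subscriptionId(), cmd.reason(), Instant.now()));
    }

    // State changes only happen here.
    void apply(SubscriptionEvent e) {
        if (e instanceof ActivatedEvent) active = true;
        else if (e instanceof SuspendedEvent) active = false;
    }
}

// Projection: disposable read model fed from published events.
class StatusProjection {
    final Map<String, String> statusById = new HashMap<>();

    void on(SubscriptionEvent e) {
        if (e instanceof ActivatedEvent a) statusById.put(a.subscriptionId(), "ACTIVE");
        else if (e instanceof SuspendedEvent s) statusById.put(s.subscriptionId(), "SUSPENDED");
    }
}

// Event store: append-only log, one stream per aggregate ID.
class EventLog {
    final Map<String, List<SubscriptionEvent>> streams = new HashMap<>();
    final List<StatusProjection> subscribers = new ArrayList<>();

    List<SubscriptionEvent> load(String id) {
        return streams.getOrDefault(id, List.of());
    }

    void append(String id, List<SubscriptionEvent> events) {
        streams.computeIfAbsent(id, k -> new ArrayList<>()).addAll(events);
        events.forEach(e -> subscribers.forEach(p -> p.on(e)));  // publish after appending
    }
}
```

The aggregate only ever sees its own event history, and the projection only ever sees published events; adding a second projection means registering another subscriber, with no change to `Subscription`.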

⚙️ How a Command Flows into an Auditable Event Stream

flowchart TD
    C[Client Command] --> CH["Command Handler (SubscriptionAggregate)"]
    CH -->|"validates invariants,<br/>applies event"| ES[("Event Store<br/>Append-Only Log")]
    ES -->|"event published on event bus"| P["BillingHistoryProjection (Event Handler)"]
    P --> QM[("Query Model<br/>BillingHistoryRepository")]
    QM -->|"query response"| Q[GetBillingHistoryQuery]

    ES -.->|"token-based replay"| RP["Replay Processor (TrackingEventProcessor)"]
    RP -.->|"rebuilds view for audit dispute"| QM

    style ES fill:#f5f5f5,stroke:#555
    style QM fill:#e8f4e8,stroke:#555
    style RP fill:#fff3e0,stroke:#f90,stroke-dasharray: 5 5

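Solid arrows show the live command path. Dashed arrows show replay: the TrackingEventProcessor resets its tracking token and re-reads the event store to rebuild the query model. Replaying from the start of the stream rebuilds the full history; to see the view as it stood at a past timestamp, the handlers additionally ignore events after that moment.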

The aggregate never writes directly to the query model. It emits events; projections consume them independently. A new projection — say, a fraud-detection read model — can be added without touching existing aggregate code.

📊 Event-Sourcing Data Flow Overview

flowchart TD
    CMD[Command] --> AGG[Aggregate validates invariants]
    AGG -->|"emit event"| ES[(Event Store append-only)]
    ES -->|"project"| RM[Read Model]
    ES -. "replay" .-> AUDIT[Audit View]

This diagram shows the two primary data flows in an event-sourced system: the live command path where a Command drives an Aggregate to emit events into the append-only Event Store, which then projects a Read Model; and the dashed replay path where the same Event Store replays historical events to reconstruct an Audit View. The aggregate never writes directly to the read model, keeping write and read concerns fully separated. The key takeaway is that the Event Store is the single source of truth — both the current state and any past state are derivable from it at any time.

📊 Event Write: Command to Store

sequenceDiagram
    participant C as Client
    participant Agg as Aggregate
    participant ES as EventStore
    participant Proj as Projection
    C->>Agg: Send Command
    Agg->>Agg: Validate & decide
    Agg->>ES: Append events
    ES-->>Agg: Events persisted
    ES->>Proj: Notify new events
    Proj->>Proj: Update read model
    Proj-->>C: Updated state

This sequence diagram zooms into the command path: the Client sends a command, the Aggregate validates it and decides which events to emit, the EventStore appends them (enforcing version-based optimistic locking), and the Projection updates the read model that the Client then reads. The notify step from EventStore to Projection can run synchronously (in the same unit of work) or asynchronously via an event bus, depending on consistency requirements; with asynchronous delivery the read model is eventually consistent. The key takeaway is that the Aggregate never reads from the read model: it only applies commands to its own event history, keeping the write path free of read-side dependencies.

🧠 Deep Dive: State Machines, Snapshots, and Schema Evolution Inside the Aggregate

Internals: Aggregate State Reconstruction

An aggregate's state exists only in memory during command processing. Before handling a command, the framework loads the aggregate by replaying every past event for that aggregate ID in sequence. Each @EventSourcingHandler method advances internal state — status flags, counters, IDs — until the aggregate is fully current. The command handler then checks invariants against that reconstructed in-memory state.
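Conceptually, this load step is a left fold over the event history. The sketch below mimics it without Axon; the event records, `Status`, and `SubscriptionState` are hypothetical, and `apply` plays the role that `@EventSourcingHandler` methods play in Axon.

```java
import java.util.List;

// Hypothetical event types for the sketch (not Axon classes).
interface BillingEvent {}
record Created(String subscriptionId, String tenantId) implements BillingEvent {}
record Activated(String subscriptionId) implements BillingEvent {}
record Suspended(String subscriptionId, String reason) implements BillingEvent {}

enum Status { CREATED, ACTIVE, SUSPENDED }

class SubscriptionState {
    String subscriptionId;
    String tenantId;
    Status status;
    long version;  // number of events applied so far

    // Mutates state from event data only, like an @EventSourcingHandler.
    void apply(BillingEvent event) {
        if (event instanceof Created c) {
            subscriptionId = c.subscriptionId();
            tenantId = c.tenantId();
            status = Status.CREATED;
        } else if (event instanceof Activated) {
            status = Status.ACTIVE;
        } else if (event instanceof Suspended) {
            status = Status.SUSPENDED;
        }
        version++;
    }

    // What happens before each command: start empty, apply every event in order.
    static SubscriptionState rehydrate(List<BillingEvent> history) {
        SubscriptionState state = new SubscriptionState();
        for (BillingEvent e : history) {
            state.apply(e);
        }
        return state;
    }
}
```

Only after `rehydrate` returns would a command handler check its invariants against `status`.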

This is powerful but carries a cost: if a subscription has 5,000 events, loading it means replaying 5,000 events before each command. Snapshots solve this. A snapshot captures the full aggregate state at event N; the next load starts from the snapshot and replays only the delta after N.
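The snapshot-plus-delta idea can be checked with a small, framework-free sketch. It assumes, hypothetically, that each event is a charge amount in cents and that the aggregate state is a running balance; `Snapshot` and `SnapshotReplay` are illustrative names, not Axon types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical snapshot: state folded up to and including lastSequence.
record Snapshot(long balanceCents, long lastSequence) {}

class SnapshotReplay {
    // Result of a load: the rebuilt state and how many events had to be replayed.
    record LoadResult(long balanceCents, int eventsReplayed) {}

    // events.get(i) is the amount of the event with sequence number i.
    static LoadResult load(List<Long> events, Snapshot snapshot) {
        long balance = snapshot == null ? 0 : snapshot.balanceCents();
        // With a snapshot at N, replay starts at N + 1 (only the delta).
        int from = snapshot == null ? 0 : (int) snapshot.lastSequence() + 1;
        for (int seq = from; seq < events.size(); seq++) {
            balance += events.get(seq);
        }
        return new LoadResult(balance, events.size() - from);
    }

    // Captures the state at lastSequence by folding the prefix once.
    static Snapshot snapshotAt(List<Long> events, int lastSequence) {
        long balance = 0;
        for (int seq = 0; seq <= lastSequence; seq++) {
            balance += events.get(seq);
        }
        return new Snapshot(balance, lastSequence);
    }
}
```

With 5,000 events and a snapshot covering sequence numbers 0 to 4,949, a load replays 50 events instead of 5,000 and arrives at the same balance.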

Schema Evolution Through Upcasting

Events are immutable, but their schemas change. Old stored events must be upcasted — transformed at read time into the new schema without modifying stored data. Axon's EventUpcasterChain handles this transparently. The rule: always deploy upcasters before deploying new event versions.
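Axon's upcasters (for example `SingleEventUpcaster`) operate on serialized event representations inside the framework. The sketch below shows only the underlying idea in plain Java: stored data stays untouched, and each upcaster lifts one revision to the next at read time. `StoredEvent`, `UpcasterChainSketch`, and the revision-1 schema without a `reason` field are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical serialized form: type name, schema revision, and a generic payload.
record StoredEvent(String type, int revision, Map<String, Object> payload) {}

class UpcasterChainSketch {
    // One upcaster per (type, fromRevision); each must return revision + 1.
    private final Map<String, UnaryOperator<StoredEvent>> upcasters = new HashMap<>();

    void register(String type, int fromRevision, UnaryOperator<StoredEvent> upcaster) {
        upcasters.put(type + "@" + fromRevision, upcaster);
    }

    // Applies matching upcasters until none is left; the input event is never modified.
    StoredEvent upcast(StoredEvent event) {
        StoredEvent current = event;
        UnaryOperator<StoredEvent> step;
        while ((step = upcasters.get(current.type() + "@" + current.revision())) != null) {
            current = step.apply(current);
        }
        return current;
    }

    // Example step: revision 1 of SubscriptionSuspendedEvent had no "reason" field.
    static StoredEvent suspendedV1toV2(StoredEvent v1) {
        Map<String, Object> payload = new HashMap<>(v1.payload());  // copy, never mutate stored data
        payload.putIfAbsent("reason", "UNSPECIFIED");
        return new StoredEvent(v1.type(), 2, payload);
    }
}
```

Because the upcaster only runs when events are read, it has to be on the classpath before any code that expects revision 2 starts reading revision-1 events.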

Performance Analysis: Replay Cost Drivers

Factor	Impact	Mitigation
Event stream length	Linear aggregate load time	Snapshot every N events
Projection rebuild	Full event store scan	Token-based reset with parallel threads
Upcaster chain depth	CPU overhead at deserialization	Keep upcasters thin; version events early
Projection lag	Stale reads during backfill	Monitor processor lag; dedicate a shadow DB for replay

🛠️ Axon Framework and EventStoreDB: Event Sourcing on the JVM

Axon Framework is a Java framework with first-class Spring Boot support that manages the full event-sourcing lifecycle: aggregate command handling, event persistence, snapshotting, replay, upcasting, and projection tracking. Its default production event store is Axon Server, with JPA and JDBC event stores as alternatives. EventStoreDB is a separate, purpose-built append-only database with server-side projections and persistent subscriptions, a strong option for teams that want a dedicated, audit-grade event store.

These tools solve the event-sourcing problem by owning the infrastructure that makes aggregates deterministic: Axon's @CommandHandler / @EventSourcingHandler pattern enforces the strict separation between command validation and state mutation; the TrackingEventProcessor manages checkpoints and replay; the EventUpcasterChain handles schema evolution transparently. Teams write domain logic; Axon owns the replay machinery.

The complete SubscriptionAggregate, BillingHistoryProjection, snapshot configuration, and replay code are shown in the 🧪 Subscription Billing section below. The minimal starting dependency:

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>
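<!-- Optional: an EventStoreDB connector in place of Axon's default store (Axon Server, or JPA/JDBC when the Axon Server connector is disabled). Verify these coordinates before use; they are not part of Axon's core starter. -->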
<dependency>
  <groupId>org.axonframework.extensions.eventstored</groupId>
  <artifactId>axon-eventstoredb-spring-boot-starter</artifactId>
  <version>0.1.0</version>
</dependency>

Framework	Strengths	Best fit
Axon Framework (Spring Boot)	Spring-native, full ES + CQRS lifecycle, built-in snapshots, replay, and upcasting	Enterprise Spring Boot teams wanting all pieces integrated
EventStoreDB Java client	Purpose-built append-only store, server-side projections, excellent audit semantics	Teams that want a best-in-class store and will wire their own projections
Spring Data + custom event table	Lightweight, no new infrastructure; PostgreSQL append-only event table with Outbox	Simple domains; teams wary of framework lock-in
Lagom (Akka-based)	Reactive, high throughput, persistent entities, cluster sharding	High-concurrency JVM services already on the Akka stack

For a full deep-dive on Axon Framework and EventStoreDB in production, a dedicated follow-up post is planned.

🌍 Real-World Applications

Event sourcing earns its complexity where audit trails and replay are first-class business requirements.

Company / Industry	Driver	Event sourcing advantage
LMAX Exchange — finance	6M+ orders/sec with full regulatory audit	Replay market state to any timestamp for regulators
Shopify — e-commerce	Fraud investigation, inventory disputes	Replay order event stream to exact inventory at purchase time
Healthcare systems	Consent tracking, patient record disputes	Immutable facts with time-travel replay; no separate audit table
Insurance	Claims and policy versioning	Full decision trail; compensation events on reversals

🧪 Subscription Billing: Building the Aggregate, Projection, and Replay

Scenario: A billing platform tracks the lifecycle of each subscription — CREATED → ACTIVATED → SUSPENDED → CANCELLED. Every state transition is an immutable domain event appended to the subscription's event stream. When a customer disputes a charge, the support team replays the event stream to reconstruct exactly what the account looked like at the moment of the disputed transaction.

Maven Dependency

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>

Domain Events (Immutable Value Objects)

public record SubscriptionCreatedEvent(
    String subscriptionId, String tenantId, String planId, Instant occurredAt) {}

public record SubscriptionActivatedEvent(
    String subscriptionId, String tenantId, Instant occurredAt) {}

public record SubscriptionSuspendedEvent(
    String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

public record SubscriptionCancelledEvent(
    String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

Each event is an immutable record with no setters. Identity and timestamps are supplied at the command-handler boundary (the ID from the command or the aggregate's state, the timestamp via Instant.now()), so events never generate their own identity.

SubscriptionAggregate

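import java.time.Instant;
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.spring.stereotype.Aggregate;

// Assumes the command records (CreateSubscriptionCommand, ActivateSubscriptionCommand,
// SuspendSubscriptionCommand) and a SubscriptionStatus enum are defined elsewhere.
// ActivateSubscriptionCommand is an illustrative name: without an activation path, ACTIVE is unreachable.
@Aggregate(snapshotTriggerDefinition = "subscriptionSnapshotTrigger")
public class SubscriptionAggregate {

    @AggregateIdentifier
    private String subscriptionId;
    private SubscriptionStatus status;
    private String tenantId;

    protected SubscriptionAggregate() {} // required by Axon for event-sourced replay

    @CommandHandler
    public SubscriptionAggregate(CreateSubscriptionCommand cmd) {
        AggregateLifecycle.apply(new SubscriptionCreatedEvent(
            cmd.subscriptionId(), cmd.tenantId(), cmd.planId(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionCreatedEvent event) {
        this.subscriptionId = event.subscriptionId();
        this.tenantId       = event.tenantId();
        this.status         = SubscriptionStatus.CREATED;
    }

    @CommandHandler
    public void handle(ActivateSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.CREATED) {
            throw new IllegalStateException(
                "Only CREATED subscriptions can be activated; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionActivatedEvent(
            subscriptionId, tenantId, Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionActivatedEvent event) {
        this.status = SubscriptionStatus.ACTIVE;
    }

    @CommandHandler
    public void handle(SuspendSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.ACTIVE) {
            throw new IllegalStateException(
                "Only ACTIVE subscriptions can be suspended; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionSuspendedEvent(
            subscriptionId, tenantId, cmd.reason(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionSuspendedEvent event) {
        this.status = SubscriptionStatus.SUSPENDED;
    }
}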

@CommandHandler enforces invariants, then calls AggregateLifecycle.apply(). @EventSourcingHandler is the only place state is mutated, and it uses only data carried by the event; Instant.now() is called in the command handlers, never in a sourcing handler. This strict separation is why replay is deterministic no matter how many times it runs.

Snapshot Configuration — Preventing Cold-Start Replay Tax

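import org.axonframework.eventsourcing.EventCountSnapshotTriggerDefinition;
import org.axonframework.eventsourcing.SnapshotTriggerDefinition;
import org.axonframework.eventsourcing.Snapshotter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration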
public class AxonConfig {

    @Bean
    public SnapshotTriggerDefinition subscriptionSnapshotTrigger(Snapshotter snapshotter) {
        // Capture a snapshot after every 50 events.
        // Next load starts from the snapshot and replays only the delta (≤ 49 events).
        return new EventCountSnapshotTriggerDefinition(snapshotter, 50);
    }
}

Without snapshots, a subscription with 500 billing events pays a 500-event replay cost on every command. With a threshold of 50, the worst-case delta is 49 events.

BillingHistoryProjection — Read Model and Audit Query Handler

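import java.time.Instant;
import java.util.List;
import org.axonframework.config.ProcessingGroup;
import org.axonframework.eventhandling.EventHandler;
import org.axonframework.eventhandling.Timestamp;
import org.axonframework.queryhandling.QueryHandler;
import org.springframework.stereotype.Component;

@Component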
@ProcessingGroup("billing-history")
public class BillingHistoryProjection {

    private final BillingHistoryRepository repo;

    public BillingHistoryProjection(BillingHistoryRepository repo) {
        this.repo = repo;
    }

    @EventHandler
    public void on(SubscriptionCreatedEvent event, @Timestamp Instant eventTimestamp) {
        repo.save(new BillingHistoryEntry(
            event.subscriptionId(), event.tenantId(),
            "CREATED", event.planId(), eventTimestamp));
    }

    @EventHandler
    public void on(SubscriptionSuspendedEvent event, @Timestamp Instant eventTimestamp) {
        repo.updateStatus(
            event.subscriptionId(), "SUSPENDED", event.reason(), eventTimestamp);
    }

    @QueryHandler
    public List<BillingHistoryEntry> handle(GetBillingHistoryQuery query) {
        return repo.findBySubscriptionId(query.subscriptionId());
    }
}

Every @EventHandler must be idempotent: replay calls these methods again during incident recovery and projection refactors. Use upsert semantics keyed on the aggregate ID plus the event's sequence number; the sequence number alone is only unique within a single aggregate's stream.
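One way to meet that requirement is sketched below, with an in-memory map standing in for the read-model table. The names are hypothetical; in Axon the sequence number can be injected into an `@EventHandler` through the `@SequenceNumber` parameter annotation, and in SQL the same effect comes from a unique constraint on (aggregate ID, sequence number) plus an upsert.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical composite key: sequence numbers restart at 0 for every aggregate.
record HistoryKey(String subscriptionId, long sequenceNumber) {}
record HistoryRow(String subscriptionId, long sequenceNumber, String status) {}

class IdempotentBillingHistory {
    private final Map<HistoryKey, HistoryRow> rows = new LinkedHashMap<>();

    // Upsert: delivering the same event twice leaves exactly one row.
    void on(String subscriptionId, long sequenceNumber, String status) {
        HistoryKey key = new HistoryKey(subscriptionId, sequenceNumber);
        rows.put(key, new HistoryRow(subscriptionId, sequenceNumber, status));
    }

    int size() {
        return rows.size();
    }
}
```

Replaying both sub-1 events leaves the row count unchanged, while sub-2's event with sequence number 0 still gets its own row, which is why the key needs the aggregate ID as well.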

Replaying the Event Stream for Audit Disputes

When a customer disputes a charge and the team needs the account state at a specific past timestamp, reset the projection's tracking token to replay from the event store:

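// eventProcessingConfig is Axon's EventProcessingConfiguration (for example, injected as a Spring bean).
// Reset and replay the billing-history projection from the beginning of the event store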
eventProcessingConfig
    .eventProcessorByProcessingGroup("billing-history", TrackingEventProcessor.class)
    .ifPresent(processor -> {
        processor.shutDown();
        processor.resetTokens(); // replays all events in stream order
        processor.start();
    });

To scope the replay to a specific point in time, compare the injected @Timestamp Instant against the dispute window inside the @EventHandler and skip events outside it before persisting. Because the event store is immutable and the handlers are deterministic, replay produces the same result every time, which makes it a reliable audit mechanism.
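The timestamp filter reduces to "replay in order, stop applying events after the cutoff." A minimal sketch, assuming a hypothetical `StatusChanged` record whose `timestamp` corresponds to the `@Timestamp` value Axon injects, and assuming timestamps never decrease within one aggregate's stream:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical timestamped event, standing in for an event plus its @Timestamp.
record StatusChanged(String subscriptionId, String newStatus, Instant timestamp) {}

class PointInTimeView {
    // Replays the stream in order but ignores everything after asOf (inclusive of asOf itself).
    // Returns null if the subscription did not exist yet at that instant.
    static String statusAt(List<StatusChanged> stream, Instant asOf) {
        String status = null;
        for (StatusChanged e : stream) {
            if (e.timestamp().isAfter(asOf)) {
                break;  // relies on non-decreasing timestamps within one stream
            }
            status = e.newStatus();
        }
        return status;
    }
}
```

For a dispute, support would call `statusAt` with the disputed transaction's time and get the account status as it stood at that instant.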

📊 Time-Travel Replay

sequenceDiagram
    participant Q as QueryClient
    participant ES as EventStore
    participant Agg as Aggregate
    Q->>ES: Query events from t0 to t1
    ES-->>Q: Return event slice
    Q->>Agg: Replay events t0 to t1
    Agg->>Agg: Apply each event
    Agg-->>Q: Past state at t1
    Note over Q,Agg: Audit snapshot restored

This sequence diagram illustrates time-travel replay: a QueryClient requests the event slice from t0 to t1, the EventStore returns exactly those events, and the Aggregate replays them in order to reconstruct its state at t1. For that state to be complete, t0 must be either the start of the stream or the position of a snapshot that supplies the state at t0. Because the Event Store is append-only and immutable, replaying the same window always produces the same result, making it a reliable foundation for audit investigations and dispute resolution. The key takeaway is that point-in-time state reconstruction is a first-class capability of event sourcing, not a special-case workaround.

⚖️ Trade-offs & Failure Modes in Practice

Failure mode	Symptom	Root cause	First mitigation
Long aggregate streams	High command latency on warm-up	No snapshot strategy	Add `EventCountSnapshotTriggerDefinition`
Incompatible old events	`ClassCastException` during replay after deploy	Schema changed without upcaster	Add `SingleEventUpcaster` before deploying new event version
Projection lag under load	Stale reads; audit disputes on in-flight data	Insufficient processor parallelism	Increase `TrackingEventProcessor` threads together with its token segment count
Unbounded event store growth	Storage cost; slow tail scans	No retention or archival policy	Archive cold streams; keep hot window in fast storage tier
Non-idempotent projection	Duplicate rows after replay	`@EventHandler` not safe to call twice	Use upsert keyed on aggregate ID + event sequence number

🧭 Decision Guide: When Event Sourcing Earns Its Complexity

Situation	Recommendation
Regulatory audit trail required (finance, healthcare, insurance)	Strong fit — the event log is the compliance record
Temporal queries: "what was the state at time T?"	Strong fit — replay to any past stream position
Simple CRUD with no audit or replay requirements	Avoid — operational overhead is not justified
High write throughput (>10k events/sec per stream)	Use with caution — partition streams; evaluate Axon Server
Team unfamiliar with CQRS and aggregate design	Run EventStorming workshops and model the domain first

🔧 Operator Field Note: Three Production Realities

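1. Snapshot monitoring. Track command-handler latency per aggregate type (for example a metric such as axon_command_bus_handler_latency_seconds; the exact name depends on your Axon metrics setup). Latency that climbs with aggregate age signals that snapshots are not firing. To find outliers, group DomainEventEntry rows by aggregate identifier and sort by event count.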

2. Schema-incompatible old events. A ClassCastException during replay almost always means a missing upcaster. Safe sequence: write a SingleEventUpcaster (V1 → V2), deploy it before the new event version, then deploy the aggregate code. Never modify stored events in place.

3. Isolated projection replay. Each @ProcessingGroup owns its own tracking token. Resetting billing-history leaves all other processors unaffected. Route audit queries to a dedicated shadow query model so live billing traffic is never blocked during replay.

📚 Hard-Won Lessons from Production Event-Sourced Systems

Design events for readers, not writers. Rich, self-describing payloads survive upcasting; terse internal codes do not.
Snapshots are not optional at scale. An aggregate with 1,000 events pays a 1,000-event replay cost on every command without one. Define your threshold before going live.
Idempotent projections are mandatory. Every @EventHandler must be safe to call twice — replay occurs during incident recovery and schema migration.
Schema evolution is the hardest operational problem. Deploy upcasters before new event versions, never after.
Replay is a first-class feature. Use it for analytics backfills, fraud investigation, and projection refactors.

📌 TLDR: Summary & Key Takeaways

Event sourcing stores immutable domain facts rather than mutable state rows; current state is always derivable by replaying the event log in order.
Aggregates are deterministic state machines: @CommandHandler enforces invariants; @EventSourcingHandler mutates state — nowhere else. This separation makes replay reliable.
Snapshots are essential for long-lived aggregates. Without them, command latency grows linearly with stream length.
Projections are disposable read models. Because the event store is the source of truth, any query model can be rebuilt from the log at any time — including for historical audit.
Schema evolution requires upcasters. Deploy the upcaster before the new event version; test replay in staging before promoting to production.
Audit trails, temporal queries, and replay-based dispute resolution are built-in features — not bolt-ons.
