
Event Sourcing Pattern: Auditability, Replay, and Evolution of Domain State

Persist domain facts as immutable events and rebuild state predictably under change.

Abstract Algorithms · 14 min read

TLDR: Event sourcing pays off when regulatory audit history and replay are first-class requirements, but it demands strict schema evolution, a snapshot strategy, and a framework that owns the aggregate lifecycle. On the JVM, Spring Boot with Axon Framework is a well-established production path.

📖 Why Storing Events Instead of State Changes Everything

In January 2017, a GitLab engineer ran rm -rf against the wrong production database server while working on a replication problem. The regular backups turned out not to be usable, so the restore came from a snapshot taken roughly six hours earlier. About six hours of database writes (issues, merge requests, comments, new projects, and user accounts) were lost permanently; the Git repositories themselves were not affected. An append-only event log replicated off that host would have allowed replay to any point inside that six-hour window. That one architectural choice, appending events instead of overwriting state, is the difference between "we can restore to any point in the log" and "we lost six hours and cannot get them back."

Most databases store the current state of a record. A subscription row has a status column. When billing suspends the account, you overwrite ACTIVE with SUSPENDED. Done, but the why, the when, and the sequence of transitions that led there are gone.

Event sourcing flips the model. Instead of storing the latest snapshot of truth, you store every domain event that caused a state change in an append-only log. Current state is derived on demand by replaying those events in sequence. The log is the audit trail itself, not a separate artefact bolted on afterwards.

| Aspect | Traditional CRUD | Event Sourcing |
| --- | --- | --- |
| What is stored | Current row state | Ordered sequence of immutable events |
| Audit history | Requires separate audit table | Built in: the event log is the record |
| Temporal queries | Difficult without CDC or snapshots | Replay the stream to any past position |
| Concurrent writes | Last-write-wins risk without care | Optimistic concurrency on stream version |
| Schema evolution | ALTER TABLE migrations | Event upcasting at read time |

You gain a tamper-evident fact log, time-travel queries, and decoupled read models. You give up simple SELECT * queries and accept the operational cost of snapshot management and schema versioning.
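To make "current state is derived by replaying events" concrete, here is a minimal plain-Java sketch with no framework involved. The names Status, StatusChanged, and SubscriptionLog are hypothetical and exist only for this illustration; a real event store would persist the log rather than hold it in memory.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names for illustration; not part of Axon or the article's code.
enum Status { NONE, CREATED, ACTIVE, SUSPENDED, CANCELLED }

// One immutable fact: what happened to the subscription, and when.
record StatusChanged(String subscriptionId, Status newStatus, Instant occurredAt) {}

final class SubscriptionLog {
    private final List<StatusChanged> events = new ArrayList<>();

    // Append-only: there is deliberately no update or delete method.
    void append(StatusChanged event) {
        events.add(event);
    }

    // Current state is a left fold over the log: the latest event decides the status.
    Status currentStatus() {
        Status status = Status.NONE;
        for (StatusChanged event : events) {
            status = event.newStatus();
        }
        return status;
    }

    // The same log doubles as the audit trail; callers get a read-only copy.
    List<StatusChanged> history() {
        return List.copyOf(events);
    }
}
```

Appending CREATED, ACTIVE, and SUSPENDED in that order leaves currentStatus() at SUSPENDED while history() still returns all three transitions, which is exactly the information the overwritten status column throws away.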

๐Ÿ” The Four Building Blocks of an Event-Sourced System

Every production event-sourced system has four roles:

  1. Command: an intent to change state; validated against current aggregate state before writing.
  2. Aggregate: the consistency boundary; enforces invariants, emits events, and advances its internal state machine.
  3. Event Store: the append-only log; events are immutable, and each aggregate instance owns a stream keyed by its ID.
  4. Projection: a read model rebuilt from the event stream; projections are disposable and always rebuildable.
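The event store's two promises, one stream per aggregate ID and optimistic concurrency on the stream version, can be sketched in a few lines. This is an in-memory illustration under stated assumptions: InMemoryEventStore and its methods are hypothetical names, events are plain strings, and a production store would persist the data and enforce the version check transactionally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store for illustration; real stores persist to disk.
final class InMemoryEventStore {
    private final Map<String, List<String>> streams = new HashMap<>();

    // Appends only if the caller saw the latest version of the stream.
    // expectedVersion is the number of events the caller loaded before deciding.
    synchronized int append(String streamId, int expectedVersion, List<String> newEvents) {
        List<String> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        if (stream.size() != expectedVersion) {
            throw new IllegalStateException(
                "Concurrent write on " + streamId + ": expected version "
                    + expectedVersion + " but stream is at " + stream.size());
        }
        stream.addAll(newEvents);
        return stream.size(); // the new stream version
    }

    // Returns an immutable copy so callers cannot rewrite history.
    synchronized List<String> read(String streamId) {
        return List.copyOf(streams.getOrDefault(streamId, List.of()));
    }
}
```

Two writers that both loaded version 1 cannot both succeed: the first append moves the stream to version 2, and the second is rejected instead of silently overwriting the first. The rejected writer reloads the stream, re-validates its command, and retries.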

โš™๏ธ How a Command Flows into an Auditable Event Stream

flowchart TD
    C[Client Command] --> CH["Command Handler\n(SubscriptionAggregate)"]
    CH -->|"validates invariants\napplies event"| ES[("Event Store\nAppend-Only Log")]
    ES -->|"event published\non event bus"| P["BillingHistoryProjection\n(Event Handler)"]
    P --> QM[("Query Model\nBillingHistoryRepository")]
    QM -->|"query response"| Q[GetBillingHistoryQuery]

    ES -. "token-based replay" .-> RP["Replay Processor\n(TrackingEventProcessor)"]
    RP -. "rebuilds view\nfor audit dispute" .-> QM

    style ES fill:#f5f5f5,stroke:#555
    style QM fill:#e8f4e8,stroke:#555
    style RP fill:#fff3e0,stroke:#f90,stroke-dasharray: 5 5

Solid arrows show the live command path. Dashed arrows show replay: the TrackingEventProcessor resets its tracking token and re-processes stored events to rebuild the query model, which is how the view is reconstructed for an audit of any historical period.

The aggregate never writes directly to the query model. It emits events; projections consume them independently. A new projection (say, a fraud-detection read model) can be added without touching existing aggregate code.
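The independence of projections comes from each one tracking its own position in the shared log. The sketch below assumes a log of plain string event types and a hypothetical CountingProjection class; it is not Axon's tracking-token implementation, only the idea behind it.

```java
import java.util.List;

// Hypothetical sketch: each projection tracks its own position in a shared log.
final class CountingProjection {
    private final String eventTypeOfInterest;
    private int position = 0; // index of the next event to read (the "token")
    private int matches = 0;  // the read model: how many matching events were seen

    CountingProjection(String eventTypeOfInterest) {
        this.eventTypeOfInterest = eventTypeOfInterest;
    }

    // Consume every event this projection has not seen yet.
    void catchUp(List<String> log) {
        while (position < log.size()) {
            if (log.get(position).equals(eventTypeOfInterest)) {
                matches++;
            }
            position++;
        }
    }

    int matches() { return matches; }

    int position() { return position; }
}
```

A projection created long after the first events were written starts at position 0 and catches up from the log alone. Nothing in the code that appended those events has to change, and the positions of existing projections are untouched.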

📊 Event-Sourcing Data Flow Overview

flowchart TD
    CMD[Command] --> AGG["Aggregate\nvalidates invariants"]
    AGG -->|"emit event"| ES[("Event Store\nappend-only")]
    ES -->|"project"| RM[Read Model]
    ES -. "replay" .-> AUDIT[Audit View]

🧠 Deep Dive: Inside the Aggregate: State Machines, Snapshots, and Schema Evolution

Internals: Aggregate State Reconstruction

An aggregate's state exists only in memory during command processing. Before handling a command, the framework loads the aggregate by replaying every past event for that aggregate ID in sequence. Each @EventSourcingHandler method advances internal state (status flags, counters, IDs) until the aggregate is fully current. The command handler then checks invariants against that reconstructed in-memory state.

This is powerful but carries a cost: if a subscription has 5,000 events, loading it means replaying 5,000 events before each command. Snapshots solve this. A snapshot captures the full aggregate state at event N; the next load starts from the snapshot and replays only the delta after N.
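The saving is easy to see in a toy model. In this sketch the aggregate state is just a running balance and each event is an integer delta; Snapshot, SnapshotLoader, and Loaded are hypothetical names, and the replayed counter exists only to show how much work a load performs.

```java
import java.util.List;

// Hypothetical sketch: the aggregate state is just a running balance of integer deltas.
record Snapshot(int version, long balance) {}

final class SnapshotLoader {
    // Result of a load: the rebuilt state plus how many events had to be replayed.
    record Loaded(long balance, int replayed) {}

    // Rebuilds state from the snapshot (if any) and the events recorded after it.
    static Loaded load(List<Integer> events, Snapshot snapshot) {
        int start = (snapshot == null) ? 0 : snapshot.version();
        long balance = (snapshot == null) ? 0L : snapshot.balance();
        for (int i = start; i < events.size(); i++) {
            balance += events.get(i);
        }
        return new Loaded(balance, events.size() - start);
    }

    // Captures the state after the first `version` events.
    static Snapshot snapshotAt(List<Integer> events, int version) {
        return new Snapshot(version, load(events.subList(0, version), null).balance());
    }
}
```

With a 5,000-event stream and a snapshot taken at event 4,950, a load replays 50 events instead of 5,000 and arrives at the same balance. The snapshot is purely an optimisation: deleting it changes the cost of a load, never its result.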

Schema Evolution Through Upcasting

Events are immutable, but their schemas change. Old stored events must be upcast, that is, transformed at read time into the new schema without modifying stored data. Axon's EventUpcasterChain handles this transparently. The rule: always deploy upcasters before deploying new event versions.
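Stripped of framework detail, an upcaster is a pure function from an old stored representation to a newer one, applied on the read path. The sketch below is not Axon's upcaster API: StoredEvent, Upcasters, and the "UNKNOWN" default are assumptions made for illustration, and payloads are modelled as simple maps.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stored form: event type, schema revision, and a field map.
record StoredEvent(String type, int revision, Map<String, Object> payload) {}

final class Upcasters {
    // V1 -> V2 for SubscriptionSuspended: revision 2 added a "reason" field.
    static final UnaryOperator<StoredEvent> SUSPENDED_V1_TO_V2 = event -> {
        if (!event.type().equals("SubscriptionSuspended") || event.revision() != 1) {
            return event; // not ours: pass through untouched
        }
        Map<String, Object> upgraded = new HashMap<>(event.payload());
        upgraded.putIfAbsent("reason", "UNKNOWN"); // assumed safe default for old events
        return new StoredEvent(event.type(), 2, Map.copyOf(upgraded));
    };

    // Applies every upcaster in order; the stored event itself is never modified.
    static StoredEvent upcast(StoredEvent stored, List<UnaryOperator<StoredEvent>> chain) {
        StoredEvent current = stored;
        for (UnaryOperator<StoredEvent> upcaster : chain) {
            current = upcaster.apply(current);
        }
        return current;
    }
}
```

The deployment rule follows from the shape of the function: code that only understands revision 2 can run safely as soon as the V1 to V2 upcaster sits in front of it, whereas deploying it first would expose it to revision 1 payloads it cannot read.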

Performance Analysis: Replay Cost Drivers

| Factor | Impact | Mitigation |
| --- | --- | --- |
| Event stream length | Linear aggregate load time | Snapshot every N events |
| Projection rebuild | Full event store scan | Token-based reset with parallel threads |
| Upcaster chain depth | CPU overhead at deserialization | Keep upcasters thin; version events early |
| Projection lag | Stale reads during backfill | Monitor processor lag; dedicate a shadow DB for replay |

๐Ÿ› ๏ธ Axon Framework and EventStoreDB: Event Sourcing on the JVM

Axon Framework is a Spring Boot-native Java framework that manages the full event-sourcing lifecycle: aggregate command handling, event persistence, snapshotting, replay, upcasting, and projection tracking. EventStoreDB is a purpose-built append-only database with server-side projections and persistent subscription support. Out of the box, Axon persists events to Axon Server or to a relational store through JPA or JDBC; EventStoreDB is an alternative worth evaluating when audit-grade storage is the priority, but pairing the two requires a connector.

These tools solve the event-sourcing problem by owning the infrastructure that makes aggregates deterministic: Axon's @CommandHandler / @EventSourcingHandler pattern enforces the strict separation between command validation and state mutation; the TrackingEventProcessor manages checkpoints and replay; the EventUpcasterChain handles schema evolution transparently. Teams write domain logic; Axon owns the replay machinery.

The complete SubscriptionAggregate, BillingHistoryProjection, snapshot configuration, and replay code are shown in the 🧪 Subscription Billing section below. The minimal starting dependency:

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>
<!-- Optional: an EventStoreDB connector in place of the default event store.
     Unverified coordinates: confirm this artifact exists in your repository before depending on it. -->
<dependency>
  <groupId>org.axonframework.extensions.eventstored</groupId>
  <artifactId>axon-eventstoredb-spring-boot-starter</artifactId>
  <version>0.1.0</version>
</dependency>

| Framework | Strengths | Best fit |
| --- | --- | --- |
| Axon Framework (Spring Boot) | Spring-native, full ES + CQRS lifecycle, built-in snapshots, replay, and upcasting | Enterprise Spring Boot teams wanting all pieces integrated |
| EventStoreDB Java client | Purpose-built append-only store, server-side projections, excellent audit semantics | Teams that want a best-in-class store and will wire their own projections |
| Spring Data + custom event table | Lightweight, no new infrastructure; PostgreSQL append-only event table with Outbox | Simple domains; teams wary of framework lock-in |
| Lagom (Akka-based) | Reactive, high throughput, persistent entities, cluster sharding | High-concurrency JVM services already on the Akka stack |

For a full deep-dive on Axon Framework and EventStoreDB in production, a dedicated follow-up post is planned.

๐ŸŒ Real-World Applications

Event sourcing earns its complexity where audit trails and replay are first-class business requirements.

| Company / Industry | Driver | Event sourcing advantage |
| --- | --- | --- |
| LMAX Exchange (finance) | 6M+ orders/sec with full regulatory audit | Replay market state to any timestamp for regulators |
| Shopify (e-commerce) | Fraud investigation, inventory disputes | Replay order event stream to exact inventory at purchase time |
| Healthcare systems | Consent tracking, patient record disputes | Immutable facts with time-travel replay; no separate audit table |
| Insurance | Claims and policy versioning | Full decision trail; compensation events on reversals |

🧪 Subscription Billing: Building the Aggregate, Projection, and Replay

Scenario: A billing platform tracks the lifecycle of each subscription: CREATED → ACTIVE → SUSPENDED → CANCELLED. Every state transition is an immutable domain event appended to the subscription's event stream. When a customer disputes a charge, the support team replays the event stream to reconstruct exactly what the account looked like at the moment of the disputed transaction.

Maven Dependency

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>

Domain Events (Immutable Value Objects)

public record SubscriptionCreatedEvent(
    String subscriptionId, String tenantId, String planId, Instant occurredAt) {}

public record SubscriptionActivatedEvent(
    String subscriptionId, String tenantId, Instant occurredAt) {}

public record SubscriptionSuspendedEvent(
    String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

public record SubscriptionCancelledEvent(
    String subscriptionId, String tenantId, String reason, Instant occurredAt) {}

Each event is a value object with no setters. IDs and timestamps are fixed at the command-handler boundary (the ID arrives on the command, the timestamp is taken when the event is applied); events never generate their own identity.

SubscriptionAggregate

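import java.time.Instant;

import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.spring.stereotype.Aggregate;

@Aggregate(snapshotTriggerDefinition = "subscriptionSnapshotTrigger")
public class SubscriptionAggregate {

    @AggregateIdentifier
    private String subscriptionId;
    private SubscriptionStatus status;
    private String tenantId;

    protected SubscriptionAggregate() {} // required by Axon for event-sourced replay

    // Creation: the constructor handler only emits the event; state is set in the handler below.
    @CommandHandler
    public SubscriptionAggregate(CreateSubscriptionCommand cmd) {
        AggregateLifecycle.apply(new SubscriptionCreatedEvent(
            cmd.subscriptionId(), cmd.tenantId(), cmd.planId(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionCreatedEvent event) {
        this.subscriptionId = event.subscriptionId();
        this.tenantId       = event.tenantId();
        this.status         = SubscriptionStatus.CREATED;
    }

    // Assumption: ActivateSubscriptionCommand is not defined in this article. It is sketched here
    // because without an activation path the status never becomes ACTIVE and suspension always fails.
    @CommandHandler
    public void handle(ActivateSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.CREATED) {
            throw new IllegalStateException(
                "Only CREATED subscriptions can be activated; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionActivatedEvent(
            subscriptionId, tenantId, Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionActivatedEvent event) {
        this.status = SubscriptionStatus.ACTIVE;
    }

    @CommandHandler
    public void handle(SuspendSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.ACTIVE) {
            throw new IllegalStateException(
                "Only ACTIVE subscriptions can be suspended; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionSuspendedEvent(
            subscriptionId, tenantId, cmd.reason(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionSuspendedEvent event) {
        this.status = SubscriptionStatus.SUSPENDED;
    }
}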

@CommandHandler enforces invariants, then calls AggregateLifecycle.apply(). @EventSourcingHandler is the only place state is mutated. This strict separation is why replay is deterministic no matter how many times it runs.

Snapshot Configuration: Preventing Cold-Start Replay Tax

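import org.axonframework.eventsourcing.EventCountSnapshotTriggerDefinition;
import org.axonframework.eventsourcing.SnapshotTriggerDefinition;
import org.axonframework.eventsourcing.Snapshotter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AxonConfig {

    @Bean
    public SnapshotTriggerDefinition subscriptionSnapshotTrigger(Snapshotter snapshotter) {
        // Trigger a snapshot once about 50 events have accumulated since the last one.
        // The next load starts from the snapshot and replays only the events recorded after it.
        return new EventCountSnapshotTriggerDefinition(snapshotter, 50);
    }
}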

Without snapshots, a subscription with 500 billing events pays a 500-event replay cost on every command. With a threshold of 50, each load replays only about 50 events at most on top of the latest snapshot.

BillingHistoryProjection: Read Model and Audit Query Handler

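import java.time.Instant;
import java.util.List;

import org.axonframework.config.ProcessingGroup;
import org.axonframework.eventhandling.EventHandler;
import org.axonframework.eventhandling.Timestamp;
import org.axonframework.queryhandling.QueryHandler;
import org.springframework.stereotype.Component;

@Component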
@ProcessingGroup("billing-history")
public class BillingHistoryProjection {

    private final BillingHistoryRepository repo;

    public BillingHistoryProjection(BillingHistoryRepository repo) {
        this.repo = repo;
    }

    @EventHandler
    public void on(SubscriptionCreatedEvent event, @Timestamp Instant eventTimestamp) {
        repo.save(new BillingHistoryEntry(
            event.subscriptionId(), event.tenantId(),
            "CREATED", event.planId(), eventTimestamp));
    }

    @EventHandler
    public void on(SubscriptionSuspendedEvent event, @Timestamp Instant eventTimestamp) {
        repo.updateStatus(
            event.subscriptionId(), "SUSPENDED", event.reason(), eventTimestamp);
    }

    @QueryHandler
    public List<BillingHistoryEntry> handle(GetBillingHistoryQuery query) {
        return repo.findBySubscriptionId(query.subscriptionId());
    }
}

Every @EventHandler must be idempotent: replay will call these methods again during incident recovery and projection refactors. Use upsert semantics keyed on the aggregate ID plus the event's sequence number, so that a re-delivered event overwrites its own row instead of adding a duplicate.
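The upsert rule can be sketched in isolation. The class below is an in-memory stand-in for a read-model table, with hypothetical names (HistoryRow, IdempotentBillingHistory) that do not appear in the projection above. In Axon the sequence number can usually be injected as a handler parameter annotated with @SequenceNumber; that wiring is left out here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical read-model row; names are illustrative, not from the article's repository.
record HistoryRow(String subscriptionId, long sequenceNumber, String status) {}

final class IdempotentBillingHistory {
    // The key is (aggregate ID, sequence number), so a replayed event overwrites its own row.
    private final Map<String, HistoryRow> rows = new HashMap<>();

    void on(String subscriptionId, long sequenceNumber, String status) {
        String key = subscriptionId + "#" + sequenceNumber;
        rows.put(key, new HistoryRow(subscriptionId, sequenceNumber, status)); // upsert, not insert
    }

    int rowCount() {
        return rows.size();
    }
}
```

Delivering the same two events twice leaves two rows, not four. With a relational store the equivalent is a unique constraint on the two columns combined with an upsert statement, so the database enforces the guarantee and not the handler code.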

Replaying the Event Stream for Audit Disputes

When a customer disputes a charge and the team needs the account state at a specific past timestamp, reset the projection's tracking token to replay from the event store:

// Reset and replay the billing-history projection from the beginning of the event store
eventProcessingConfig
    .eventProcessorByProcessingGroup("billing-history", TrackingEventProcessor.class)
    .ifPresent(processor -> {
        processor.shutDown();
        processor.resetTokens(); // replays all events in stream order
        processor.start();
    });

To scope the replay to a specific timestamp window, filter inside the @EventHandler by comparing the injected @Timestamp Instant against the dispute window before persisting. Because stored events are immutable, replaying them through the same handlers produces the same result every time, which is what makes replay a reliable audit mechanism.
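The timestamp filter amounts to stopping the replay at a cutoff. A minimal sketch, assuming the stream is already in occurrence order and using hypothetical names (TimedEvent, PointInTimeReplay) and plain string statuses:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical event shape for illustration only.
record TimedEvent(String status, Instant occurredAt) {}

final class PointInTimeReplay {
    // Replays events in stream order and stops at the first one after the cutoff.
    // Assumes the list is already ordered by occurredAt, as an aggregate stream would be.
    static String statusAt(List<TimedEvent> stream, Instant cutoff) {
        String status = "NONE";
        for (TimedEvent event : stream) {
            if (event.occurredAt().isAfter(cutoff)) {
                break;
            }
            status = event.status();
        }
        return status;
    }
}
```

For a dispute about a charge made at time T, statusAt(stream, T) answers "was the subscription active at that moment?" from the recorded facts alone. Running it again tomorrow gives the same answer, because nothing before T can change.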

โš–๏ธ Trade-offs & Failure Modes in Practice

| Failure mode | Symptom | Root cause | First mitigation |
| --- | --- | --- | --- |
| Long aggregate streams | High command latency on warm-up | No snapshot strategy | Add EventCountSnapshotTriggerDefinition |
| Incompatible old events | ClassCastException during replay after deploy | Schema changed without upcaster | Add SingleEventUpcaster before deploying new event version |
| Projection lag under load | Stale reads; audit disputes on in-flight data | Insufficient processor threads | Increase TrackingEventProcessor thread count |
| Unbounded event store growth | Storage cost; slow tail scans | No retention or archival policy | Archive cold streams; keep hot window in fast storage tier |
| Non-idempotent projection | Duplicate rows after replay | @EventHandler not safe to call twice | Use upsert keyed on aggregate ID + event sequence number |

🧭 Decision Guide: When Event Sourcing Earns Its Complexity

| Situation | Recommendation |
| --- | --- |
| Regulatory audit trail required (finance, healthcare, insurance) | Strong fit: the event log is the compliance record |
| Temporal queries: "what was the state at time T?" | Strong fit: replay to any past stream position |
| Simple CRUD with no audit or replay requirements | Avoid: operational overhead is not justified |
| High write throughput (>10k events/sec per stream) | Use with caution: partition streams; evaluate Axon Server |
| Team unfamiliar with CQRS and aggregate design | Run EventStorming workshops and model the domain first |

🔧 Operator Field Note: Three Production Realities

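1. Snapshot monitoring. Track command-handling latency per aggregate type (for example, a metric along the lines of axon_command_bus_handler_latency_seconds; the exact name depends on your metrics configuration). Latency that climbs with aggregate age signals that snapshots are not firing. Group the DomainEventEntry table by aggregate identifier and sort by event count to find outliers.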

2. Schema-incompatible old events. A ClassCastException during replay almost always means a missing upcaster. Safe sequence: write a SingleEventUpcaster (V1 → V2), deploy it before the new event version, then deploy the aggregate code. Never modify stored events in place.

3. Isolated projection replay. Each @ProcessingGroup owns its own tracking token. Resetting billing-history leaves all other processors unaffected. Route audit queries to a dedicated shadow query model so live billing traffic is never blocked during replay.

📚 Hard-Won Lessons from Production Event-Sourced Systems

  • Design events for readers, not writers. Rich, self-describing payloads survive upcasting; terse internal codes do not.
  • Snapshots are not optional at scale. An aggregate with 1,000 events pays a 1,000-event replay cost on every command without one. Define your threshold before going live.
  • Idempotent projections are mandatory. Every @EventHandler must be safe to call twice, because replay occurs during incident recovery and schema migration.
  • Schema evolution is the hardest operational problem. Deploy upcasters before new event versions, never after.
  • Replay is a first-class feature. Use it for analytics backfills, fraud investigation, and projection refactors.

📌 TLDR: Summary & Key Takeaways

  • Event sourcing stores immutable domain facts rather than mutable state rows; current state is always derivable by replaying the event log in order.
  • Aggregates are deterministic state machines: @CommandHandler enforces invariants; @EventSourcingHandler mutates state, and nothing else does. This separation makes replay reliable.
  • Snapshots are essential for long-lived aggregates. Without them, command latency grows linearly with stream length.
  • Projections are disposable read models. Because the event store is the source of truth, any query model can be rebuilt from the log at any time, including for historical audit.
  • Schema evolution requires upcasters. Deploy the upcaster before the new event version; test replay in staging before promoting to production.
  • Audit trails, temporal queries, and replay-based dispute resolution are built-in features, not bolt-ons.

๐Ÿ“ Practice Quiz

  1. What is the purpose of @EventSourcingHandler in an Axon aggregate, and why must all state mutation be restricted to these methods?

    A) It publishes events to Kafka for downstream consumers
    B) It mutates aggregate state in response to applied events, making state reconstruction via replay deterministic
    C) It validates incoming commands against the current aggregate state

    Correct Answer: B

  2. A subscription aggregate has 800 events and command processing latency is climbing. What is the most direct fix?

    A) Increase the database connection pool size
    B) Split the aggregate into two separate bounded contexts
    C) Add a snapshot trigger so the aggregate loads from a recent checkpoint instead of replaying 800 events

    Correct Answer: C

  3. Your team deploys a new version of SubscriptionSuspendedEvent that adds a reason field. Old events in the store lack this field. What breaks, and how do you fix it?

    A) Nothing breaks; Axon silently ignores missing fields by default
    B) Replay fails with a deserialization error; add a SingleEventUpcaster deployed before the new event version that populates reason with a safe default for old events
    C) The event store automatically migrates old events to the new schema on next replay

    Correct Answer: B

  4. Open-ended challenge: Your billing-history projection is rebuilt nightly for compliance reporting, but the rebuild takes 4 hours and blocks audit query responses during that window. How would you redesign the projection strategy (including snapshot policy, parallel processing configuration, and query routing) to bring rebuild time below 30 minutes without affecting live billing traffic?

Written by

Abstract Algorithms

@abstractalgorithms