Event Sourcing Pattern: Auditability, Replay, and Evolution of Domain State
Persist domain facts as immutable events and rebuild state predictably under change.
Abstract Algorithms
TLDR: Event sourcing pays off when regulatory audit history and replay are first-class requirements, but it demands strict schema evolution, a snapshot strategy, and a framework that owns the aggregate lifecycle. Spring Boot + Axon Framework is one of the most direct production-grade paths on the JVM.
Why Storing Events Instead of State Changes Everything
In January 2017, a GitLab engineer ran rm -rf on the database directory of the wrong production server. The regular backups turned out not to be working, so the team restored from a disk snapshot taken about six hours earlier. Roughly six hours of database records (issues, merge requests, comments, new user accounts) were lost permanently, although the Git repositories themselves were unaffected. An append-only event log, replicated off that server as events were written, would have allowed replay up to the moment of deletion. That architectural choice (appending events instead of overwriting state) is the difference between "we can restore to any second" and "we lost six hours and cannot get them back."
Most applications store only the current state of a record. A subscription row has a status column. When billing suspends the account, you overwrite ACTIVE with SUSPENDED. Done, but the why, the when, and the sequence of transitions that led there are gone.
Event sourcing flips the model. Instead of storing the latest snapshot of truth, you store every domain event that caused a state change in an append-only log. Current state is derived on demand by replaying those events in sequence. The log is the audit trail, not a derived artefact built on top of it.
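To make "derived on demand" concrete, here is a minimal, framework-free sketch in plain Java (no Axon). It assumes a simplified subscription with only a status field and one record type per event. The names are illustrative and do not come from any library. Current state is simply a left fold over the ordered log:

```java
import java.util.List;

// Illustrative status values; NONE means "no events yet".
enum Status { NONE, CREATED, ACTIVE, SUSPENDED, CANCELLED }

// Hypothetical event types; payloads reduced to the subscription ID.
sealed interface SubscriptionEvent permits Created, Activated, Suspended, Cancelled {}
record Created(String subscriptionId) implements SubscriptionEvent {}
record Activated(String subscriptionId) implements SubscriptionEvent {}
record Suspended(String subscriptionId) implements SubscriptionEvent {}
record Cancelled(String subscriptionId) implements SubscriptionEvent {}

final class Replay {
    private Replay() {}

    // Applying one event is a pure function of (current state, event).
    static Status apply(Status current, SubscriptionEvent event) {
        if (event instanceof Created) return Status.CREATED;
        if (event instanceof Activated) return Status.ACTIVE;
        if (event instanceof Suspended) return Status.SUSPENDED;
        if (event instanceof Cancelled) return Status.CANCELLED;
        return current;
    }

    // Current state = left fold of apply() over the log, oldest event first.
    static Status replay(List<? extends SubscriptionEvent> log) {
        Status state = Status.NONE;
        for (SubscriptionEvent event : log) {
            state = apply(state, event);
        }
        return state;
    }
}
```

Replaying only a prefix of the log (for example log.subList(0, 2)) yields the state at that earlier position. This is the mechanism behind the temporal-queries row in the comparison table.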
| Aspect | Traditional CRUD | Event Sourcing |
|---|---|---|
| What is stored | Current row state | Ordered sequence of immutable events |
| Audit history | Requires separate audit table | Built in: the event log is the record |
| Temporal queries | Difficult without CDC or snapshots | Replay the stream to any past position |
| Concurrent writes | Last-write-wins risk without care | Optimistic concurrency on stream version |
| Schema evolution | ALTER TABLE migrations | Event upcasting at read time |
You gain an append-only fact log, time-travel queries, and decoupled read models. You give up simple SELECT * queries, and you accept the operational cost of snapshot management and schema versioning.
The Four Building Blocks of an Event-Sourced System
Every production event-sourced system has four roles:
- Command: an intent to change state; validated against the current aggregate state before any event is written.
- Aggregate: the consistency boundary; enforces invariants, emits events, and advances its internal state machine.
- Event Store: the append-only log; events are immutable, and each aggregate instance owns one stream, keyed by its ID.
- Projection: a read model built from the event stream; projections are disposable and can always be rebuilt.
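The Event Store role can be sketched in a few lines of plain Java, together with the optimistic concurrency mentioned in the comparison table. This is a hypothetical in-memory store and not the API of Axon or EventStoreDB. Each aggregate ID maps to an append-only list, and events are reduced to type names. A writer must state the stream version (event count) it last read, so a stale writer is rejected instead of silently overwriting newer events:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Raised when another writer appended to the stream after we read it.
class ConcurrencyException extends RuntimeException {
    ConcurrencyException(String message) { super(message); }
}

// Hypothetical in-memory event store: one append-only stream per aggregate ID.
// There is deliberately no update or delete operation.
class InMemoryEventStore {
    private final Map<String, List<String>> streams = new HashMap<>();

    // Returns an immutable copy of the stream, oldest event first.
    synchronized List<String> read(String streamId) {
        return List.copyOf(streams.getOrDefault(streamId, List.of()));
    }

    // Appends only if the stream is still at expectedVersion; returns the new version.
    synchronized int append(String streamId, int expectedVersion, List<String> events) {
        List<String> stream = streams.computeIfAbsent(streamId, id -> new ArrayList<>());
        if (stream.size() != expectedVersion) {
            throw new ConcurrencyException("stream " + streamId + " is at version "
                    + stream.size() + ", expected " + expectedVersion);
        }
        stream.addAll(events);
        return stream.size();
    }
}
```

Suppose two command handlers both read version 1 and then try to append. The second append fails, and the usual response is to reload the aggregate and retry the command.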
How a Command Flows into an Auditable Event Stream
```mermaid
flowchart TD
C[Client Command] --> CH["Command Handler\n(SubscriptionAggregate)"]
CH -->|"validates invariants\napplies event"| ES[("Event Store\nAppend-Only Log")]
ES -->|"event published\non event bus"| P["BillingHistoryProjection\n(Event Handler)"]
P --> QM[("Query Model\nBillingHistoryRepository")]
QM -->|"query response"| Q[GetBillingHistoryQuery]
ES -. "token-based replay" .-> RP["Replay Processor\n(TrackingEventProcessor)"]
RP -. "rebuilds view\nfor audit dispute" .-> QM
style ES fill:#f5f5f5,stroke:#555
style QM fill:#e8f4e8,stroke:#555
style RP fill:#fff3e0,stroke:#f90,stroke-dasharray: 5 5
```
Solid arrows show the live command path. Dashed arrows show replay: the TrackingEventProcessor resets its token and re-reads the event store to rebuild the query model, for example when an audit dispute requires the history to be reconstructed.
The aggregate never writes directly to the query model. It emits events; projections consume them independently. A new projection โ say, a fraud-detection read model โ can be added without touching existing aggregate code.
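Here is a small sketch of that decoupling, again in plain Java with hypothetical names rather than Axon types. The bus keeps the ordered log. A projection subscribed later (here a fraud-style suspension counter) is first backfilled from history and then receives live events. Nothing on the write side changes when the second projection is added:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stored event: aggregate ID plus event type name.
record StoredEvent(String subscriptionId, String type) {}

// Read model 1: the latest event type seen per subscription.
class LatestEventProjection implements Consumer<StoredEvent> {
    final Map<String, String> latestById = new HashMap<>();

    @Override
    public void accept(StoredEvent e) {
        latestById.put(e.subscriptionId(), e.type());
    }
}

// Read model 2, added later: how often each subscription was suspended.
class SuspensionCountProjection implements Consumer<StoredEvent> {
    final Map<String, Integer> suspensions = new HashMap<>();

    @Override
    public void accept(StoredEvent e) {
        if (e.type().equals("SUSPENDED")) {
            suspensions.merge(e.subscriptionId(), 1, Integer::sum);
        }
    }
}

// Minimal in-process bus that also retains the full log for backfills.
class EventBus {
    private final List<StoredEvent> log = new ArrayList<>();
    private final List<Consumer<StoredEvent>> subscribers = new ArrayList<>();

    // A late subscriber replays the whole log first, then receives live events.
    void subscribe(Consumer<StoredEvent> subscriber) {
        log.forEach(subscriber);
        subscribers.add(subscriber);
    }

    void publish(StoredEvent event) {
        log.add(event);
        subscribers.forEach(s -> s.accept(event));
    }
}
```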
Event-Sourcing Data Flow Overview
```mermaid
flowchart TD
CMD[Command] --> AGG["Aggregate\nvalidates invariants"]
AGG -->|"emit event"| ES[("Event Store\nappend-only")]
ES -->|"project"| RM[Read Model]
ES -. "replay" .-> AUDIT[Audit View]
```
Deep Dive: Inside the Aggregate (State Machines, Snapshots, and Schema Evolution)
Internals: Aggregate State Reconstruction
An aggregate's state exists only in memory during command processing. Before handling a command, the framework loads the aggregate by replaying every past event for that aggregate ID in sequence. Each @EventSourcingHandler method advances internal state โ status flags, counters, IDs โ until the aggregate is fully current. The command handler then checks invariants against that reconstructed in-memory state.
This is powerful but carries a cost: if a subscription has 5,000 events, loading it means replaying 5,000 events before each command. Snapshots solve this. A snapshot captures the full aggregate state at event N; the next load starts from the snapshot and replays only the delta after N.
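The snapshot arithmetic can be checked with a hypothetical, framework-free repository. It assumes each event is a billing amount in cents and the aggregate state is a running balance. A snapshot is taken synchronously every threshold events (Axon's snapshotter may do this asynchronously). A load starts from the latest snapshot and replays only the events after it:

```java
import java.util.ArrayList;
import java.util.List;

// Aggregate state captured at a given stream version (number of events folded in).
record Snapshot(long balanceCents, int version) {}

// Hypothetical repository: events are billing amounts in cents; state is the running balance.
class SnapshottingRepository {
    private final List<Long> events = new ArrayList<>();
    private final int threshold;
    private Snapshot latest = new Snapshot(0, 0);
    int lastReplayCount; // events replayed by the most recent load()

    SnapshottingRepository(int threshold) { this.threshold = threshold; }

    void append(long amountCents) {
        events.add(amountCents);
        // Snapshot once `threshold` events have accumulated since the previous snapshot.
        if (events.size() - latest.version() >= threshold) {
            latest = replayFrom(latest);
        }
    }

    // Loading starts from the latest snapshot and replays only the delta after it.
    Snapshot load() {
        lastReplayCount = events.size() - latest.version();
        return replayFrom(latest);
    }

    private Snapshot replayFrom(Snapshot base) {
        long balance = base.balanceCents();
        for (int i = base.version(); i < events.size(); i++) {
            balance += events.get(i);
        }
        return new Snapshot(balance, events.size());
    }
}
```

With 549 events and a threshold of 50, a load replays 49 events instead of 549. The 550th append triggers a new snapshot, after which the next load replays none.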
Schema Evolution Through Upcasting
Events are immutable, but their schemas change. Old stored events must be upcasted โ transformed at read time into the new schema without modifying stored data. Axon's EventUpcasterChain handles this transparently. The rule: always deploy upcasters before deploying new event versions.
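Below is a minimal sketch of read-time upcasting in plain Java with hypothetical types. It is not Axon's API, although Axon's SingleEventUpcaster follows a similar canUpcast/doUpcast shape. Stored events are kept as raw field maps with a schema revision, and each upcaster converts exactly one (type, revision) pair to the next revision. The V1 → V2 step fills in the reason field that a newer SubscriptionSuspendedEvent expects, using an assumed default of UNKNOWN_LEGACY, and never touches the stored copy:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical serialized form of a stored event: type, schema revision, raw fields.
record RawEvent(String type, int revision, Map<String, String> fields) {}

// Converts exactly one (type, revision) to the next revision.
interface Upcaster {
    boolean canUpcast(RawEvent event);
    RawEvent upcast(RawEvent event);
}

// V1 SubscriptionSuspendedEvent had no reason field; V2 requires one.
class SuspendedV1ToV2 implements Upcaster {
    @Override
    public boolean canUpcast(RawEvent event) {
        return event.type().equals("SubscriptionSuspendedEvent") && event.revision() == 1;
    }

    @Override
    public RawEvent upcast(RawEvent event) {
        Map<String, String> fields = new HashMap<>(event.fields()); // copy; stored data stays as-is
        fields.putIfAbsent("reason", "UNKNOWN_LEGACY"); // assumed default for legacy events
        return new RawEvent(event.type(), 2, Map.copyOf(fields));
    }
}

// Applies upcasters until none matches, so V1 -> V2 -> V3 chains compose.
// Each upcaster must raise the revision, otherwise this loop would never end.
class UpcasterChain {
    private final List<Upcaster> upcasters;

    UpcasterChain(List<Upcaster> upcasters) { this.upcasters = List.copyOf(upcasters); }

    RawEvent upcast(RawEvent stored) {
        RawEvent current = stored;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Upcaster upcaster : upcasters) {
                if (upcaster.canUpcast(current)) {
                    current = upcaster.upcast(current);
                    changed = true;
                }
            }
        }
        return current;
    }
}
```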
Performance Analysis: Replay Cost Drivers
| Factor | Impact | Mitigation |
|---|---|---|
| Event stream length | Linear aggregate load time | Snapshot every N events |
| Projection rebuild | Full event store scan | Token-based reset with parallel threads |
| Upcaster chain depth | CPU overhead at deserialization | Keep upcasters thin; version events early |
| Projection lag | Stale reads during backfill | Monitor processor lag; dedicate a shadow DB for replay |
Axon Framework and EventStoreDB: Event Sourcing on the JVM
Axon Framework is a Java framework with first-class Spring Boot integration. It manages the full event-sourcing lifecycle: aggregate command handling, event persistence, snapshotting, replay, upcasting, and projection tracking. Its default event store is Axon Server, and it can also store events in a relational database through its JPA or JDBC storage engines. EventStoreDB is a separate, purpose-built append-only database with server-side projections and persistent subscriptions. It is a strong option for audit-grade storage, but using it underneath Axon requires a connector outside the core framework.
These tools solve the event-sourcing problem by owning the infrastructure that makes aggregates deterministic. Axon's @CommandHandler / @EventSourcingHandler pattern separates command validation from state mutation. The TrackingEventProcessor manages checkpoints and replay, and the EventUpcasterChain handles schema evolution transparently. Teams write domain logic; Axon owns the replay machinery.
The complete SubscriptionAggregate, BillingHistoryProjection, snapshot configuration, and replay code are shown in the Subscription Billing section below. The minimal starting dependency:
```xml
<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>
<!-- Optional: an EventStoreDB connector to replace the default event store
     (Axon Server, or JPA/JDBC when the Axon Server connector is disabled).
     Not part of core Axon; confirm these coordinates against the connector's own documentation. -->
<dependency>
  <groupId>org.axonframework.extensions.eventstored</groupId>
  <artifactId>axon-eventstoredb-spring-boot-starter</artifactId>
  <version>0.1.0</version>
</dependency>
```
| Framework | Strengths | Best fit |
|---|---|---|
| Axon Framework (Spring Boot) | Spring-native, full ES + CQRS lifecycle, built-in snapshots, replay, and upcasting | Enterprise Spring Boot teams wanting all pieces integrated |
| EventStoreDB Java client | Purpose-built append-only store, server-side projections, excellent audit semantics | Teams that want a best-in-class store and will wire their own projections |
| Spring Data + custom event table | Lightweight, no new infrastructure; PostgreSQL append-only event table with Outbox | Simple domains; teams wary of framework lock-in |
| Lagom (Akka-based) | Reactive, high throughput, persistent entities, cluster sharding | High-concurrency JVM services already on the Akka stack |
For a full deep-dive on Axon Framework and EventStoreDB in production, a dedicated follow-up post is planned.
Real-World Applications
Event sourcing earns its complexity where audit trails and replay are first-class business requirements.
| Company / Industry | Driver | Event sourcing advantage |
|---|---|---|
| LMAX Exchange (finance) | 6M+ orders/sec with full regulatory audit | Replay market state to any timestamp for regulators |
| Shopify (e-commerce) | Fraud investigation, inventory disputes | Replay order event stream to exact inventory at purchase time |
| Healthcare systems | Consent tracking, patient record disputes | Immutable facts with time-travel replay; no separate audit table |
| Insurance | Claims and policy versioning | Full decision trail; compensation events on reversals |
Subscription Billing: Building the Aggregate, Projection, and Replay
Scenario: A billing platform tracks the lifecycle of each subscription: CREATED → ACTIVE → SUSPENDED → CANCELLED. Every state transition is an immutable domain event appended to the subscription's event stream. When a customer disputes a charge, the support team replays the event stream to reconstruct exactly what the account looked like at the moment of the disputed transaction.
Domain Events (Immutable Value Objects)
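```java
import java.time.Instant;
import org.axonframework.modelling.command.TargetAggregateIdentifier;
public record SubscriptionCreatedEvent(
        String subscriptionId, String tenantId, String planId, Instant occurredAt) {}
public record SubscriptionActivatedEvent(
        String subscriptionId, String tenantId, Instant occurredAt) {}
public record SubscriptionSuspendedEvent(
        String subscriptionId, String tenantId, String reason, Instant occurredAt) {}
public record SubscriptionCancelledEvent(
        String subscriptionId, String tenantId, String reason, Instant occurredAt) {}
// Commands and status enum referenced by the aggregate (shapes assumed for this example).
public record CreateSubscriptionCommand(String subscriptionId, String tenantId, String planId) {}
public record ActivateSubscriptionCommand(@TargetAggregateIdentifier String subscriptionId) {}
public record SuspendSubscriptionCommand(@TargetAggregateIdentifier String subscriptionId, String reason) {}
public enum SubscriptionStatus { CREATED, ACTIVE, SUSPENDED, CANCELLED }
```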
Each event is an immutable record with no setters. Identifiers arrive on the command, and timestamps are assigned in the command handler at the moment of the decision. Events therefore never generate their own identity, and replay never re-reads the clock.
SubscriptionAggregate
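```java
import java.time.Instant;
import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.spring.stereotype.Aggregate;

@Aggregate(snapshotTriggerDefinition = "subscriptionSnapshotTrigger")
public class SubscriptionAggregate {
    @AggregateIdentifier
    private String subscriptionId;
    private SubscriptionStatus status;
    private String tenantId;

    protected SubscriptionAggregate() {} // required by Axon for event-sourced replay

    @CommandHandler
    public SubscriptionAggregate(CreateSubscriptionCommand cmd) {
        AggregateLifecycle.apply(new SubscriptionCreatedEvent(
                cmd.subscriptionId(), cmd.tenantId(), cmd.planId(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionCreatedEvent event) {
        this.subscriptionId = event.subscriptionId();
        this.tenantId = event.tenantId();
        this.status = SubscriptionStatus.CREATED;
    }

    // CREATED -> ACTIVE, the transition that makes a subscription suspendable.
    // ActivateSubscriptionCommand is an assumed command record carrying the subscription ID.
    @CommandHandler
    public void handle(ActivateSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.CREATED) {
            throw new IllegalStateException(
                    "Only CREATED subscriptions can be activated; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionActivatedEvent(
                subscriptionId, tenantId, Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionActivatedEvent event) {
        this.status = SubscriptionStatus.ACTIVE;
    }

    @CommandHandler
    public void handle(SuspendSubscriptionCommand cmd) {
        if (status != SubscriptionStatus.ACTIVE) {
            throw new IllegalStateException(
                    "Only ACTIVE subscriptions can be suspended; current status: " + status);
        }
        AggregateLifecycle.apply(new SubscriptionSuspendedEvent(
                subscriptionId, tenantId, cmd.reason(), Instant.now()));
    }

    @EventSourcingHandler
    public void on(SubscriptionSuspendedEvent event) {
        this.status = SubscriptionStatus.SUSPENDED;
    }
}
```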
@CommandHandler methods enforce invariants and then call AggregateLifecycle.apply(). @EventSourcingHandler methods are the only place state is mutated. These handlers take everything they need (including timestamps) from the event and perform no validation or I/O, so replay is deterministic no matter how many times it runs.
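The same split can be expressed without a framework, which makes the determinism argument easy to test. In this hypothetical sketch (plain Java, not Axon types), decide plays the @CommandHandler role: it checks the invariant and returns new events, with the clock passed in from outside. evolve plays the @EventSourcingHandler role: it only derives the next state from the event. Replaying the same event from the same state therefore always gives the same result:

```java
import java.time.Instant;
import java.util.List;

enum SubscriptionStatus { CREATED, ACTIVE, SUSPENDED, CANCELLED }

// Hypothetical command and event records for this sketch.
record SuspendSubscription(String subscriptionId, String reason) {}
record SubscriptionSuspended(String subscriptionId, String reason, Instant occurredAt) {}

record SubscriptionState(String subscriptionId, SubscriptionStatus status) {

    // Command-handler role: validate against current state, emit events, mutate nothing.
    List<SubscriptionSuspended> decide(SuspendSubscription cmd, Instant now) {
        if (status != SubscriptionStatus.ACTIVE) {
            throw new IllegalStateException(
                    "Only ACTIVE subscriptions can be suspended; current status: " + status);
        }
        return List.of(new SubscriptionSuspended(subscriptionId, cmd.reason(), now));
    }

    // Event-sourcing-handler role: no validation, no clock, no I/O; only the next state.
    SubscriptionState evolve(SubscriptionSuspended event) {
        return new SubscriptionState(event.subscriptionId(), SubscriptionStatus.SUSPENDED);
    }
}
```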
Snapshot Configuration: Preventing the Cold-Start Replay Tax
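```java
import org.axonframework.eventsourcing.EventCountSnapshotTriggerDefinition;
import org.axonframework.eventsourcing.SnapshotTriggerDefinition;
import org.axonframework.eventsourcing.Snapshotter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AxonConfig {
    // Bean name must match @Aggregate(snapshotTriggerDefinition = "subscriptionSnapshotTrigger").
    @Bean
    public SnapshotTriggerDefinition subscriptionSnapshotTrigger(Snapshotter snapshotter) {
        // Capture a snapshot after every 50 events.
        // Next load starts from the snapshot and replays only the delta (≤ 49 events).
        return new EventCountSnapshotTriggerDefinition(snapshotter, 50);
    }
}
```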
Without snapshots, a subscription with 500 billing events pays a 500-event replay cost on every command. With a threshold of 50, the worst-case delta is 49 events.
BillingHistoryProjection: Read Model and Audit Query Handler
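```java
import java.time.Instant;
import java.util.List;
import org.axonframework.config.ProcessingGroup;
import org.axonframework.eventhandling.EventHandler;
import org.axonframework.eventhandling.Timestamp;
import org.axonframework.queryhandling.QueryHandler;
import org.springframework.stereotype.Component;

// BillingHistoryRepository, BillingHistoryEntry, and GetBillingHistoryQuery are application types (not shown).
@Component
@ProcessingGroup("billing-history")
public class BillingHistoryProjection {
    private final BillingHistoryRepository repo;

    public BillingHistoryProjection(BillingHistoryRepository repo) {
        this.repo = repo;
    }

    @EventHandler
    public void on(SubscriptionCreatedEvent event, @Timestamp Instant eventTimestamp) {
        repo.save(new BillingHistoryEntry(
                event.subscriptionId(), event.tenantId(),
                "CREATED", event.planId(), eventTimestamp));
    }

    @EventHandler
    public void on(SubscriptionSuspendedEvent event, @Timestamp Instant eventTimestamp) {
        repo.updateStatus(
                event.subscriptionId(), "SUSPENDED", event.reason(), eventTimestamp);
    }

    @QueryHandler
    public List<BillingHistoryEntry> handle(GetBillingHistoryQuery query) {
        return repo.findBySubscriptionId(query.subscriptionId());
    }
}
```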
Every @EventHandler must be idempotent, because replay will call these methods again during incident recovery and projection refactors. Use upsert semantics keyed on the aggregate ID plus the event's sequence number so that handling the same event twice is safe.
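Here is a hypothetical, in-memory sketch of that upsert in plain Java; the row and key types are illustrative. Rows are keyed on (aggregate ID, sequence number), so re-delivering an event overwrites its own row instead of appending a duplicate. In Axon, a handler can receive the sequence number through the @SequenceNumber parameter annotation. A real repository would use the database's own upsert, such as INSERT ... ON CONFLICT in PostgreSQL:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Natural key of one history row: which aggregate, which event in its stream.
record RowKey(String subscriptionId, long sequenceNumber) {}

// Hypothetical read-model row.
record HistoryRow(String subscriptionId, long sequenceNumber, String status) {}

class IdempotentBillingHistory {
    private final Map<RowKey, HistoryRow> rows = new LinkedHashMap<>();

    // Upsert: handling the same event twice leaves exactly one row.
    void on(String subscriptionId, long sequenceNumber, String status) {
        RowKey key = new RowKey(subscriptionId, sequenceNumber);
        rows.put(key, new HistoryRow(subscriptionId, sequenceNumber, status));
    }

    int size() { return rows.size(); }
}
```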
Replaying the Event Stream for Audit Disputes
When a customer disputes a charge and the team needs the account state at a specific past timestamp, reset the projection's tracking token to replay from the event store:
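```java
import org.axonframework.config.EventProcessingConfiguration;
import org.axonframework.eventhandling.TrackingEventProcessor;

// eventProcessingConfig is the EventProcessingConfiguration bean, injected by Spring.
// Reset and replay the billing-history projection from the beginning of the event store.
eventProcessingConfig
    .eventProcessorByProcessingGroup("billing-history", TrackingEventProcessor.class)
    .ifPresent(processor -> {
        processor.shutDown();    // tokens can only be reset while the processor is stopped
        processor.resetTokens(); // replays every event in the store, in global order
        processor.start();
    });
```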
To scope the replay to a specific timestamp window, filter inside the @EventHandler by comparing the injected @Timestamp Instant against the dispute window before persisting. The event store is immutable and the handlers are deterministic, so replay produces the same result every time. That makes it a reliable audit mechanism.
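When only one subscription is disputed, a lighter alternative to a full projection replay is to fold just that aggregate's stream up to the disputed instant. The sketch below is hypothetical plain Java. It assumes the stream has already been read in sequence order (in Axon, EventStore.readEvents(aggregateIdentifier) returns an aggregate's domain events in order) and that timestamps never decrease along the stream. Event type names are mapped to statuses:

```java
import java.time.Instant;
import java.util.List;

// Hypothetical stored event: type name plus the timestamp recorded at append time.
record TimedEvent(String type, Instant timestamp) {}

final class PointInTime {
    private PointInTime() {}

    // Folds only events recorded at or before asOf; the stream must be in sequence order.
    static String statusAt(List<TimedEvent> stream, Instant asOf) {
        String status = "NONE";
        for (TimedEvent event : stream) {
            if (event.timestamp().isAfter(asOf)) {
                break; // timestamps are assumed non-decreasing, so later events also qualify out
            }
            status = switch (event.type()) {
                case "SubscriptionCreatedEvent" -> "CREATED";
                case "SubscriptionActivatedEvent" -> "ACTIVE";
                case "SubscriptionSuspendedEvent" -> "SUSPENDED";
                case "SubscriptionCancelledEvent" -> "CANCELLED";
                default -> status; // unknown types do not change the status
            };
        }
        return status;
    }
}
```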
Trade-offs & Failure Modes in Practice
| Failure mode | Symptom | Root cause | First mitigation |
|---|---|---|---|
| Long aggregate streams | High command latency on warm-up | No snapshot strategy | Add EventCountSnapshotTriggerDefinition |
| Incompatible old events | ClassCastException during replay after deploy | Schema changed without upcaster | Add SingleEventUpcaster before deploying new event version |
| Projection lag under load | Stale reads; audit disputes on in-flight data | Insufficient processor threads | Increase TrackingEventProcessor thread count |
| Unbounded event store growth | Storage cost; slow tail scans | No retention or archival policy | Archive cold streams; keep hot window in fast storage tier |
| Non-idempotent projection | Duplicate rows after replay | @EventHandler not safe to call twice | Use upsert keyed on aggregate ID + event sequence number |
Decision Guide: When Event Sourcing Earns Its Complexity
| Situation | Recommendation |
|---|---|
| Regulatory audit trail required (finance, healthcare, insurance) | Strong fit: the event log is the compliance record |
| Temporal queries: "what was the state at time T?" | Strong fit: replay to any past stream position |
| Simple CRUD with no audit or replay requirements | Avoid: operational overhead is not justified |
| High write throughput (>10k events/sec per stream) | Use with caution: partition streams; evaluate Axon Server |
| Team unfamiliar with CQRS and aggregate design | Run EventStorming workshops and model the domain first |
Operator Field Note: Three Production Realities
1. Snapshot monitoring. Track axon_command_bus_handler_latency_seconds per aggregate type. Latency that climbs as aggregates age signals that snapshots are not firing. Query the DomainEventEntry table, counting events per aggregate identifier, to find outliers.
2. Schema-incompatible old events. A ClassCastException during replay almost always means a missing upcaster. Safe sequence: write a SingleEventUpcaster (V1 → V2), deploy it before the new event version, then deploy the aggregate code. Never modify stored events in place.
3. Isolated projection replay. Each @ProcessingGroup owns its own tracking token. Resetting billing-history leaves all other processors unaffected. Route audit queries to a dedicated shadow query model so live billing traffic is never blocked during replay.
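The isolation in point 3 follows from each processor owning its own checkpoint. In this hypothetical sketch (plain Java, not Axon's TrackingEventProcessor), a token is simply the index of the next event to read from a shared log, and resetting one processor rewinds only that index. The replayed processor sees every event a second time, which is why projection handlers must be idempotent:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical tracking processor: its token is the index of the next event to read.
class TrackingProcessor {
    private final String name;
    private final Consumer<String> handler;
    private int token = 0;

    TrackingProcessor(String name, Consumer<String> handler) {
        this.name = name;
        this.handler = handler;
    }

    // Handle everything appended since the last checkpoint, advancing the token per event.
    void poll(List<String> log) {
        while (token < log.size()) {
            handler.accept(log.get(token));
            token++;
        }
    }

    // Rewinds this processor only; other processors keep their own tokens.
    void resetTokens() { token = 0; }

    int token() { return token; }

    String name() { return name; }
}
```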
Hard-Won Lessons from Production Event-Sourced Systems
- Design events for readers, not writers. Rich, self-describing payloads survive upcasting; terse internal codes do not.
- Snapshots are not optional at scale. An aggregate with 1,000 events pays a 1,000-event replay cost on every command without one. Define your threshold before going live.
- Idempotent projections are mandatory. Every @EventHandler must be safe to call twice, because replay occurs during incident recovery and schema migration.
- Schema evolution is the hardest operational problem. Deploy upcasters before new event versions, never after.
- Replay is a first-class feature. Use it for analytics backfills, fraud investigation, and projection refactors.
TLDR: Summary & Key Takeaways
- Event sourcing stores immutable domain facts rather than mutable state rows; current state is always derivable by replaying the event log in order.
- Aggregates are deterministic state machines: @CommandHandler enforces invariants; @EventSourcingHandler mutates state, and nothing else does. This separation makes replay reliable.
- Snapshots are essential for long-lived aggregates. Without them, command latency grows linearly with stream length.
- Projections are disposable read models. Because the event store is the source of truth, any query model can be rebuilt from the log at any time โ including for historical audit.
- Schema evolution requires upcasters. Deploy the upcaster before the new event version; test replay in staging before promoting to production.
- Audit trails, temporal queries, and replay-based dispute resolution are built-in features, not bolt-ons.
Practice Quiz
What is the purpose of @EventSourcingHandler in an Axon aggregate, and why must all state mutation be restricted to these methods?
A) It publishes events to Kafka for downstream consumers
B) It mutates aggregate state in response to applied events, making state reconstruction via replay deterministic
C) It validates incoming commands against the current aggregate state
Correct Answer: B
A subscription aggregate has 800 events and command processing latency is climbing. What is the most direct fix?
A) Increase the database connection pool size
B) Split the aggregate into two separate bounded contexts
C) Add a snapshot trigger so the aggregate loads from a recent checkpoint instead of replaying 800 events
Correct Answer: C
Your team deploys a new version of SubscriptionSuspendedEvent that adds a reason field. Old events in the store lack this field. What breaks, and how do you fix it?
A) Nothing breaks; Axon silently ignores missing fields by default
B) Replay fails with a deserialization error; add a SingleEventUpcaster, deployed before the new event version, that populates reason with a safe default for old events
C) The event store automatically migrates old events to the new schema on next replay
Correct Answer: B
Open-ended challenge: Your billing-history projection is rebuilt nightly for compliance reporting, but the rebuild takes 4 hours and blocks audit query responses during that window. How would you redesign the projection strategy (including snapshot policy, parallel processing configuration, and query routing) to bring rebuild time below 30 minutes without affecting live billing traffic?
Written by
Abstract Algorithms
@abstractalgorithms