
CQRS Pattern: Separating Write Models from Query Models at Scale

Design independent command and query paths to scale reads without weakening write correctness.

Abstract Algorithms
13 min read

TLDR: CQRS works when read and write workloads diverge, but only with explicit freshness budgets and projection reliability. The hard part is not separating models — it is operating lag, replay, and rollback safely.

An e-commerce platform's order service was running 47-table JOINs to serve its summary page — because reads and writes shared the same normalized model. Dashboards, search, and payment-status polls all hit the same write store. Adding read indexes slowed writes; write locks stalled dashboards. Response times hit 4 seconds. CQRS separates the write model (normalized, enforces invariants) from the read model (denormalized, shaped per consumer) so each path can be optimized independently.

If you design services where reads outnumber writes or different consumers need different data shapes, CQRS is the pattern that lets you scale and tune each side without undermining correctness on the other.

Worked example — one committed write event feeds two independent read models:

Order placed → PostgreSQL write store (normalized, source of truth)
           ↓ outbox event
    ┌────────────────────────────────────┐
    ▼                                    ▼
Customer timeline view        Finance export view
(Redis, keyed by customer_id) (Elasticsearch, keyed by SKU + date)

Neither read store is written in the request path. Projection workers listen to the event stream and update each view independently with their own checkpoints.
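A minimal in-memory sketch of this fan-out, assuming each outbox event carries a monotonically increasing eventId. The class names are hypothetical, and plain HashMaps stand in for Redis and Elasticsearch. What it illustrates is that each projection owns its own checkpoint, so one view can trail behind while the other is current.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A committed event taken from the write store's outbox; eventId increases monotonically.
record OrderPlaced(long eventId, String orderId, String customerId,
                   String sku, String date, long totalCents) {}

// Each projection owns its own view and its own checkpoint (last applied eventId).
abstract class Projection {
    long checkpoint = 0;

    // Apply every event after this projection's checkpoint, advancing the checkpoint as it goes.
    void catchUp(List<OrderPlaced> log) {
        for (OrderPlaced e : log) {
            if (e.eventId() <= checkpoint) continue; // already applied by this projection
            apply(e);
            checkpoint = e.eventId();
        }
    }

    abstract void apply(OrderPlaced e);
}

// Customer timeline view keyed by customer_id (stand-in for the Redis view).
class CustomerTimeline extends Projection {
    final Map<String, List<String>> byCustomer = new HashMap<>();

    void apply(OrderPlaced e) {
        byCustomer.computeIfAbsent(e.customerId(), k -> new ArrayList<>()).add(e.orderId());
    }
}

// Finance export view keyed by SKU + date (stand-in for the Elasticsearch view).
class FinanceExport extends Projection {
    final Map<String, Long> revenueBySkuDate = new HashMap<>();

    void apply(OrderPlaced e) {
        revenueBySkuDate.merge(e.sku() + "|" + e.date(), e.totalCents(), Long::sum);
    }
}
```

Because the checkpoints are independent, replaying or restarting the finance view never touches the customer timeline.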

📖 Why CQRS Exists: Protect Write Truth While Shaping Fast Reads

Teams usually reach for CQRS after the same failure pattern repeats: one transactional model is asked to serve invariants, dashboards, search, timelines, and external API reads all at once. The result is often lock contention on the write path, expensive joins on the read path, and emergency cache layers that hide freshness problems instead of fixing them.

In architecture reviews, CQRS should answer four operational questions before anyone sketches a read model:

  • Which business rules must stay synchronous on the write path?
  • How stale can each read surface be before users notice?
  • How will projections replay without corrupting a newer view?
  • Which signal tells on-call that reads are behind before support tickets arrive?

| Pressure on the system | CQRS response | What operators still need |
|---|---|---|
| Heavy read fan-out hurts write latency | Separate query store optimized for access patterns | Freshness budget per read surface |
| Search and timelines need denormalized views | Projection workers build purpose-fit models | Replay checkpoints and backfill runbook |
| Write invariants must stay strict | Command side remains the only source of truth | Clear rule that queries never invent state |
| Different teams own different read workloads | Independent projections per domain consumer | Ownership for lag, schema, and recovery |

🔍 The Boundary Model: Command Side, Event Stream, Projection Side

At a practical level, CQRS is not just two databases. It is a contract about where truth is written and how derivative views are built.

| Building block | Responsibility | Failure to avoid |
|---|---|---|
| Command API and validators | Enforce invariants and reject invalid state transitions | Allowing read-side shortcuts to mutate source-of-truth data |
| Transactional write store | Commit the durable business truth | Hiding partial writes behind async cleanup |
| Outbox or change stream | Publish every committed change event at least once from the write boundary | Dual-writing query stores in the request path |
| Projection workers | Convert events into read-optimized views with checkpoints and idempotent handlers | Losing ordering, checkpoints, or idempotency |
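The outbox row above can be sketched in memory as follows. This is an assumption-laden stand-in, not a database recipe: synchronized methods play the role of one transaction, and OrderWriteStore, OutboxRow, and relay are hypothetical names. In a real service the order row and the outbox row are inserted in the same database transaction, and a separate relay (polling or change data capture) publishes them afterward.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// An outbox row: the event payload plus a flag the relay sets after publishing.
final class OutboxRow {
    final long eventId;
    final String payload;
    boolean published;

    OutboxRow(long eventId, String payload) {
        this.eventId = eventId;
        this.payload = payload;
    }
}

// In-memory stand-in for a transactional write store (synchronized = one "transaction").
final class OrderWriteStore {
    final Map<String, Long> orders = new HashMap<>();
    final List<OutboxRow> outbox = new ArrayList<>();
    private long nextEventId = 1;

    // Validate first, then record the order AND its outbox event together: both or neither.
    synchronized void placeOrder(String orderId, long totalCents) {
        if (totalCents <= 0) throw new IllegalArgumentException("total must be positive");
        if (orders.containsKey(orderId)) throw new IllegalStateException("duplicate order " + orderId);
        orders.put(orderId, totalCents);
        outbox.add(new OutboxRow(nextEventId++, "OrderPlaced:" + orderId));
    }

    // Relay: publish rows not yet marked as published. A crash between publish and marking
    // would resend the row on the next run, so delivery is at-least-once, not exactly-once.
    synchronized int relay(Consumer<OutboxRow> publisher) {
        int sent = 0;
        for (OutboxRow row : outbox) {
            if (row.published) continue;
            publisher.accept(row);
            row.published = true;
            sent++;
        }
        return sent;
    }
}
```

The request path never writes a query store; only the relay hands committed rows to projections, which is why those projections must tolerate duplicates.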

⚙️ How the Write Path and Read Path Stay Separate

A healthy CQRS flow looks boring on purpose:

  1. The command handler validates the requested state change against current write-side rules.
  2. The transaction commits the write and records an outbox event or change record in the same durability boundary.
  3. A relay publishes that committed event to projection workers.
  4. Each projection updates its own read model with a checkpoint or last-event watermark.
  5. Queries read from the specialized store and expose freshness if they are allowed to be slightly behind.

| Control point | What it protects | Common mistake |
|---|---|---|
| Write authority | Business invariants stay in one place | Letting query code bypass validation |
| Outbox or change stream | Write commit and event record stay atomic | Publishing to the bus before the transaction commits |
| Projection checkpoint | Replay stays monotonic and resumable | Reprocessing old events without ordering guardrails |
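One common ordering guardrail is a per-entity version check inside the projection. The sketch below assumes every event carries a per-order sequence number assigned on the write side; the names are hypothetical. The view ignores any event that is not newer than the row it already holds, so duplicate deliveries and out-of-order replays cannot roll a row back.

```java
import java.util.HashMap;
import java.util.Map;

// Read-model row: current status of an order plus the version of the event that produced it.
record TimelineRow(String status, long version) {}

final class OrderStatusView {
    final Map<String, TimelineRow> rows = new HashMap<>();

    // Apply only if the event is newer than what the view holds for this order.
    // Returns false for duplicates and stale replays, which become no-ops.
    boolean apply(String orderId, long eventVersion, String status) {
        TimelineRow current = rows.get(orderId);
        if (current != null && current.version() >= eventVersion) return false;
        rows.put(orderId, new TimelineRow(status, eventVersion));
        return true;
    }
}
```

The checkpoint decides where a worker resumes; the version guard decides whether a given event may overwrite a row. They complement each other rather than substitute for one another.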

🧠 Deep Dive: Lag, Replay, and Projection Safety

The Internals: Write Authority, Checkpoints, and Read Staleness

The write side is the only place where business invariants should be enforced. Projection workers are downstream materializers; they should never be asked to resolve conflicts that belong to the command model.

That matters during failure. A replayed projection should rebuild a view from committed events, not guess what the latest truth is. Operators usually need three durable markers:

  • a source-of-truth commit version or event ID,
  • a per-projection checkpoint,
  • a freshness budget that maps lag into user impact.
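A replay that respects these markers can rebuild into a shadow view from committed events and promote it only once it has caught up to the latest committed event ID, so readers never see a half-rebuilt or older view. This is a simplified single-process illustration under hypothetical names (ShadowRebuild, View, StatusEvent), and it assumes the committed events arrive in commit order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

record StatusEvent(long eventId, String orderId, String status) {}

// A materialized view plus the checkpoint (last applied event id) it reflects.
record View(Map<String, String> statusByOrder, long checkpoint) {}

final class ShadowRebuild {
    // The view the query side reads; replaced atomically so readers never see a partial rebuild.
    final AtomicReference<View> live = new AtomicReference<>(new View(Map.of(), 0));

    // Rebuild from committed events only, starting from empty, into a separate shadow copy.
    static View rebuild(List<StatusEvent> committed) {
        Map<String, String> shadow = new HashMap<>();
        long checkpoint = 0;
        for (StatusEvent e : committed) {
            shadow.put(e.orderId(), e.status());
            checkpoint = e.eventId();
        }
        return new View(Map.copyOf(shadow), checkpoint);
    }

    // Promote only if the shadow reached the latest committed id and is not older than the live view.
    boolean promoteIfCaughtUp(View shadow, long latestCommittedId) {
        View current = live.get();
        if (shadow.checkpoint() < latestCommittedId || shadow.checkpoint() < current.checkpoint()) {
            return false;
        }
        return live.compareAndSet(current, shadow);
    }
}
```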

A common failure pattern is to let the application dual-write the transactional store and the read store in the same request. It feels simpler until retries, partial commits, or timeouts produce two truths. CQRS only pays off when the query model is clearly derivative.

Performance Analysis: Metrics That Expose CQRS Trouble Early

| Metric | Why it matters |
|---|---|
| Command commit p95 | Shows whether read concerns are leaking back into the write path |
| Projection lag by consumer | Identifies which read surface is drifting, not just that something is behind |
| Stale-read budget burn | Converts lag into business impact for on-call prioritization |
| Replay throughput | Predicts recovery time after a projection outage or bad deploy |

Average lag is not enough. One projection serving customer timelines can be healthy while another powering finance exports is hours behind. CQRS observability has to stay projection-specific, otherwise the dashboard looks green while one business surface is effectively down.
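A per-projection freshness check can be as small as the sketch below. It assumes each projection records the commit time of the last event it applied; lag is measured against the newest committed write, and each read surface is compared with its own budget rather than with a global average. The surface names and budgets are illustrative, borrowed from the timeline and finance examples in this post.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// One read surface: commit time of its last applied event, and how stale it may be.
record ProjectionStatus(String projection, Instant lastAppliedCommitTime, Duration freshnessBudget) {
    // Lag = how far this view trails the newest committed write, in commit time.
    Duration lag(Instant latestCommitTime) {
        return Duration.between(lastAppliedCommitTime, latestCommitTime);
    }

    boolean overBudget(Instant latestCommitTime) {
        return lag(latestCommitTime).compareTo(freshnessBudget) > 0;
    }
}

final class FreshnessCheck {
    private FreshnessCheck() {}

    // Names every projection over its own budget instead of averaging lag across all of them.
    static List<String> breached(List<ProjectionStatus> statuses, Instant latestCommitTime) {
        List<String> out = new ArrayList<>();
        for (ProjectionStatus s : statuses) {
            if (s.overBudget(latestCommitTime)) out.add(s.projection());
        }
        return out;
    }
}
```

With the customer timeline 2 seconds behind a 30-second budget and the finance export 3 hours behind a 30-minute budget, breached returns only the finance export, which is exactly the surface a single averaged number would hide.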

🚨 Operator Field Note: Freshness Budgets Fail Before Correctness Does

In incident reviews, the first visible symptom of CQRS trouble is usually outdated reads, not corrupted writes. Support tickets say a status is old long before anyone proves the write path is wrong.

| Runbook clue | What it usually means | First operator move |
|---|---|---|
| Command succeeded but the read screen is stale | Projection worker is behind or stuck on one poison event | Compare the latest committed event ID with the projection checkpoint, then quarantine the failing event |
| Replay backlog grows after deployment | New projection code is slower or incompatible with old events | Freeze expansion and benchmark replay throughput before retrying the rollout |
| One read model is hours behind while others are healthy | Lag is consumer-specific, not broker-wide | Scale or repair the affected projection rather than treating the whole bus as degraded |
| Only users in one region see stale data | Read-store replication or consumer placement is uneven | Check regional checkpoint skew before invalidating global caches |
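The first row's move, comparing the committed event ID with the checkpoint and quarantining the failing event, might look like this in a simplified projection runner. QuarantiningProjection and Event are hypothetical names. Parking the event keeps the rest of the view moving, at the cost that the affected order stays incomplete until the parked event is fixed and replayed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

record Event(long id, String type, String payload) {}

// Projection runner that parks an event it cannot handle instead of blocking every later event.
final class QuarantiningProjection {
    long checkpoint = 0;
    final List<Event> quarantine = new ArrayList<>();
    private final Consumer<Event> handler;

    QuarantiningProjection(Consumer<Event> handler) {
        this.handler = handler;
    }

    void process(List<Event> events) {
        for (Event e : events) {
            if (e.id() <= checkpoint) continue; // already processed
            try {
                handler.accept(e);
            } catch (RuntimeException ex) {
                quarantine.add(e); // park the poison event for inspection and later replay
            }
            checkpoint = e.id(); // advance either way so one bad event cannot stall the view
        }
    }

    // Runbook check: how many committed events this projection has not processed yet.
    long eventsBehind(long latestCommittedId) {
        return latestCommittedId - checkpoint;
    }
}
```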

Operators usually find that the most valuable architecture review artifact is a freshness table per read surface: acceptable lag, pager threshold, and replay procedure.

📊 CQRS Flow: Commit Once, Project Many

flowchart TD
  A[Command API] --> B[Validate business rule]
  B --> C[Transactional write store]
  C --> D[Outbox or change stream]
  D --> E[Projection worker]
  E --> F[Query store]
  F --> G[API or UI read path]
  E --> H[Projection checkpoint]
  C --> I[Committed version token]
  I --> G
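The committed version token in the diagram lets the read path tell a caller whether the view already reflects that caller's own write. A minimal sketch, assuming the command side returns a monotonically increasing commit version and the projection exposes its checkpoint in the same numbering; the class and field names are hypothetical.

```java
import java.util.Optional;

// Query response: the data, the commit version the view reflects, and whether it covers the caller's write.
record ReadResult<T>(T value, long projectedVersion, boolean coversCallerWrite) {}

final class TimelineReadPath {
    private final long projectionCheckpoint; // last committed version applied to the query store
    private final String timeline;           // stand-in for the materialized timeline

    TimelineReadPath(long projectionCheckpoint, String timeline) {
        this.projectionCheckpoint = projectionCheckpoint;
        this.timeline = timeline;
    }

    // minVersion is the committed version token the command side returned to this client, if any.
    ReadResult<String> read(Optional<Long> minVersion) {
        boolean covers = minVersion.map(v -> projectionCheckpoint >= v).orElse(true);
        return new ReadResult<>(timeline, projectionCheckpoint, covers);
    }
}
```

When coversCallerWrite is false, the API can show a pending state or retry briefly instead of presenting stale data as current.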

🌍 Real-World Applications: Order Service With Timeline and Search Views

Consider an order platform with three very different read workloads:

  • customer-facing order timelines,
  • support-agent search,
  • finance reconciliation exports.

The write path needs strict invariants around payment capture and fulfillment state. The read side needs different storage and indexing strategies.

| Constraint | Design decision | Trade-off |
|---|---|---|
| Payment and fulfillment state must be correct | PostgreSQL remains the write authority | Write model stays normalized and is not optimized for search |
| Support needs flexible search by email, SKU, and carrier | Elasticsearch projection for support queries | Search view may lag behind committed state |
| Customer app needs fast timeline lookups | Redis or document-style read model keyed by customer ID | Another projection to operate and replay |
| Finance needs auditable exports | Batch projection with checkpointed replay | Higher recovery cost if event lineage is weak |

⚖️ Trade-offs & Failure Modes

| Failure mode | Symptom | Root cause | First mitigation |
|---|---|---|---|
| Dual-write temptation | Writes succeed but one read store silently disagrees | Query store updated directly from request code | Move projection updates behind an outbox or change stream |
| No freshness budget | Teams argue whether stale data is an incident | Lag has no product-defined threshold | Define per-surface freshness SLOs |
| Replay poisoning | Projection cannot recover after a bad event or schema change | Events are not versioned or handlers are not idempotent | Add versioned event handlers and quarantined replays |
| Read-store sprawl | Every team adds its own view with no ownership | CQRS used as permission to duplicate data endlessly | Require an owner, SLO, and replay plan per projection |
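For the replay-poisoning row, a versioned handler usually means upcasting old event shapes into the current one before the projection sees them. Axon Framework has its own upcaster support; the sketch below is framework-neutral and does not use it. The event fields, schema versions, and the USD default for v1 events are illustrative assumptions.

```java
import java.util.Map;

// A stored event keeps the schema version it was written with.
record StoredEvent(String type, int schemaVersion, Map<String, String> fields) {}

// The only shape projection handlers ever see.
record OrderPlacedV2(String orderId, String customerId, String currency, long totalMinorUnits) {}

final class OrderPlacedUpcaster {
    private OrderPlacedUpcaster() {}

    // Convert any known version to the current shape; unknown versions fail loudly so they can be quarantined.
    static OrderPlacedV2 upcast(StoredEvent e) {
        if (!e.type().equals("OrderPlaced")) {
            throw new IllegalArgumentException("not an OrderPlaced event: " + e.type());
        }
        Map<String, String> f = e.fields();
        return switch (e.schemaVersion()) {
            // v1 had no currency field; defaulting to USD is an illustrative assumption.
            case 1 -> new OrderPlacedV2(f.get("orderId"), f.get("customerId"), "USD",
                                        Long.parseLong(f.get("totalCents")));
            case 2 -> new OrderPlacedV2(f.get("orderId"), f.get("customerId"), f.get("currency"),
                                        Long.parseLong(f.get("totalMinorUnits")));
            default -> throw new IllegalArgumentException("unknown OrderPlaced schema v" + e.schemaVersion());
        };
    }
}
```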

CQRS is worth the cost when read workloads are truly different. If the only goal is a vague hope of faster reads later, teams usually end up with more systems and the same old ambiguity.

🧭 Decision Guide: When CQRS Earns Its Complexity

| Situation | Recommendation |
|---|---|
| Reads are simple CRUD and freshness must be immediate | Stay with one transactional model |
| Writes require strict invariants but reads diverge heavily | Adopt CQRS for the bounded domain causing pain |
| Search, analytics, and timelines need different shapes | Add projections with explicit lag budgets |
| Team cannot yet operate replay and projection recovery | Delay CQRS until operational tooling exists |

Start with one domain that already hurts, such as orders or billing, and prove that lag, replay, and recovery are manageable before expanding the pattern elsewhere.

🧪 Practical Example: Order Service With Axon Framework

The order scenario described earlier maps directly to Axon's programming model. Commands flow in through CommandGateway, the aggregate persists events to the event store, and the projection worker builds the timeline read model asynchronously.

flowchart LR
  A["REST POST /orders"] --> B[CommandGateway]
  B --> C[OrderAggregate]
  C --> D[EventStore]
  D --> E[EventBus]
  E --> F["@EventHandler<br/>Projection Worker"]
  F --> G[QueryStore]
  G --> H[QueryGateway]
  H --> I["REST GET /orders/timeline/{customerId}"]

Maven dependency

<dependency>
  <groupId>org.axonframework</groupId>
  <artifactId>axon-spring-boot-starter</artifactId>
  <version>4.9.3</version>
</dependency>

Commands and queries as plain records

Commands carry intent; queries carry selection criteria. Neither holds logic.

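public record PlaceOrderCommand(String orderId, String customerId, long totalCents) {}
public record GetOrderTimelineQuery(String customerId) {}
// Event the aggregate emits and the projection consumes
public record OrderPlacedEvent(String orderId, String customerId, long totalCents) {}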

Write side: aggregate enforces invariants, never touches the read store

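AggregateLifecycle.apply() applies the event to the aggregate's @EventSourcingHandler and stages it for the event store; Axon persists it when the command's unit of work commits. The projection worker receives it asynchronously, and the aggregate never writes to a query table directly.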

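import org.axonframework.commandhandling.CommandHandler;
import org.axonframework.eventsourcing.EventSourcingHandler;
import org.axonframework.modelling.command.AggregateIdentifier;
import org.axonframework.modelling.command.AggregateLifecycle;
import org.axonframework.spring.stereotype.Aggregate;

@Aggregate
public class OrderAggregate {
    @AggregateIdentifier
    private String orderId;

    // Required by Axon to create an empty instance before replaying events
    public OrderAggregate() {}

    @CommandHandler
    public OrderAggregate(PlaceOrderCommand cmd) {
        // Illustrative invariant: reject an order with a non-positive total
        if (cmd.totalCents() <= 0) {
            throw new IllegalArgumentException("Order total must be positive");
        }
        AggregateLifecycle.apply(
            new OrderPlacedEvent(cmd.orderId(), cmd.customerId(), cmd.totalCents())
        );
    }

    @EventSourcingHandler
    public void on(OrderPlacedEvent event) {
        // Only event sourcing handlers change state, so replay rebuilds the same aggregate
        this.orderId = event.orderId();
    }
}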

Read side: projection worker materializes the timeline view

The @EventHandler builds the read model from every committed OrderPlacedEvent. The @QueryHandler answers timeline requests dispatched by QueryGateway — the REST layer never reaches the write store.

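import java.time.Instant;
import java.util.List;
import org.axonframework.eventhandling.EventHandler;
import org.axonframework.eventhandling.Timestamp;
import org.axonframework.queryhandling.QueryHandler;
import org.springframework.stereotype.Component;

@Component
public class OrderTimelineProjection {
    // Spring Data repository for OrderTimelineEntry (definition not shown)
    private final OrderTimelineRepository repo;

    public OrderTimelineProjection(OrderTimelineRepository repo) {
        this.repo = repo;
    }

    @EventHandler
    public void on(OrderPlacedEvent event, @Timestamp Instant timestamp) {
        // If orderId is the entity's primary key, a redelivered event overwrites the same row
        repo.save(new OrderTimelineEntry(
            event.orderId(), event.customerId(), "PLACED", timestamp
        ));
    }

    @QueryHandler
    public List<OrderTimelineEntry> handle(GetOrderTimelineQuery query) {
        return repo.findByCustomerIdOrderByTimestampDesc(query.customerId());
    }
}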

Query REST controller

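import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.axonframework.messaging.responsetypes.ResponseTypes;
import org.axonframework.queryhandling.QueryGateway;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/orders")
public class OrderQueryController {
    private final QueryGateway queryGateway;

    public OrderQueryController(QueryGateway queryGateway) {
        this.queryGateway = queryGateway;
    }

    @GetMapping("/timeline/{customerId}")
    public CompletableFuture<List<OrderTimelineEntry>> getTimeline(
            @PathVariable String customerId) {
        return queryGateway.query(
            new GetOrderTimelineQuery(customerId),
            ResponseTypes.multipleInstancesOf(OrderTimelineEntry.class)
        );
    }
}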

🛠️ Axon Framework and EventStoreDB: CQRS on the JVM

Axon Framework is a Java framework built specifically for CQRS, event sourcing, and DDD aggregates, with first-class Spring Boot support. It provides CommandGateway, QueryGateway, @CommandHandler, @QueryHandler, and @EventHandler: the complete wiring for a CQRS command-and-query split. EventStoreDB is a purpose-built append-only event database with server-side projections. It is not one of Axon's built-in event stores, so using it as the event log for event-sourced Axon aggregates requires a custom or community integration.

Axon Framework solves the CQRS problem by enforcing the command/query boundary in code rather than convention: commands flow through CommandGateway to aggregates that enforce invariants; queries flow through QueryGateway to projection handlers that serve read models. The framework owns aggregate lifecycle, event serialization, checkpointing, and replay — teams write domain logic, not plumbing.

The code examples in the 🧪 Practical Example section above show the full flow. Below is the minimal wiring that makes the separation enforceable:

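// ---- Command side: send a command and get a confirmation ----
import java.util.UUID;
import java.util.concurrent.CompletableFuture;
import org.axonframework.commandhandling.gateway.CommandGateway;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/orders")
public class OrderCommandController {
    private final CommandGateway commandGateway;

    public OrderCommandController(CommandGateway commandGateway) {
        this.commandGateway = commandGateway;
    }

    @PostMapping
    public CompletableFuture<String> placeOrder(@RequestBody PlaceOrderRequest req) {
        // CommandGateway routes to OrderAggregate's @CommandHandler
        // Returns the aggregate identifier on success
        return commandGateway.send(
            new PlaceOrderCommand(UUID.randomUUID().toString(),
                                  req.customerId(), req.totalCents())
        );
    }
}

// Hypothetical request body for POST /orders (its own file)
public record PlaceOrderRequest(String customerId, long totalCents) {}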

// ---- Query side: OrderQueryController from the Practical Example routes
// GetOrderTimelineQuery through QueryGateway and never touches the write store ----

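The CommandGateway and QueryGateway beans are auto-configured by axon-spring-boot-starter, so no manual wiring is needed. By default the starter also connects to Axon Server as the event store; excluding the Axon Server connector makes it fall back to a JPA-based event store when JPA is configured. Using EventStoreDB instead means supplying your own event storage integration, in exchange for a dedicated append-only log with server-side subscriptions and projection checkpointing.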

For a full deep-dive on Axon Framework and EventStoreDB, a dedicated follow-up post is planned.

📚 Lessons Learned

  • CQRS is a control boundary between write truth and read convenience, not a blanket microservices rule.
  • Freshness budgets and replay tooling matter as much as the schema design.
  • Projection-specific lag is a better signal than one global event-bus metric.
  • Dual-writing the read model from the request path removes most of CQRS's safety benefits.
  • A projection without an owner, checkpoint, and recovery plan is operational debt.

📌 TLDR: Summary & Key Takeaways

  • Keep the write model authoritative and let read models stay derivative.
  • Define freshness, replay, and rollback rules before scaling projections.
  • Observe lag per read surface, not only at the broker or topic level.
  • Use CQRS where read shapes materially diverge from write correctness needs.
  • Treat projection recovery as a first-class runbook, not an afterthought.

📝 Practice Quiz

  1. Which CQRS signal usually turns into the first customer-visible incident?

A) Command commit count only
B) Projection lag against a read surface's freshness budget
C) Number of projection tables in the database

Correct Answer: B

  2. What is the safest place to enforce business invariants in CQRS?

A) Inside every projection worker
B) In the command side and transactional write store
C) In the UI cache layer

Correct Answer: B

  3. Why is an outbox or change stream preferred over dual-writing the read model from the request path?

A) It keeps write commit and event publication in the same durability boundary
B) It makes eventual consistency disappear
C) It removes the need for projection checkpoints

Correct Answer: A

  4. Open-ended challenge: if finance exports can tolerate 30 minutes of lag but customer timelines can tolerate only 30 seconds, how would you redesign projection priorities, alert thresholds, and replay capacity?

Written by Abstract Algorithms (@abstractalgorithms)