
System Design HLD Example: Collaborative Document Editing (Google Docs)

A senior-level HLD for real-time collaborative editing with Operational Transformation (OT) and conflict resolution.

Abstract Algorithms · 12 min read

TLDR: Real-time collaborative editing relies on Operational Transformation (OT) or CRDTs to resolve concurrent edits without data loss. The core trade-off is Latency vs. Consistency: we use optimistic local updates for zero-latency typing and a centralized OT server to ensure all clients eventually converge to the exact same document state.

✍️ The "Hello World" Race Condition

Imagine two users, Alice and Bob, are editing the same sentence: "The cat is happy." Alice wants to change it to "The fat cat is happy," so she inserts "fat " at position 4. Simultaneously, Bob wants to change it to "The cat is very happy," so he inserts "very " at position 11.

In a naive system:

  1. Alice's Edit: Server applies INSERT("fat ", pos:4). Doc becomes: "The fat cat is happy."
  2. Bob's Edit: Server applies Bob's original INSERT("very ", pos:11).
  3. The Result: "The fat catvery  is happy."

Wait, what happened? Because Alice's insert shifted the entire string by 4 characters, Bob's original position 11 no longer points at the start of "happy"—it now lands right after "cat," splicing "very " into the middle of the sentence. This is the Position Shift Problem. In a collaborative environment, every edit changes the coordinate system for every other concurrent edit. Without a specialized resolution algorithm, the document quickly becomes a corrupted mess of overlapping characters.
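The collision is easy to reproduce in a few lines. The sketch below (hypothetical helper names) applies both inserts naively, then shows the fix: shifting Bob's position by the length of Alice's earlier insert.

```javascript
// Naive concurrent inserts vs. a position-transformed apply.
function applyInsert(doc, pos, text) {
  return doc.slice(0, pos) + text + doc.slice(pos);
}

const original = 'The cat is happy.';

// Naive: the server applies both operations at their original positions.
let naive = applyInsert(original, 4, 'fat ');  // Alice's insert
naive = applyInsert(naive, 11, 'very ');       // Bob's position 11 is now stale

// Transformed: Bob's position is shifted by the length of Alice's insert,
// because Alice inserted at an earlier index (4 < 11).
let fixed = applyInsert(original, 4, 'fat ');
fixed = applyInsert(fixed, 11 + 'fat '.length, 'very ');
```

The naive path corrupts the sentence; the transformed path yields "The fat cat is very happy."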

📖 Real-Time Collaboration: Use Cases & Requirements

Actors

  • Author / Editor: Creates and modifies document content in real-time.
  • Collaborator: Views live changes and contributes simultaneously.
  • Viewer: Consumes the document without editing rights but still expects live updates.

Functional Requirements

  • Real-Time Editing: Multiple users can edit the same document simultaneously.
  • Conflict Resolution: All users must eventually see the exact same document state (Convergence).
  • Presence: Show which users are currently active and their cursor positions.
  • Version History: Ability to see past revisions and restore previous states.
  • Offline Support: (Partial) Allow users to continue typing during brief network drops.

Non-Functional Requirements

  • Ultra-Low Latency: Keystrokes should appear on other screens in < 100ms.
  • High Availability: The service must be available 99.99% of the time.
  • Scalability: Support 10,000+ concurrent editors on a single "hot" document (e.g., a viral public doc).
  • Durability: No committed edit should ever be lost.

🔍 Basics: Operational Transformation (OT)

Operational Transformation is the "classic" approach used by Google Docs. The core idea is that the server doesn't just apply an operation; it transforms it against all other operations that happened since the client's last known version.

Every operation is represented as a triple: (Type, Position, Value). When the server receives an operation based on version V, but the current server version is V+k, it runs the new operation through a transform function T(op_new, op_concurrent) for all $k$ operations. This ensures that the user's intent is preserved even if the index has changed.

⚙️ Mechanics: The Transform Matrix

To implement OT, you must define how every operation type interacts with every other operation type.

  • Insert vs. Insert: If User A inserts at position 5 and User B inserts at position 10, no change is needed for A, but B's position must be incremented by the length of A's insert.
  • Insert vs. Delete: If a character is deleted before an insert, the insert position must be decremented.
  • Delete vs. Delete: If both users delete the same character, the second operation becomes a "no-op" to avoid double-deletion.

This "Matrix" of rules ensures that no matter what order operations arrive in, the final result is deterministic.
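The three rules above can be sketched as a single transform function. This is a deliberately minimal version: ops are plain `{ type, pos, text }` objects, deletes are single characters, and ties between two inserts are resolved in the first operation's favor. Production OT libraries cover many more cases (multi-character deletes, retains, attributes).

```javascript
// Transform opB so it can be applied after a concurrent opA.
// Simplified: single-character deletes, no formatting attributes.
function transform(opB, opA) {
  const b = { ...opB };
  if (opA.type === 'insert') {
    // Insert vs. Insert / Insert vs. Delete: if A landed at or before B,
    // shift B right by the inserted length (ties resolved in A's favor).
    if (opA.pos <= b.pos) b.pos += opA.text.length;
  } else if (opA.type === 'delete') {
    // Delete vs. Delete on the same character: B becomes a no-op.
    if (b.type === 'delete' && b.pos === opA.pos) return { type: 'retain' };
    // A character deleted before B: shift B left.
    if (opA.pos < b.pos) b.pos -= 1;
  }
  return b;
}
```

Running Bob's insert from the opening example through this function moves it from position 11 to position 15, which is exactly the correction the naive server missed.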

📐 Estimations & Design Goals

The Math of Real-Time Sync

  • Average Typist: 60 words per minute $\approx 5$ characters per second.
  • Active Users per Doc: 10 (average) to 1,000 (hot docs).
  • Traffic per Doc: 10 users × 5 chars/sec = 50 operations/sec.
  • Fan-out: Every 1 operation must be broadcast to the other 9 users.
  • Total Broadcasts: $50 \times 9 = 450$ messages per second per document.
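The arithmetic above as a quick sanity check (same illustrative numbers, not measured traffic):

```javascript
// Back-of-envelope fan-out math for one average document.
const users = 10;
const charsPerSec = 5;                       // ~60 wpm typist
const opsPerSec = users * charsPerSec;       // 50 ops/sec per document
const broadcasts = opsPerSec * (users - 1);  // each op reaches the other 9
```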

Design Goal: Use WebSockets for bidirectional, low-latency communication. Standard HTTP polling would be too slow and would overwhelm the server with headers.

📊 High-Level Design: The Centralized OT Architecture

The following architecture utilizes a centralized "Orderer" to ensure all operations are sequenced correctly.

graph TD
    UserA((User A)) <--> WSS1[WS Server Shard 1]
    UserB((User B)) <--> WSS1
    UserC((User C)) <--> WSS2[WS Server Shard 2]

    WSS1 <--> LB[Internal Load Balancer]
    WSS2 <--> LB

    LB <--> OTS[OT Service: The Arbiter]
    OTS --> PDB[(Document Store: Postgres)]
    OTS --> RC[(Active Doc Cache: Redis)]

    OTS --> PubSub[Redis Pub/Sub]
    PubSub --> WSS1
    PubSub --> WSS2

Explanation of the Architecture: The system relies on persistent WebSocket connections to minimize latency. Clients send operations to a WebSocket Server, which forwards them to a centralized OT Service. This service is the "Single Source of Truth"—it assigns a version number to every edit and performs the transformation against concurrent edits. Once an edit is committed to the PostgreSQL log, it is broadcast back out via Redis Pub/Sub to all WebSocket shards, which then push the update to the connected clients.
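The Arbiter's core loop can be sketched in-memory. This hypothetical version handles only insert-vs-insert transforms and substitutes a plain array for the Postgres log and callbacks for Redis Pub/Sub; the structure (transform past concurrent ops, append, assign version, fan out) is the point.

```javascript
// In-memory sketch of the OT Service's sequential commit loop.
const log = [];          // committed ops; array index + 1 = version
const subscribers = [];  // callbacks standing in for Redis Pub/Sub

function commit(op, baseVersion) {
  const transformed = { ...op };
  // Transform past every op committed since the client's base version.
  for (let v = baseVersion; v < log.length; v++) {
    const concurrent = log[v];
    if (concurrent.type === 'insert' && concurrent.pos <= transformed.pos) {
      transformed.pos += concurrent.text.length;
    }
  }
  log.push(transformed);
  const newVersion = log.length;
  // Broadcast the confirmed, transformed op to all WebSocket shards.
  subscribers.forEach((cb) => cb(transformed, newVersion));
  return newVersion;
}
```

Because commits for one document run sequentially through this loop, every client observes the same total order of operations, which is what makes convergence possible.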

🔌 API Design: The Contract

Since this is a real-time system, the "API" is mostly a WebSocket message protocol.

| Message Type | Direction | Payload | Description |
| --- | --- | --- | --- |
| DOC_OPEN | Client -> Server | { doc_id, last_version } | Initiates a session and requests catch-up edits. |
| OP_SUBMIT | Client -> Server | { type, pos, val, base_version } | Submits a new local edit. |
| OP_BROADCAST | Server -> Client | { type, pos, val, new_version, user_id } | Broadcasts a confirmed edit to all peers. |
| PRESENCE | Client -> Server | { cursor_pos, selection_range } | Updates the user's visual state for others. |
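On the wire, each message is a small JSON frame. The envelope below is illustrative (the `kind` field name is an assumption, not a fixed spec); only the payload shapes come from the table above.

```javascript
// Hypothetical client-side framing for the OP_SUBMIT message.
function makeOpSubmit(type, pos, val, baseVersion) {
  return JSON.stringify({
    kind: 'OP_SUBMIT',
    payload: { type, pos, val, base_version: baseVersion },
  });
}

// e.g. Alice submits her insert against the version she last saw:
const frame = makeOpSubmit('insert', 4, 'fat ', 12);
```

Sending `base_version` with every op is what lets the server know which concurrent operations it must transform against.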

🗄️ Data Model: Schema Definitions

Operations Log (PostgreSQL)

We store every single edit as an immutable row. This is the Event Sourcing pattern.

| Table | Column | Type | Notes |
| --- | --- | --- | --- |
| operations | doc_id | UUID | Composite PK with version. |
| operations | version | INT | Monotonically increasing per document. |
| operations | type | ENUM | insert, delete, retain. |
| operations | pos | INT | The transformed position. |
| operations | value | TEXT | The character(s) added. |
| operations | author_id | UUID | For attribution. |

Document Snapshots (PostgreSQL)

Replaying 1 million operations to open a doc is slow. We take snapshots every 100 edits.

| Table | Column | Type | Notes |
| --- | --- | --- | --- |
| snapshots | doc_id | UUID | PK. |
| snapshots | version | INT | The version this snapshot represents. |
| snapshots | content | TEXT | The full serialized document state. |
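Opening a document then becomes "load the latest snapshot, replay the tail of the log." A minimal sketch, assuming the simplified op shape used throughout this post:

```javascript
// Rebuild a document from its latest snapshot plus the ops after it.
function loadDocument(snapshot, opsAfterSnapshot) {
  let content = snapshot.content;
  for (const op of opsAfterSnapshot) {
    if (op.type === 'insert') {
      content = content.slice(0, op.pos) + op.value + content.slice(op.pos);
    } else if (op.type === 'delete') {
      // Simplified: single-character delete.
      content = content.slice(0, op.pos) + content.slice(op.pos + 1);
    }
    // 'retain' ops change nothing.
  }
  return { content, version: snapshot.version + opsAfterSnapshot.length };
}
```

With snapshots every 100 edits, this replay loop touches at most 99 operations per open instead of the full history.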

🧭 Tech Stack & Design Choices

| Component | Choice | Rationale |
| --- | --- | --- |
| Communication | WebSockets (Socket.io) | Low-latency bidirectional sync is mandatory for "live" feel. |
| OT Engine | Node.js | Excellent for high-concurrency I/O and shared JS logic between client/server. |
| Main Database | PostgreSQL | Strong ACID for the operation log; JSONB support for complex doc structures. |
| Pub/Sub | Redis | Lightning-fast message distribution to WebSocket shards. |
| Cache | Redis | Stores active document state and "Presence" (cursor) data with TTL. |

🧠 Design Deep Dive

🛡️ Internals: Client-Side Optimistic Updates

To make the editor feel "instant," the client doesn't wait for the server.

  1. Apply Locally: User types 'A'. The client updates the UI immediately.
  2. Buffer: The client puts this 'A' into a pending_buffer.
  3. Wait for ACK: The client sends the op to the server. While waiting for the server's ACK, if another user's edit arrives, the client transforms its own pending_buffer against the incoming edit to keep the UI consistent.
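The three steps above can be sketched as a tiny client-side state machine. This is a hypothetical sketch handling only insert-vs-insert rebasing; a real client would reuse the same transform matrix as the server.

```javascript
// Optimistic client: apply locally, buffer until ACK, rebase on remote ops.
const pendingBuffer = [];

function localEdit(op, socketSend) {
  pendingBuffer.push(op);   // step 1+2: UI already updated, op buffered
  socketSend(op);           // step 3: submit and await the server's ACK
}

function onRemoteOp(remote) {
  // Rebase every unacknowledged local op past the remote insert,
  // so the local UI stays consistent with the server's ordering.
  for (const op of pendingBuffer) {
    if (remote.pos <= op.pos) op.pos += remote.text.length;
  }
}

function onAck(op) {
  const i = pendingBuffer.indexOf(op);
  if (i !== -1) pendingBuffer.splice(i, 1);  // op is now committed
}
```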

📊 Performance Analysis: The "Viral Doc" Problem

What happens if 5,000 people open the same document?

  • The Bottleneck: The single OT Service instance for that doc ID. OT must be processed sequentially per document to ensure consistency.
  • Optimization: Document Sharding. We ensure all editors for doc_A are routed to Server_1, while doc_B goes to Server_2.
  • Fan-out Load: If 5,000 users type simultaneously, the Redis Pub/Sub will be flooded. We use Rate Limiting on the presence/cursor updates (e.g., send cursor moves only every 50ms) while keeping text edits high-priority.
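The cursor rate limit can be implemented as a simple throttle. The sketch below drops intermediate cursor positions rather than deferring them (acceptable for presence, never for text ops); the injectable clock is just to make it testable.

```javascript
// Throttle presence updates to at most one per interval.
// `send` receives the update; intermediate updates are dropped.
function makeThrottle(intervalMs, send, now = Date.now) {
  let last = -Infinity;
  return (update) => {
    if (now() - last >= intervalMs) {
      last = now();
      send(update);
      return true;   // sent
    }
    return false;    // dropped; the next cursor move will supersede it
  };
}
```

Text edits bypass this path entirely, so rate limiting presence never costs document consistency.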

🌍 Real-World Applications

Collaborative architectures power more than just text editors:

  • Design Tools: Figma (uses a CRDT-inspired variant).
  • Whiteboards: Miro or LucidChart.
  • Coding: VS Code Live Share.
  • Project Management: Trello or Monday.com live updates.

⚖️ Trade-offs & Failure Modes

  1. OT vs. CRDT:
    • OT (Used here): Requires a central server. Simpler for text, better for conflict resolution in complex formatting.
    • CRDT (Conflict-free Replicated Data Types): Can work peer-to-peer (no server). High memory overhead because it must store metadata for every character ever deleted.
  2. Snapshot Frequency: Snapshotting every 10 edits saves time but costs storage. Snapshotting every 1,000 edits saves storage but makes "doc open" slow. We chose 100 as a middle ground.
  3. Failure Mode: Network Partition. If a client loses internet, they continue to buffer edits. Upon reconnect, the server must perform a "Bulk Transform" of all buffered edits. If the gap is too large (e.g., 2 hours), we force a document refresh to avoid corruption.

🏗️ Advanced Concepts for Production

  1. Delta Compression: Instead of sending every character, we group characters typed within a short window (e.g., 100ms) into a single insert operation to reduce WebSocket overhead.
  2. Operational Metadata: Moving beyond text—how do you "OT" a bold tag? We represent formatting as "Attributes" on a range of text, and transforms now include range arithmetic.
  3. Presence via Redis TTL: Cursor positions are volatile. We store them in Redis with a 5-second TTL. If a user closes their laptop, their cursor disappears automatically as the key expires.
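Delta compression (point 1 above) amounts to merging inserts that are positionally adjacent before they hit the wire. A minimal sketch, assuming the single-op shape used earlier:

```javascript
// Merge consecutive inserts whose positions are contiguous
// (e.g. typing "c", "a", "t" within one 100ms window -> one op).
function compressOps(ops) {
  const out = [];
  for (const op of ops) {
    const prev = out[out.length - 1];
    if (
      prev && prev.type === 'insert' && op.type === 'insert' &&
      op.pos === prev.pos + prev.text.length
    ) {
      prev.text += op.text;   // extend the previous insert in place
    } else {
      out.push({ ...op });
    }
  }
  return out;
}
```

Three keystrokes become one operation, cutting both WebSocket frames and rows in the operations log by two thirds for steady typing.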

🧭 Decision Guide

| Metric | OT (Operational Transformation) | CRDT (Conflict-free Types) |
| --- | --- | --- |
| Central Server | Required (Arbiter). | Not required (P2P possible). |
| Complexity | High (The Matrix). | Very High (Mathematical data structures). |
| Overhead | Low (Small ops). | High (Metadata for every char). |
| Use Case | Google Docs, Office 365. | Figma, Local-first apps. |

🧪 Practical Example: Interview Delivery

In a 45-minute interview, follow this beat:

  1. The Conflict (5 min): Explain why Bob's insert lands in the wrong place if Alice's insert is applied first.
  2. The Sequence (10 min): Explain the Ops Log. Every edit is an event with a version number.
  3. The Scaling (15 min): Explain how Redis Pub/Sub connects multiple WebSocket servers so Alice (on Server 1) can see Bob (on Server 2).

Standard Conclusion: "I chose a centralized OT approach because it provides the most deterministic convergence for rich-text editing. By sharding documents by ID and using Redis for real-time fan-out, we can support high-concurrency documents while maintaining an immutable audit log of every edit in PostgreSQL."

🛠️ ShareDB: Implementing OT in Practice

In a production Node.js environment, we often use ShareDB, a mature OT backend that handles the complex transform logic for us.

// Server-side: Initializing a document session
const ShareDB = require('sharedb');
const otText = require('ot-text');

// ShareDB only bundles the JSON OT type by default;
// register the plain-text type before creating text documents.
ShareDB.types.register(otText.type);

const backend = new ShareDB();

// When a WebSocket connects...
const connection = backend.connect();
const doc = connection.get('documents', 'doc-123');

// subscribe() fetches the snapshot AND streams subsequent remote ops.
doc.subscribe(function (err) {
  if (err) throw err;
  if (doc.type === null) {
    // Create the document if it doesn't exist (text docs are plain strings)
    doc.create('The cat is happy.', 'text');
  }
});

// Listening for ops from other users
doc.on('op', (op, source) => {
  if (source) return; // Ignore ops we submitted ourselves
  console.log('Remote edit received:', op);
});

This snippet shows how ShareDB abstracts the "Arbiter" logic, allowing developers to focus on the UI while the library handles the position transformations and database persistence.

📚 Lessons Learned

  • The Client is a Mini-Server: The client must be smart enough to transform remote ops against its own pending buffer.
  • Sequence Numbers are Life: Without a strict version V -> V+1 flow, you cannot guarantee convergence.
  • Cursors are not Content: Keep cursor/presence data in Redis, not the primary DB. It’s too "noisy" for a persistent log.

📌 Summary

  • OT solves the Position Shift Problem.
  • WebSockets + Redis Pub/Sub are the foundation for real-time scale.
  • Event Sourcing (Ops Log) provides history and durability for free.
  • Snapshots keep document loading fast.

📝 Practice Quiz

  1. Why is Operational Transformation (OT) better than Last-Writer-Wins?

    • A) It makes the database smaller.
    • B) It allows concurrent edits to be merged rather than one being discarded.
    • C) It is easier to implement.
    Correct Answer: B
  2. What is the primary role of the OT Service "Arbiter"?

    • A) To encrypt the document.
    • B) To assign a unique, sequential version number to every edit and resolve conflicts.
    • C) To render the document to a PDF.
    Correct Answer: B
  3. In the "Alice inserts, Bob deletes" scenario, why must Bob's delete position be transformed?

    • A) Because the server clock is wrong.
    • B) Because Alice's insert shifted the indices of all subsequent characters.
    • C) Because the delete was too slow.
    Correct Answer: B
  4. [Open-ended] If you were to add "Offline Mode" to this system, how would you handle a user who makes 1,000 edits while on an airplane and then reconnects? What are the risks of using OT for this?
