High-Level Design: Scaling a Concert Ticket Booking System under Flash Load

Design a scalable concert ticket booking system that handles massive traffic surges and prevents double-booking.

Abstract Algorithms

Helping engineers master software engineering topics.

TLDR: Designing a high-scale ticket booking system requires balancing high read traffic (seat map lookups) with extreme write concurrency (seat lock attempts) during popular concert drops. We achieve this using Redis-based temporary ticket locks and distributed queues.


📖 System Overview & Scale-Based Design Challenge

Imagine a ticket drop for a global pop star. Within seconds of going live, a stadium with 50,000 seats receives over 1 million concurrent connection requests. If the system is not designed for this load, database lock contention becomes the bottleneck almost immediately.

If multiple users attempt to reserve the same seat simultaneously, and the database relies on standard pessimistic transactions, threads stall waiting for locks. The connection pool starves, and the site crashes. Even worse, if locking is not handled correctly, different users might pay for the same seat, leading to double-bookings and operational failures.

The core challenge of a booking system is decoupling the high-traffic seat selection path from the actual payment transactions. This ensures users do not overload the relational database during seat selection, while guaranteeing that payments are processed securely and without conflicts.


πŸ” Core Requirements and Capacity Estimation

To establish a clear design scope, we define our requirements and estimate the system capacity.

Functional Requirements

  • Search & View Event: Users can search for events and view available seats in real-time.
  • Reserve Seats: Users can temporarily hold/lock seats for 10 minutes while they enter payment details.
  • Confirm Booking: Users complete payment, converting the hold into a confirmed booking.
  • Auto-Release Hold: If the 10-minute payment window expires without payment, the seats are made available to other users.

Non-Functional Requirements

  • High Availability: High read availability for event searches and seat map lookups.
  • Strict Consistency: No double-bookings allowed; the seat reservation lock must be atomic.
  • Low Latency: Seat locks must be acknowledged in under 100 milliseconds under load.

Capacity Estimations

Let's calculate the load for a major ticket release:

  • Active Users: 1,000,000 users attempting to buy tickets during the first 5 minutes of a major drop.
  • Search/Read Traffic: Each user refreshes the seat map 3 times. Total search requests = 3,000,000. Average QPS = 3,000,000 / 300 seconds = 10,000 Read QPS (the peak in the first seconds will be higher).
  • Reserve/Write Traffic: 100,000 reservation attempts in the first 5 minutes. Average QPS = 100,000 / 300 seconds ≈ 333 Write QPS.
  • Network Egress: Seat map data size is 100 KB. Total read data transfer = 10,000 QPS * 100 KB = 1 GB/sec of outbound bandwidth, most of which the CDN should absorb.
  • Storage Size: An event has 50,000 seats. Each seat record size is 100 bytes. Active event seat storage size = 5 MB.
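The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-envelope sketch: the variable names are illustrative, the figures are the ones stated in the list, and "KB" and "MB" are treated as decimal units.

```python
# Back-of-envelope capacity figures, using the inputs stated in the list above.
WINDOW_SECONDS = 5 * 60           # first 5 minutes of the drop
ACTIVE_USERS = 1_000_000
REFRESHES_PER_USER = 3
RESERVATION_ATTEMPTS = 100_000
SEAT_MAP_BYTES = 100 * 1000       # 100 KB (decimal)
SEATS_PER_EVENT = 50_000
SEAT_RECORD_BYTES = 100

# Averages over the whole window; real traffic is front-loaded.
read_qps = ACTIVE_USERS * REFRESHES_PER_USER / WINDOW_SECONDS
write_qps = RESERVATION_ATTEMPTS / WINDOW_SECONDS
seat_map_bytes_per_sec = read_qps * SEAT_MAP_BYTES
seat_storage_bytes = SEATS_PER_EVENT * SEAT_RECORD_BYTES

print(f"read QPS:          {read_qps:,.0f}")
print(f"write QPS:         {write_qps:,.0f}")
print(f"seat map transfer: {seat_map_bytes_per_sec / 1e9:.1f} GB/sec")
print(f"seat storage:      {seat_storage_bytes / 1e6:.0f} MB")
```

These are averages, so it is reasonable to provision for a multiple of them when sizing the cache and lock tiers.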

βš™οΈ Core Mechanics: API, Schema, and Storage Architecture

Our system uses separate read and write paths to scale.

API Design

We define the core REST contract for the booking flow:

| Endpoint | HTTP Method | Description | Input Parameters | Return Format |
|---|---|---|---|---|
| /api/v1/events/{id}/seats | GET | Fetch the current available seat map | event_id, show_time | JSON seat coordinate array |
| /api/v1/reservations | POST | Temporarily hold selected seats | event_id, seat_ids, user_id | reservation_id, expires_at |
| /api/v1/bookings | POST | Complete purchase and confirm booking | reservation_id, payment_token | booking_id, status |

Database Schema (Relational Store)

While read maps are cached, the system of record requires a relational database (PostgreSQL/MySQL) to manage consistency:

| Table Name | Column Name | Data Type | Key Type | Indexing Strategy |
|---|---|---|---|---|
| Events | event_id | VARCHAR(64) | Primary Key | - |
| Events | name, date | VARCHAR, TIMESTAMP | - | Index on date |
| Seats | seat_id | VARCHAR(64) | Primary Key | - |
| Seats | event_id | VARCHAR(64) | Foreign Key | Composite Index (event_id, status) |
| Seats | row, number | VARCHAR, INT | - | - |
| Seats | status | VARCHAR(16) | - | Enum: AVAILABLE, HELD, BOOKED |
| Bookings | booking_id | VARCHAR(64) | Primary Key | - |
| Bookings | user_id, price | VARCHAR, INT | - | - |
| Bookings | status | VARCHAR(16) | - | Enum: PENDING, CONFIRMED, FAILED |
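As a rough illustration of how the Seats table supports consistency, the sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL/MySQL. The column names seat_row and seat_number (used instead of row and number to avoid keyword clashes), the index name, and the hold_seat helper are assumptions made for this example. The key idea is a conditional UPDATE: the seat only moves to HELD if it is still AVAILABLE.

```python
import sqlite3

# In-memory SQLite stands in for the relational store of record.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seats (
    seat_id     TEXT PRIMARY KEY,
    event_id    TEXT NOT NULL,
    seat_row    TEXT NOT NULL,          -- assumed name for the 'row' column
    seat_number INTEGER NOT NULL,       -- assumed name for the 'number' column
    status      TEXT NOT NULL DEFAULT 'AVAILABLE'
                CHECK (status IN ('AVAILABLE', 'HELD', 'BOOKED'))
);
CREATE INDEX idx_seats_event_status ON seats (event_id, status);
INSERT INTO seats (seat_id, event_id, seat_row, seat_number)
VALUES ('A-1', 'evt-1', 'A', 1);
""")


def hold_seat(conn, seat_id):
    """Return True only if this call moved the seat from AVAILABLE to HELD."""
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            "UPDATE seats SET status = 'HELD' "
            "WHERE seat_id = ? AND status = 'AVAILABLE'",
            (seat_id,),
        )
    return cur.rowcount == 1


first = hold_seat(conn, "A-1")
second = hold_seat(conn, "A-1")
print(first, second)  # True False
```

Because the WHERE clause checks the current status, the second attempt changes zero rows and is rejected without any explicit row lock held by the application.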

Cache Schema (Redis Key-Value Design)

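To support fast seat map reads and lock resolution during surges, we use two key patterns. Redis applies its eviction policy (maxmemory-policy) per instance rather than per key, so the two policies below imply separate Redis deployments for the seat map cache and the locks: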

| Key Pattern | Value Format | TTL | Eviction Policy | Purpose |
|---|---|---|---|---|
| event_map:{event_id} | JSON string of coordinates | 5 seconds | volatile-lru | High-speed cache for read map lookups |
| seat_lock:{event_id}:{seat_id} | user_id string | 10 minutes | noeviction | Distributed lock for seat reservation |

📊 Architectural Blueprint: High-Level System Flow

The diagram below maps the architecture and components of our booking platform:

graph TD
    Client[User Browser] -->|Seat Map Query| CDN[Cloudflare CDN]
    CDN -->|Cache Miss| API[API Gateway]
    API -->|Read Path| Cache[Redis Cache Cluster]
    API -->|Write Path| LockSvc[Locking Service]
    LockSvc -->|Acquire Lock| RedisLock[Redis Distributed Lock]
    LockSvc -->|Create Temp Hold| RDB[PostgreSQL Primary]
    API -->|Confirm Purchase| PaySvc[Payment Service]
    PaySvc -->|Publish Event| MQ[Apache Kafka]
    MQ -->|Async Update| Worker[Worker Service]
    Worker -->|Finalize Booking| RDB

This system diagram illustrates the architecture of our ticketing platform. Read requests for event seat maps are served directly by the CDN or a Redis cache cluster. Write requests (seat locks) are routed to a dedicated Locking Service that evaluates seat availability using Redis-based distributed locks. Once a seat is locked, the payment is processed asynchronously using Kafka messaging, and the final state is written to the PostgreSQL relational database by a worker service.


🧠 Deep Dive: Solving Concurrency and Double Booking

Managing concurrency at this scale requires decoupling the locking mechanics from the database transactions.

The Internals of Distributed Locks and DB Transactions

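To prevent two users from booking the same seat, we use Redis-based distributed locks. A single SET with the NX option is enough on one Redis primary; the Redlock algorithm is an option when the lock must span several independent Redis nodes:

  1. When a user requests a seat lock, the Locking Service executes a SET command with the NX and PX options: SET seat_lock:{event_id}:{seat_id} {user_id} NX PX 600000. This sets the key only if it does not exist, with an expiration time of 10 minutes (600,000 ms). The legacy SETNX command cannot set an expiry in the same atomic step, so SET with NX and PX is the form to use.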
  2. If the command returns success, the user has acquired the lock. The status of the seat in the relational database is updated to HELD using a simple transaction.
  3. If the command fails, the user is notified immediately that the seat is already locked.

This approach keeps database traffic low. We validate seat availability in Redis memory before executing database transactions, protecting the relational store from load spikes.
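The lock acquisition step can be sketched as follows. To keep the example runnable without a Redis server, it uses a small in-memory FakeRedis class that mimics only the SET key value NX PX behaviour; that class, the try_hold_seat helper, and the injectable clock are assumptions for illustration. The call shape matches redis-py's set(name, value, nx=True, px=...), which returns True on success and None when NX blocks the write.

```python
import time

LOCK_TTL_MS = 600_000  # 10 minutes, matching the hold window


class FakeRedis:
    """In-memory stand-in that mimics only SET key value NX PX ttl."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at_seconds or None)
        self._clock = clock

    def set(self, name, value, nx=False, px=None):
        now = self._clock()
        current = self._data.get(name)
        if current is not None and current[1] is not None and current[1] <= now:
            current = None   # treat an expired key as missing
        if nx and current is not None:
            return None      # key exists, so NX refuses to overwrite it
        expires_at = now + px / 1000 if px is not None else None
        self._data[name] = (value, expires_at)
        return True


def try_hold_seat(client, event_id, seat_id, user_id, ttl_ms=LOCK_TTL_MS):
    """Return True if this user now holds the seat, False if it is taken."""
    key = f"seat_lock:{event_id}:{seat_id}"
    return bool(client.set(key, user_id, nx=True, px=ttl_ms))


# Simulated clock so the 10-minute expiry can be shown without waiting.
fake_now = [0.0]
client = FakeRedis(clock=lambda: fake_now[0])

print(try_hold_seat(client, "evt-1", "A-1", "alice"))  # True
print(try_hold_seat(client, "evt-1", "A-1", "bob"))    # False
fake_now[0] = 601.0                                    # hold has expired
print(try_hold_seat(client, "evt-1", "A-1", "bob"))    # True
```

With a real client, the same helper should work by passing a redis-py connection instead of FakeRedis, since only the set call is used.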

Performance Analysis of Ticket Locking and Queueing

To handle high payment confirmation traffic, we introduce a message queue (e.g., Apache Kafka). When a user clicks "Buy Now" and enters payment details:

  • The system publishes a PaymentInitiated event to Kafka and returns a Pending status to the client, freeing HTTP threads to handle other requests.
  • A dedicated Payment Service processes the transaction asynchronously, interacting with external gateways.
  • Once payment succeeds, a worker updates the seat status to BOOKED in the database, removes the Redis lock, and sends a confirmation email.

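If payment fails, the worker deletes the Redis key and resets the seat status to AVAILABLE. If the 10-minute window expires, Redis removes the key on its own through the TTL, but the HELD row in the database is not reset automatically; a cleanup job must return expired holds to AVAILABLE.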
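One way to handle the database side of hold expiry is a periodic sweep. The sketch below again uses sqlite3 as a stand-in and assumes a hold_expires_at column (epoch seconds) that is not part of the schema shown earlier; the release_expired_holds helper is likewise hypothetical.

```python
import sqlite3
import time


def release_expired_holds(conn, now=None):
    """Reset HELD seats whose hold has expired; return how many were released."""
    now = time.time() if now is None else now
    with conn:  # commit the sweep as one transaction
        cur = conn.execute(
            "UPDATE seats SET status = 'AVAILABLE', hold_expires_at = NULL "
            "WHERE status = 'HELD' AND hold_expires_at <= ?",
            (now,),
        )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (seat_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
    "hold_expires_at REAL)"  # hold_expires_at is an assumed extra column
)
conn.executemany(
    "INSERT INTO seats VALUES (?, ?, ?)",
    [("A-1", "HELD", 100.0), ("A-2", "HELD", 900.0), ("A-3", "BOOKED", None)],
)

released = release_expired_holds(conn, now=600.0)
print(released)  # 1: only A-1 had an expired hold
```

Running the sweep on a short interval bounds how long a seat can appear HELD after its Redis key has already expired.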


📊 Write and Read Path Sequences

Write Path Flow (Seat Locking)

  1. The client sends a POST /api/v1/reservations request to the API Gateway.
  2. The Locking Service attempts to acquire a Redis lock for the seat using SET with the NX option.
  3. If successful, the Locking Service updates the seat status to HELD in the database and writes a temporary record.
  4. The system returns a success status with a 10-minute expiration countdown.
  5. If the lock attempt fails, the system returns a conflict error (409) in under 50 milliseconds.

Read Path Flow (Seat Map Fetch)

  1. The client sends a GET /api/v1/events/{id}/seats request.
  2. The request is intercepted by the CDN. If the seat map cache is warm (less than 5 seconds old), it is returned immediately.
  3. If it is a cache miss, the request hits the Read Service.
  4. The Read Service fetches seat availability from Redis, falls back to the database on a miss, and updates the cache.
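The last two steps of the read path follow the cache-aside pattern, which can be sketched like this. The TTLCache class is a tiny in-memory stand-in for Redis, and get_seat_map and load_from_db are hypothetical names; only the event_map:{event_id} key pattern and the 5-second TTL come from the design above.

```python
import time

SEAT_MAP_TTL_SECONDS = 5


class TTLCache:
    """Tiny in-memory stand-in for a cache with per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at_seconds)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None      # missing or expired
        return entry[0]

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


def get_seat_map(cache, load_from_db, event_id):
    """Cache-aside read: try the cache, fall back to the database, then fill."""
    key = f"event_map:{event_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    seat_map = load_from_db(event_id)
    cache.set(key, seat_map, SEAT_MAP_TTL_SECONDS)
    return seat_map


# Simulated clock and a database loader that counts how often it is called.
fake_now = [0.0]
db_calls = []


def load_from_db(event_id):
    db_calls.append(event_id)
    return {"A-1": "AVAILABLE", "A-2": "HELD"}


cache = TTLCache(clock=lambda: fake_now[0])
get_seat_map(cache, load_from_db, "evt-1")   # miss: reads the database
get_seat_map(cache, load_from_db, "evt-1")   # hit: served from cache
fake_now[0] = 6.0                            # past the 5-second TTL
get_seat_map(cache, load_from_db, "evt-1")   # miss again
print(len(db_calls))  # 2
```

The short TTL means the seat map can be up to 5 seconds stale, which is acceptable here because the lock step, not the map, decides who actually gets a seat.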

🌍 Real-World Implementation: Ticketmaster and Ticketfly

Large ticket distributors typically split their systems into three domains:

  • Virtual Waiting Rooms (e.g., Queue-it): Waiting rooms that rate-limit incoming users, protecting downstream APIs from traffic spikes during drops.
  • In-Memory Locks: Using caching technologies like Redis or Memcached to handle rapid lock evaluation, ensuring database connections do not exhaust.
  • Payment Handlers: Using asynchronous architectures with Kafka to throttle payment requests, preventing transactional systems from overloading.
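The waiting room idea can be illustrated with a simple first-in, first-out admission queue. This is a generic sketch, not the API of Queue-it or any other product; the WaitingRoom class and its admit_per_tick parameter are assumptions for this example.

```python
from collections import deque


class WaitingRoom:
    """FIFO admission control: only a fixed number of users pass per tick."""

    def __init__(self, admit_per_tick):
        self.admit_per_tick = admit_per_tick
        self._queue = deque()

    def join(self, user_id):
        """Add a user to the line and return their 1-based position."""
        self._queue.append(user_id)
        return len(self._queue)

    def admit_next_batch(self):
        """Release up to admit_per_tick users to the booking flow."""
        batch = []
        while self._queue and len(batch) < self.admit_per_tick:
            batch.append(self._queue.popleft())
        return batch


room = WaitingRoom(admit_per_tick=2)
for user in ["u1", "u2", "u3", "u4", "u5"]:
    room.join(user)

print(room.admit_next_batch())  # ['u1', 'u2']
print(room.admit_next_batch())  # ['u3', 'u4']
print(room.admit_next_batch())  # ['u5']
```

Tuning the batch size to what the lock and payment tiers can sustain turns an unbounded spike into a steady, predictable load.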

βš–οΈ Trade-offs and Failure Modes: Optimistic vs Pessimistic Locking

Selecting a locking strategy involves balancing consistency and system throughput:

| Strategy | Performance Under Load | Database Impact | Error Handling |
|---|---|---|---|
| Optimistic Locking | High throughput, low latency | Low (no database locks held) | High retry rate for users (many conflict updates) |
| Pessimistic Locking | Low throughput (threads block) | High (database connection pool starves) | Low error rate, but risks system crashes |
| Redis Distributed Locking | High throughput, very low latency | Extremely low (validations happen in-memory) | Requires managing lock lease renewals |

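Our design uses Redis distributed locking for speed and keeps the relational database as the final arbiter: the seat status transition in the database must still be conditional, because a Redis lock can be lost on failover or expire while a payment is in flight.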
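For comparison, optimistic locking from the table can be sketched with a version column. The version column and the helper names are assumptions for this example, and sqlite3 again stands in for the production database. Both users read the same version, and only the first write succeeds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seats (seat_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
    "version INTEGER NOT NULL)"  # version is an assumed extra column
)
conn.execute("INSERT INTO seats VALUES ('A-1', 'AVAILABLE', 0)")
conn.commit()


def read_seat(conn, seat_id):
    """Return (status, version) as seen at read time."""
    return conn.execute(
        "SELECT status, version FROM seats WHERE seat_id = ?", (seat_id,)
    ).fetchone()


def book_if_unchanged(conn, seat_id, expected_version):
    """Write only if nobody else changed the row since we read it."""
    with conn:
        cur = conn.execute(
            "UPDATE seats SET status = 'BOOKED', version = version + 1 "
            "WHERE seat_id = ? AND version = ?",
            (seat_id, expected_version),
        )
    return cur.rowcount == 1


# Two users read the same version, then both try to write.
_, v_alice = read_seat(conn, "A-1")
_, v_bob = read_seat(conn, "A-1")
alice_ok = book_if_unchanged(conn, "A-1", v_alice)
bob_ok = book_if_unchanged(conn, "A-1", v_bob)
print(alice_ok, bob_ok)  # True False
```

The losing user has to re-read and retry or pick another seat, which is the high retry rate the table refers to when many users target the same seats.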


🧭 Decision Guide: Cache-Aside vs Queue-Based Booking

Use this decision table to guide system design choices based on scale and consistency requirements.

| Situation | Recommendation | Rationale |
|---|---|---|
| Highly anticipated events with extreme traffic spikes | Redis Distributed Locks + Queue-Based Payment | Prevents database overload during flash sales. |
| Regular event scheduling with low concurrent bookings | Standard Relational Database Transactions | Simpler to implement and maintain. |
| High read traffic, but low write volume | CDN Caching + Optimistic Database Locking | Simple caching without distributed lock overhead. |

🧪 Practical Interview Execution: 45-Minute Delivery Strategy

When presenting this design in an interview, manage your time using this schedule:

  1. Minutes 0-5 (Clarify Requirements): Establish scale expectations (Active users, QPS) and write functional requirements on the whiteboard.
  2. Minutes 5-15 (High-Level Architecture): Sketch the CDN, Gateway, Read/Write split services, and database layers.
  3. Minutes 15-30 (Deep Dive): Explain how you prevent double-booking using Redis distributed locks. Write out the exact Redis commands and DB tables.
  4. Minutes 30-40 (Asynchronous Payments): Detail the Kafka payment flow, handling edge cases like network timeouts during gateway calls.
  5. Minutes 40-45 (Trade-offs): Summarize the design's trade-offs, discussing optimistic locking and partition strategies for database scaling.

πŸ› οΈ Apache Kafka: Messaging Configuration

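In high-concurrency booking systems, Apache Kafka is typically configured with the message key set to event_id. Kafka preserves order within a partition, so all messages for a specific concert drop land on one partition and are processed in order by a single consumer in the group, preventing write conflicts. The trade-off is that one very hot event is limited to the throughput of a single partition.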

We configure the topic with replication factor 3 and acks=all to guarantee that message commits are persisted across multiple broker instances, protecting the system against broker failures during drops.
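The effect of keying messages by event_id can be illustrated without a broker. The sketch below uses CRC32 purely as a stand-in hash (Kafka's default partitioner for keyed messages uses murmur2), and the partition count and payload strings are hypothetical. What it shows is the property that matters: the same key always maps to the same partition, so per-event order is preserved.

```python
import zlib

NUM_PARTITIONS = 6  # hypothetical topic size


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition deterministically."""
    # Stand-in hash for illustration; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


messages = [
    ("evt-1", "evt-1:PaymentInitiated:1"),
    ("evt-2", "evt-2:PaymentInitiated:1"),
    ("evt-1", "evt-1:PaymentInitiated:2"),
]

# Append each payload to the log of the partition its key hashes to.
partitions = {}
for key, payload in messages:
    partitions.setdefault(partition_for(key), []).append(payload)

evt1_partition = partition_for("evt-1")
evt1_log = [m for m in partitions[evt1_partition] if m.startswith("evt-1:")]
print(evt1_log)  # both evt-1 messages, in the order they were sent
```

The same property is what concentrates a single hot event on one partition, so partition count alone does not raise throughput for that event.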


📚 Lessons Learned: Production Scaling Pitfalls

Avoid these standard mistakes when deploying booking platforms to production:

  • Setting Long Lock Durations: Keeping seat locks active for too long (e.g., 30 minutes) allows users to tie up inventory without purchasing, frustrating other customers. Keep locks short (10 minutes max).
  • Missing Lock Expiration Handlers: Ensure your lock expiration process automatically resets seat statuses in the database. If the cleanup worker fails, seats can remain locked permanently.
  • Direct Database Seat Queries: Never query the relational database directly to build the seat map interface for customers. Use memory-based caches to prevent database crashes.

📌 Summary: High-Scale Booking Cheat Sheet

  • Redis Locks: Use Redis distributed locks for fast, memory-based seat validation.
  • Decoupled Paths: Separate the high-traffic seat selection path from the transactional payment system.
  • Asynchronous Processing: Use message queues to process payments asynchronously, protecting backend systems from load spikes.
  • Short Holds: Limit seat lock durations to 10 minutes to maintain high inventory turnover.
  • Read Caches: Serve seat map read queries from CDNs and caches to protect databases from concurrent traffic.
