System Design HLD Example: Hotel Booking System (Airbnb)
A senior-level HLD for a hotel booking platform handling availability, concurrency, and reservations.
TLDR: A robust hotel booking system must decrement inventory atomically. The core trade-off is Consistency vs. Availability: we prioritize strong consistency for the booking path (PostgreSQL row-level locking via SELECT FOR UPDATE SKIP LOCKED, plus a version counter on each slot) while allowing eventual consistency and high availability for the search path (Elasticsearch). A two-phase "Hold-then-Confirm" model ensures that inventory isn't leaked during payment failures.
🛑 The New Year's Eve Nightmare
Imagine it’s 11:59 PM on New Year’s Eve. Two different travelers, one in London and one in New York, are looking at the exact same penthouse in Manhattan for the upcoming weekend. Both click "Book Now" at the same millisecond.
In a poorly designed system, the sequence of events looks like this:
- Request A checks the database: "Is the room available?" -> Yes.
- Request B checks the database: "Is the room available?" -> Yes.
- Request A writes a booking record: "Room booked for User A".
- Request B writes a booking record: "Room booked for User B".
Both users receive a confirmation email. Both pay their non-refundable deposits. On Friday, they both show up at the same door with their luggage. This is the Double-Booking Race Condition, and it is the single most important problem a booking system must solve. At scale, "rare" edge cases happen thousands of times a day. If you design for the average case, you fail at the edges.
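The interleaving above can be replayed deterministically. The following is a minimal sketch that uses an in-memory dict as a stand-in for the database; the slot key and the function names (`is_available`, `write_booking`, `atomic_book`) are hypothetical and only illustrate the difference between check-then-write and a single atomic check-and-write.

```python
import threading

SLOT = "penthouse:weekend"  # hypothetical slot key

def make_inventory():
    # slot -> guest; None means the slot is still available
    return {SLOT: None}

def is_available(inventory, slot):
    """Step 1 of the naive flow: a plain read."""
    return inventory[slot] is None

def write_booking(inventory, slot, guest):
    """Step 2 of the naive flow: an unconditional write."""
    inventory[slot] = guest
    return True

def atomic_book(inventory, lock, slot, guest):
    """Check and write under one lock, so no request can slip in between."""
    with lock:
        if inventory[slot] is not None:
            return False
        inventory[slot] = guest
        return True

# Naive flow, replayed in the same order as the four steps listed above.
inv = make_inventory()
a_sees_free = is_available(inv, SLOT)  # Request A checks: available
b_sees_free = is_available(inv, SLOT)  # Request B checks: available
naive_confirmed = []
if a_sees_free and write_booking(inv, SLOT, "A"):
    naive_confirmed.append("A")
if b_sees_free and write_booking(inv, SLOT, "B"):
    naive_confirmed.append("B")

# Atomic flow: the same two requests, but check-and-write is one step.
inv2 = make_inventory()
lock = threading.Lock()
atomic_confirmed = [g for g in ("A", "B") if atomic_book(inv2, lock, SLOT, g)]

print(naive_confirmed)   # both guests were confirmed for one room
print(atomic_confirmed)  # only the first guest was confirmed
```

In the naive flow both guests are confirmed and the second write silently overwrites the first. In a real deployment the lock has to live in the database rather than in one process, which is what the rest of this design addresses.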
📖 Global Reservation Systems: Use Cases & Requirements
Actors
- Guest / Traveler: Searches for rooms, views availability, and makes reservations.
- Host / Property Manager: Manages inventory, sets pricing, and views upcoming bookings.
- Admin: Handles disputes, refunds, and platform-wide monitoring.
Functional Requirements
- Search: Users can search rooms by location (geo-coordinates), date range, and guest count.
- Availability: Users see real-time availability for a listing before booking.
- Reservation (Hold): Selecting a room places a temporary 15-minute hold.
- Booking (Confirm): Successful payment converts a hold into a confirmed booking.
- Cancellation: Releasing a booking restores inventory for those specific dates.
Non-Functional Requirements
- Zero Double-Bookings: Strong consistency is non-negotiable for the final booking transaction.
- High Search Availability: Search should remain functional even if the booking database is under heavy load.
- Low Latency: Search results should return in < 200ms; booking confirmation in < 2s.
- Scalability: Handle 100k searches/sec and 500 bookings/sec (peak holiday spikes).
🔍 Basics: Baseline Architecture
At its core, a booking system is an Inventory Management Engine. Unlike a standard e-commerce site where you might have 1,000 units of a SKU, a hotel booking system has "Perishable Inventory." A room night on December 31st is a different "product" than the same room on January 1st.
The baseline architecture involves:
- Inventory Generation: Pre-calculating available slots for every room for the next 365 days.
- The Lock Mechanism: Ensuring that only one user can transition a slot from available to booked.
- The Buffer (Hold): Providing a grace period for payment processing so the user doesn't lose the room mid-transaction.
Without these basics, you end up with "Phantom Inventory"—rooms that appear available but are actually locked in failing payment processes.
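As an illustration of the inventory generation step, here is a minimal sketch that produces one slot per night for a 365-day window. The function name `generate_slots`, the dict layout, and the room ID are assumptions made for the example, not a prescribed schema.

```python
from datetime import date, timedelta

def generate_slots(room_id, start, horizon_days=365):
    """Yield one AVAILABLE slot per night for the booking window."""
    for offset in range(horizon_days):
        yield {
            "room_id": room_id,
            "date": start + timedelta(days=offset),
            "status": "AVAILABLE",
        }

# Hypothetical room ID and start date, for illustration only.
slots = list(generate_slots("room-42", date(2025, 1, 1)))
print(len(slots), slots[0]["date"], slots[-1]["date"])
```

Each room night is its own record, which is what makes December 31st a different "product" from January 1st.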
⚙️ Mechanics: Distribution & Processing Logic
The distribution of inventory must be handled carefully. When a host adds a new listing, we don't just add one row. We must generate 365 rows in the availability_slots table.
- Inventory Fan-out: Every update to a room's base availability (e.g., taking the room offline for maintenance) must propagate to all 365 days.
- Search Synchronization: Since search is handled by Elasticsearch, we use an asynchronous pipeline. A write to the primary DB triggers a Kafka event, which is then indexed into ES. This introduces a 1-2 second lag, which is acceptable for search but not for booking.
- State Machine: Every booking follows a strict state machine: Available -> Held -> Booked (or back to Available if the hold expires).
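The state machine can be made explicit in code so that illegal jumps (for example, Available straight to Booked) are rejected. This is a minimal sketch; the names `SlotState`, `ALLOWED`, and `transition` are hypothetical, and cancellation of a confirmed booking is deliberately left out because it is not part of the transitions listed above.

```python
from enum import Enum

class SlotState(Enum):
    AVAILABLE = "AVAILABLE"
    HELD = "HELD"
    BOOKED = "BOOKED"

# Allowed transitions: Available -> Held -> Booked, or Held -> Available on expiry.
ALLOWED = {
    SlotState.AVAILABLE: {SlotState.HELD},
    SlotState.HELD: {SlotState.BOOKED, SlotState.AVAILABLE},
    SlotState.BOOKED: set(),
}

class InvalidTransition(Exception):
    """Raised when a requested state change is not in the ALLOWED table."""

def transition(current, target):
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.name} -> {target.name} is not allowed")
    return target

state = transition(SlotState.AVAILABLE, SlotState.HELD)
state = transition(state, SlotState.BOOKED)
```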
📐 Estimations & Design Goals
The Math of Inventory
- Total Listings: 10 Million rooms.
- Booking Window: 1 year (365 days).
- Total Inventory Rows: 10M × 365 = 3.65 Billion rows.
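- Search-to-Booking Ratio: 20:1 at a baseline of 10k searches/sec gives 500 booking attempts/sec. At the 100k searches/sec peak from the requirements, the same 500 bookings/sec implies a ratio closer to 200:1.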
Design Goal: Decouple the "Read-Heavy" search path from the "Write-Heavy" booking path. We use a Command Query Responsibility Segregation (CQRS) inspired approach where Elasticsearch handles the searches and PostgreSQL handles the ACID transactions.
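The arithmetic behind these estimates is small enough to check directly. In the sketch below, the listing count, booking window, and request rates come from this article; `ASSUMED_BYTES_PER_ROW` is a hypothetical figure added only to give a rough sense of raw table size.

```python
TOTAL_LISTINGS = 10_000_000
BOOKING_WINDOW_DAYS = 365

total_rows = TOTAL_LISTINGS * BOOKING_WINDOW_DAYS

def search_to_booking_ratio(searches_per_sec, bookings_per_sec):
    """How many searches the system serves per booking attempt."""
    return searches_per_sec / bookings_per_sec

# Hypothetical row size, excluding indexes and storage overhead.
ASSUMED_BYTES_PER_ROW = 100
raw_size_gb = total_rows * ASSUMED_BYTES_PER_ROW / 1e9

print(f"{total_rows:,} rows")
print(search_to_booking_ratio(10_000, 500))   # ratio at 10k searches/sec
print(search_to_booking_ratio(100_000, 500))  # ratio at the 100k searches/sec peak
print(f"~{raw_size_gb:.0f} GB before indexes, under the assumed row size")
```

The ratio depends heavily on which search rate is assumed, which is one more reason to scale the two paths independently.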
📊 High-Level Design: Separating Search from Booking
The following architecture ensures that high-volume search traffic never interferes with the critical booking path.
graph TD
User((User)) --> LB[Load Balancer]
LB --> AG[API Gateway]
subgraph Search_Path
AG --> SS[Search Service]
SS --> ES[(Elasticsearch: Geo + Dates)]
SS --> RC[(Search Cache: Redis)]
end
subgraph Booking_Path
AG --> BS[Booking Service]
BS --> AS[Availability Service]
AS --> PDB[(Primary DB: Postgres)]
BS --> PS[Payment Service]
end
subgraph Async_Sync
PDB --> CDC[Debezium / CDC]
CDC --> Kafka[Kafka]
Kafka --> SS
Kafka --> NS[Notification Service]
end
The diagram captures the defining architectural decision: a hard separation between the Search Path (Elasticsearch + Redis) and the Booking Path (Postgres with SELECT FOR UPDATE SKIP LOCKED). The CDC pipeline via Debezium keeps the two paths synchronized without coupling them — a booking written to Postgres propagates to Elasticsearch within 1–2 seconds, keeping search results fresh while ensuring the booking path never touches the search cluster.
🧠 Deep Dive: How Postgres Atomically Prevents the Double-Booking Race Condition
The Hold mechanism is the most critical internal component. Understanding exactly how it works at the database level reveals why no amount of application-level locking can replace it — only the database can guarantee atomicity across concurrent transactions.
Internals: The Hold-then-Confirm State Machine
When a guest selects a room and date range, the Booking Service must atomically transition N rows in the availability_slots table (one per night) from AVAILABLE to HELD. The key word is "atomically": if any single night in the requested range is already HELD or BOOKED by another session, the entire operation must roll back with no writes committed.
This is implemented as a single Postgres transaction using SELECT FOR UPDATE SKIP LOCKED:
| Step | SQL Operation | Why This Mechanism |
|---|---|---|
| 1. Lock target rows | SELECT … FOR UPDATE SKIP LOCKED | Non-blocking: if rows are locked by another session, returns fewer rows immediately |
| 2. Check completeness | Application checks all N nights returned | Missing row means another session already holds that night |
| 3. Update to HELD | UPDATE availability_slots SET status='HELD', held_until=NOW()+interval '15 min', version=version+1 | Atomic state transition with optimistic version increment |
| 4. Create booking | INSERT INTO bookings (status='HELD') | Booking record created within the same transaction |
| 5. COMMIT | All-or-nothing guarantee | Postgres atomicity ensures no partial holds |
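The SKIP LOCKED clause is the key insight. Without it, SELECT FOR UPDATE would block and wait for the competing transaction to release its lock, potentially for seconds. With SKIP LOCKED, a row that another session has locked is simply left out of the result, and the query returns immediately. The application then detects the incomplete result and returns "unavailable" to the second guest without any waiting. Note that SKIP LOCKED only covers holds that are still in flight: once a competing hold has committed, its row lock is gone, so the SELECT must also filter on status = 'AVAILABLE' for the completeness check to catch nights that are already HELD or BOOKED.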
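Steps 1 to 3 of the table might look like the following sketch. It assumes a DB-API style cursor (for example from a psycopg connection) inside an open transaction and the `%s` parameter style; how Python lists are adapted to Postgres arrays depends on the driver. The function name `try_hold` is hypothetical, the `status = 'AVAILABLE'` filter is an addition so that already-committed holds are excluded as well as in-flight ones, and the booking INSERT from step 4 is omitted for brevity.

```python
# Step 1: lock only rows that are AVAILABLE and not locked by another session.
LOCK_SQL = """
SELECT slot_id
FROM availability_slots
WHERE room_id = %s
  AND date = ANY(%s)
  AND status = 'AVAILABLE'
FOR UPDATE SKIP LOCKED
"""

# Step 3: mark the locked rows as HELD with a 15-minute expiry.
HOLD_SQL = """
UPDATE availability_slots
SET status = 'HELD',
    held_by = %s,
    held_until = NOW() + INTERVAL '15 minutes',
    version = version + 1
WHERE slot_id = ANY(%s)
"""

def try_hold(cur, room_id, nights, guest_id):
    """Return True if every requested night was locked and marked HELD.

    `cur` is a cursor inside an open transaction. The caller commits when
    this returns True and rolls back when it returns False.
    """
    cur.execute(LOCK_SQL, (room_id, list(nights)))
    locked_ids = [row[0] for row in cur.fetchall()]
    # Step 2: completeness check. A missing night means it is held, booked,
    # or currently locked by a competing transaction.
    if len(locked_ids) != len(nights):
        return False
    cur.execute(HOLD_SQL, (guest_id, locked_ids))
    return True
```

Because the function never waits on a lock, the losing request fails fast and the caller can roll back and report "unavailable" right away.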
| Field | Type | Description |
|---|---|---|
| slot_id | UUID | Primary key for the availability slot |
| room_id | UUID | FK to rooms table |
| date | DATE | The specific night this slot represents |
| status | ENUM | AVAILABLE, HELD, BOOKED, BLOCKED |
| held_by | UUID | Guest session ID (null when AVAILABLE) |
| held_until | TIMESTAMP | Expiry time for the hold (15-minute TTL) |
| version | INTEGER | Optimistic lock counter |
Performance Analysis: Balancing Search Scale Against Booking Correctness
The CQRS-inspired architecture allows each path to scale completely independently.
| Path | Technology | Peak Throughput | Latency Target |
|---|---|---|---|
| Search (geo + date range) | Elasticsearch | 100,000 req/sec | < 200 ms |
| Availability pre-check | Redis bitmap cache | 10,000 req/sec | < 50 ms |
| Hold creation | Postgres SKIP LOCKED | 500 req/sec | < 500 ms |
| Booking confirmation | Postgres + payment gateway | 200 req/sec | < 2,000 ms |
The search path uses Elasticsearch with a geo-point mapping and a date-range filter on a denormalized availability index. Because this index is refreshed asynchronously (Debezium CDC → Kafka → Elasticsearch consumer), there is an intentional 1–2 second lag between a room becoming HELD and that change appearing in search results. This lag is acceptable because the Hold mechanism at the Booking Service provides the ultimate correctness guarantee: a guest who sees an "available" result in search but then gets an "unavailable" response at booking has simply encountered the propagation window. The system remains correct even during this lag.
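The table lists a Redis bitmap cache for the availability pre-check without describing its layout. One possible encoding, sketched here with a plain Python integer as the bitmap, uses one bit per night where a set bit means "available". The function names and the one-bit-per-day layout are assumptions; in Redis the same idea would map onto commands such as SETBIT and GETBIT.

```python
def set_night(bitmap, day_index, available):
    """Return a new bitmap with the bit for day_index set or cleared."""
    mask = 1 << day_index
    return (bitmap | mask) if available else (bitmap & ~mask)

def range_available(bitmap, first_day, nights):
    """True if every night in [first_day, first_day + nights) has its bit set."""
    window = ((1 << nights) - 1) << first_day
    return (bitmap & window) == window

# A 31-day month with every night available, then one night taken.
month = (1 << 31) - 1
month = set_night(month, 9, available=False)

blocked = range_available(month, first_day=7, nights=3)   # covers days 7, 8, 9
free = range_available(month, first_day=10, nights=3)     # covers days 10, 11, 12
```

A pre-check like this is only a hint: the cache can be stale, so a positive answer still has to pass the hold transaction in Postgres.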
🌍 Real-World Booking Systems: Airbnb, Booking.com, and Expedia
Airbnb faced the double-booking problem at massive scale as "Instant Book" listings grew. Their solution is a multi-tier availability system: a fast read layer (Redis cache of per-room per-month availability bitmaps) for search, and a strong-consistency write layer (Postgres with row-level locking) for bookings. The Instant Book feature — where a guest can confirm immediately without waiting for host approval — was only possible after Airbnb built a hold mechanism capable of guaranteeing atomic availability from click to confirmation within 2 seconds.
Booking.com uses a date-level inventory system with one row per room per night, exactly as described in this guide. Their data engineering team processes over 1 billion availability updates per day as hotels worldwide manually manage their calendars through the Booking.com extranet. The Kafka pipeline ingesting these updates into Elasticsearch is one of the highest-throughput event streams in European tech infrastructure.
Expedia solved the meta-search aggregation problem differently: rather than holding inventory itself, Expedia passes the hold request directly to the supplier (hotel) API at booking time. This "pass-through" model shifts the hold complexity to the supplier but introduces latency and availability risk from external API calls — a trade-off Expedia accepts in exchange for avoiding the cost of maintaining 3.65 billion inventory rows.
⚖️ Consistency vs. Availability: Trade-offs in the Booking Path
| Design Decision | Advantage | Risk |
|---|---|---|
| Date-level inventory (one row per night) | Precise partial-week bookings supported | 3.65B rows; requires date-partitioned table and composite index |
| SKIP LOCKED for holds | Non-blocking; competing holds fail fast | Requires robust retry logic in the application layer |
| ES for search, Postgres for booking | Search scales independently to 100k req/sec | 1–2 second search-to-reality propagation lag |
| 15-minute hold TTL | Graceful payment processing window | Popular rooms unavailable during hold if payment fails slowly |
| CQRS read/write separation | Zero cross-path interference | Data synchronization complexity via CDC pipeline |
Critical Failure Mode — The Hold-Expiry and Payment Gap: A guest places a hold, begins payment, and the payment takes 16 minutes (possible with 3D Secure strong authentication). The hold expires at 15 minutes. A background cleanup job reclaims the slot as AVAILABLE. Another guest immediately books the same room. The first guest's payment then succeeds, creating a double booking. Mitigation: The payment confirmation endpoint must re-validate that the hold is still active — with status=HELD and held_until > NOW() — in the same transaction that converts the hold to BOOKED. If the hold has expired, the system must immediately refund and surface an "unable to confirm" message, then re-attempt the hold if inventory is still available.
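The mitigation can be sketched as a single conditional UPDATE whose affected-row count is checked before committing. The example below uses Python's built-in `sqlite3` purely as a runnable stand-in for Postgres, with timestamps simplified to integer seconds; `confirm_booking`, `HoldExpired`, and the sample data are hypothetical.

```python
import sqlite3

class HoldExpired(Exception):
    """Raised when a hold is no longer active at confirmation time."""

def confirm_booking(conn, guest_id, slot_ids, now):
    """Convert a hold to BOOKED only if every slot is still validly held."""
    placeholders = ",".join("?" for _ in slot_ids)
    sql = (
        "UPDATE availability_slots SET status = 'BOOKED' "
        f"WHERE slot_id IN ({placeholders}) "
        "AND status = 'HELD' AND held_by = ? AND held_until > ?"
    )
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(sql, (*slot_ids, guest_id, now))
            if cur.rowcount != len(slot_ids):
                raise HoldExpired("hold expired or was reclaimed")
        return True
    except HoldExpired:
        return False  # the caller refunds the payment

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE availability_slots ("
    "slot_id TEXT PRIMARY KEY, status TEXT, held_by TEXT, held_until INTEGER)"
)
# Both guests placed holds at t=0 that expire at t=900 (15 minutes).
conn.executemany(
    "INSERT INTO availability_slots VALUES (?, 'HELD', ?, ?)",
    [("n1", "guest-a", 900), ("n2", "guest-a", 900), ("n3", "guest-b", 900)],
)
conn.commit()

on_time = confirm_booking(conn, "guest-a", ["n1", "n2"], now=600)  # 10 minutes in
too_late = confirm_booking(conn, "guest-b", ["n3"], now=960)       # 16 minutes in
```

Because the validity check and the state change happen in one statement inside one transaction, a late payment cannot overwrite a slot that has been reclaimed or rebooked.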
🧭 Choosing the Right Consistency Model for Your Booking System
Use Postgres with SKIP LOCKED when:
- Inventory has natural row-level granularity (one row per night per room).
- Concurrent booking attempts are moderate (under 1,000 concurrent holds per cluster).
- Strong consistency is non-negotiable because the product being sold has real-world, non-refundable value.
Use Redis distributed locking (Redlock algorithm) when:
- Hold operations span multiple services or databases that cannot participate in a single Postgres transaction.
- Sub-millisecond lock acquisition is required and the Postgres round-trip overhead is prohibitive.
- Inventory granularity is coarser — whole-room availability rather than per-night slots.
When to introduce Elasticsearch for search:
- Total listing count exceeds 500,000, the point where Postgres full-text and geo queries begin to exceed the 200 ms target.
- Search requires compound filtering: amenities, ratings, geo-polygon boundaries, pet policies.
- Read-to-write ratio for search queries exceeds 50:1 — Elasticsearch's read-optimized index layout provides far superior throughput.
🧪 Delivering This Design in a System Design Interview
Act 1 — The Double-Booking Race Condition (2 minutes): Describe the New Year's Eve scenario from the introduction. Draw two concurrent requests both reading "Available" from the database and both successfully writing a booking record. Show the resulting state: two confirmed guests, one room, two deposit receipts. Grounding the conversation in a concrete failure scenario immediately demonstrates systems thinking.
Act 2 — The CQRS-Inspired Architecture (5 minutes): Divide the whiteboard into a Search Path on the left and a Booking Path on the right. Show that search goes to Elasticsearch and booking goes to Postgres with SKIP LOCKED. Draw the CDC pipeline between them — this is the key architectural insight that allows the two paths to stay synchronized without coupling them. Explain that the 1–2 second lag in search is an intentional and acceptable trade-off.
Act 3 — Scaling and Edge Cases (3 minutes):
| Interviewer Question | Strong Answer |
|---|---|
| How do you scale to 10 million listings? | Date-partitioned availability table in Postgres; Elasticsearch handles geo-search at full scale |
| How do you prevent hold abuse by bots? | Require valid payment method on file before granting a hold; rate-limit holds per user session |
| How does a host cancellation flow work? | Saga pattern: BOOKED → CANCELLED_BY_HOST triggers slot reversion, Kafka refund event, guest notification |
🛠️ Open Source Components for Booking Platform Infrastructure
Debezium is a widely used open-source CDC connector that streams Postgres write-ahead log changes into Kafka. It captures every INSERT, UPDATE, and DELETE from the availability_slots table and publishes them as structured events. The Elasticsearch sync consumer subscribes to these events and updates the search index in near-real-time.
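To make the sync consumer concrete, here is a minimal sketch of applying change events to a search index. It assumes a simplified event carrying only the `op`, `before`, and `after` fields of a Debezium change event payload (where `op` is "c" for create, "u" for update, "d" for delete, and "r" for a snapshot read); the in-memory dict stands in for Elasticsearch, and `apply_change_event` is a hypothetical name.

```python
def apply_change_event(index, event):
    """Apply one simplified change event to an in-memory search index."""
    op = event["op"]
    if op in ("c", "u", "r"):
        row = event["after"]
        # Upsert keyed by slot_id, so replaying the same event is harmless.
        index[row["slot_id"]] = {
            "room_id": row["room_id"],
            "date": row["date"],
            "status": row["status"],
        }
    elif op == "d":
        index.pop(event["before"]["slot_id"], None)
    return index

index = {}
created = {"op": "c", "before": None,
           "after": {"slot_id": "s1", "room_id": "r1", "date": "2025-01-01", "status": "AVAILABLE"}}
held = {"op": "u", "before": created["after"],
        "after": {"slot_id": "s1", "room_id": "r1", "date": "2025-01-01", "status": "HELD"}}

for event in (created, held, held):  # the duplicate simulates a redelivered event
    apply_change_event(index, event)
```

Keying the upsert on `slot_id` keeps the consumer idempotent, which matters because a Kafka consumer may see the same event more than once.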
Apache Kafka provides the durable event backbone for the entire async pipeline. The Notification Service and the Elasticsearch sync consumer both consume from the same Kafka topic with independent consumer group offsets, allowing each to process events at its own pace without impacting the other.
PostGIS (Postgres geographic extension) handles the geo-coordinate storage for the listing location. While Elasticsearch handles geo-search at scale, the canonical listing location is stored in Postgres with a PostGIS GEOGRAPHY column and a spatial index for administrative queries.
📚 Lessons Learned From Building and Operating Booking Systems
Lesson 1 — The hold is your correctness anchor. Every architectural decision should be evaluated against one question: "Does this preserve the integrity of the hold?" Adding a caching layer between the Booking Service and Postgres is dangerous if the cached availability can be stale by more than a few milliseconds during the booking transaction.
Lesson 2 — Generate availability rows lazily, not eagerly. Pre-generating 365 rows per room at listing creation time (3.65B rows for 10M listings) is an expensive bulk operation. Generate rows on demand when a search or booking request arrives for a date not yet in the table, and use a background job to pre-warm popular date windows.
Lesson 3 — Monitor the hold abandonment rate. A high hold abandonment rate (guests placing holds and not completing payment) is both a business metric and a system health signal. A sudden spike may indicate that the payment page is slow, that the payment gateway is timing out, or that the hold window is too short for the typical checkout flow.
Lesson 4 — The cancellation refund path is as complex as the booking path. Cancellations must atomically revert availability slots to AVAILABLE, issue a refund via the payment gateway, and notify the host and downstream analytics. Use the Saga pattern for the cancellation flow to ensure each step is idempotent and compensatable if a downstream service is unavailable.
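The Saga idea in Lesson 4 can be sketched as a list of (action, compensation) pairs: steps run in order, and if one fails, the compensations of the completed steps run in reverse. The runner and the step functions below are hypothetical placeholders, and the step order is only one possible arrangement. A production flow might also retry a failed refund before compensating.

```python
def run_saga(steps):
    """Run (action, compensation) pairs; undo completed steps on failure."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True

log = []

def release_slots():
    log.append("slots -> AVAILABLE")

def restore_booking():
    log.append("slots -> BOOKED (compensated)")

def issue_refund():
    raise RuntimeError("payment gateway unavailable")

def void_refund():
    log.append("refund voided")

def notify_host():
    log.append("host notified")

ok = run_saga([
    (release_slots, restore_booking),
    (issue_refund, void_refund),
    (notify_host, lambda: None),
])
```

Here the refund step fails, so the slot release is compensated and the host is never notified, leaving the booking in its original state for a later retry.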
📌 TLDR & Key Takeaways for Hotel Booking System Design
- Core problem: The Double-Booking Race Condition — two concurrent requests both reading "Available" and both writing a booking for the same room and dates.
- Solution: SELECT FOR UPDATE SKIP LOCKED in a single Postgres transaction atomically transitions N availability slots from AVAILABLE to HELD in an all-or-nothing operation.
- Architecture: CQRS-inspired separation, with Elasticsearch for search (100k req/sec), Postgres for booking (500 req/sec), and Debezium CDC + Kafka for synchronization.
- Hold model: 15-minute window allows payment processing before the slot is reclaimed by the cleanup job.
- Key trade-off: Eventual consistency in search (1–2 second lag) is acceptable; strong consistency in the booking transaction is non-negotiable.
- At scale: 3.65B availability rows require date-partitioned tables and a composite B-tree index on (room_id, date, status).
Written by
Abstract Algorithms
@abstractalgorithms