System Design HLD Example: Ride-Sharing (Uber/Lyft)
A practical interview-ready HLD for a ride-sharing platform with real-time matching and location tracking.
TLDR: A ride-sharing platform is a high-velocity geospatial matching engine. Drivers stream GPS coordinates every 5 seconds into a Redis Geospatial Index. When a rider requests a trip, the Matching Service executes a `GEORADIUS` query to find the 10 closest available drivers. A distributed Trip State Machine ensures transactional consistency (no double-matching), while WebSockets provide the real-time "car on map" experience that users expect.
💥 The New Year's Eve Meltdown
Imagine it's New Year's Eve in New York City. At 12:05 AM, one million people open their ride-sharing apps simultaneously. In the background, 50,000 drivers are moving through the city, sending GPS updates every few seconds.
Your matching algorithm is working perfectly, but your database is screaming. Every ride request triggers a massive SQL query: `SELECT * FROM drivers WHERE ST_Distance(location, rider_location) < 2000`. With 100,000 concurrent requests, the database's CPU hits 100%, lock contention skyrockets, and the "Nearest Driver" query that usually takes 10ms now takes 30 seconds. By the time a driver is "matched," they've already moved three blocks away or accepted a ride from a competitor.
This is the Geospatial Hotspot problem. In ride-sharing, the challenge isn't just matching: it's maintaining a high-velocity, in-memory index of moving objects that can be queried hundreds of times per second without falling behind. At the scale of Uber or Lyft, you aren't just building a database; you are building a real-time reflection of the physical world in silicon.
🎯 Ride-Sharing: Use Cases & Requirements
Actors & Journeys
- Rider: Requests a ride, views ETAs, tracks the driver in real-time, and pays for the trip.
- Driver: Updates their availability (Online/Offline), broadcasts their location, and accepts/rejects trip requests.
- System Admin: Monitors surge pricing, platform health, and fraud detection.
In/Out Scope
- In-Scope: Real-time driver tracking, geospatial matching, trip lifecycle management, and live position streaming.
- Out-of-Scope: Advanced driver background checks (manual workflows), automated vehicle maintenance scheduling, and complex airport queueing logic.
Functional Requirements
- Real-time Location Stream: Drivers must broadcast GPS coordinates every 5 seconds.
- Geospatial Matching: Efficiently find available drivers within a specific radius (e.g., 2km) of a rider.
- Trip Management: Manage the full lifecycle of a ride (Request $\rightarrow$ Match $\rightarrow$ Pickup $\rightarrow$ Dropoff).
- Live Tracking: Stream the driver's current position to the rider's map with low latency.
Non-Functional Requirements (NFRs)
- Ultra-Low Matching Latency: A rider should see a "Driver Found" confirmation in $< 2$ seconds.
- High Write Availability: The system must handle 100k+ GPS updates per second without dropping pings.
- Strict Consistency for Matching: One driver must never be matched to two riders simultaneously (Double Booking prevention).
- Scalability: The architecture must handle city-wide surges (e.g., New Year's Eve) without cascading failures.
📚 Foundations: How Geospatial Matching Works
At its core, a ride-sharing app is a high-frequency matching market.
The baseline architecture relies on three concepts:
- Geospatial Indexing: Traditional B-Tree indexes cannot efficiently handle 2D range queries. We use Geohashing or Quadtrees to convert 2D coordinates into a 1D searchable space.
- The Ping-Pong Protocol: Drivers "ping" the server with their location; the server "pongs" back with current demand or nearby rider requests.
- The Buffer Zone: We don't just match the "closest" driver; we match the "best" driver, which might consider ETA, driver rating, or heading.
Without these foundations, the system would require constant full-table scans of every driver in the world for every rider request, a task that is computationally impossible at scale.
⚙️ The Mechanics of Geohashing
The mechanism of spatial search is powered by Geohashing.
- Geohash Encoding: A Geohash is a hierarchical spatial data structure that subdivides the world into a grid of cells. Each cell is assigned a unique alphanumeric string.
- The Prefix Property: Two points that are close together in the physical world will usually have Geohashes with a common prefix. For example, `9q8yy` and `9q8yz` are both in San Francisco.
- Radius Search: By finding the rider's Geohash and its 8 neighboring cells, we can perform a simple range query in a Sorted Set (like Redis) to find all drivers in the vicinity. This turns a $O(N^2)$ geometry problem into a $O(\log N)$ string comparison problem.
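To make the prefix property concrete, here is a minimal, self-contained geohash encoder following the standard geohash scheme (bit interleaving plus a base32 alphabet). It is an illustrative sketch, not Redis's internal implementation:

```java
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encode a lat/lng pair into a geohash of the given length.
    // Bits alternate between longitude (even positions) and latitude (odd),
    // each bit halving the current bounding box.
    public static String encode(double lat, double lng, int length) {
        double latMin = -90, latMax = 90, lngMin = -180, lngMax = 180;
        StringBuilder hash = new StringBuilder();
        int bit = 0, ch = 0;
        boolean evenBit = true; // even bits refine longitude
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (lngMin + lngMax) / 2;
                if (lng >= mid) { ch = (ch << 1) | 1; lngMin = mid; }
                else            { ch = ch << 1;       lngMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch = ch << 1;       latMax = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) { // 5 bits per base32 character
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Downtown San Francisco lands in the prefix used in the example above
        System.out.println(encode(37.7749, -122.4194, 5)); // 9q8yy
    }
}
```

Note how adding characters only refines the cell: a 7-character hash of the same point starts with the 5-character hash, which is exactly what makes prefix range scans work.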
📊 Estimations & Design Goals
The Math of Real-Time Geospatial
- Total Drivers: 1,000,000 globally.
- Active Drivers (Peak): 500,000.
- Location Update Frequency: Every 5 seconds.
- Location Write QPS: $500,000 / 5 = \mathbf{100,000 \text{ writes/sec}}$.
- Ride Request QPS (Peak): $\approx \mathbf{500 \text{ requests/sec}}$.
- In-Memory Index Size: 500k drivers $\times$ 100 bytes (ID + Lat + Lng + Metadata) $\approx$ 50 MB. This is tiny enough for a single Redis node but needs sharding for QPS.
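The arithmetic above can be sanity-checked in a few lines (the constants are this section's estimates; with integer division the index lands at roughly 47 MB, i.e., the "≈ 50 MB" quoted above):

```java
public class CapacityMath {
    public static void main(String[] args) {
        long activeDrivers = 500_000;   // peak concurrent drivers
        long pingIntervalSec = 5;       // GPS update frequency

        long writeQps = activeDrivers / pingIntervalSec;
        System.out.println("Location write QPS: " + writeQps); // 100000

        long bytesPerDriver = 100;      // ID + lat + lng + metadata
        long indexBytes = activeDrivers * bytesPerDriver;
        System.out.println("Index size MB: " + indexBytes / (1024 * 1024)); // 47
    }
}
```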
Design Goals
- In-Memory Indexing: Use Redis Sorted Sets for $O(\log N)$ spatial search.
- Decoupled Write Path: GPS updates are buffered through a high-throughput Location Service.
- Double-Booking Prevention: Use distributed locking for atomic matching.
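The "first writer wins" semantics behind double-booking prevention can be modeled in-process with `ConcurrentHashMap.putIfAbsent`, which is atomic in the same sense as Redis `SET ... NX`. This is an in-memory analog for illustration only, not a real distributed lock:

```java
import java.util.concurrent.ConcurrentHashMap;

public class MatchLockDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, String> locks = new ConcurrentHashMap<>();

        // First matcher claims driver A: putIfAbsent returns null on success,
        // mirroring Redis "SET driver:lock:A trip_123 NX" returning OK.
        String first = locks.putIfAbsent("driver:lock:A", "trip_123");

        // Second matcher loses the race: the existing holder is returned,
        // mirroring Redis SET ... NX returning nil.
        String second = locks.putIfAbsent("driver:lock:A", "trip_456");

        System.out.println(first == null);             // true  (lock acquired)
        System.out.println("trip_123".equals(second)); // true  (rejected; held by trip_123)
    }
}
```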
🔄 High-Level Design: The Real-Time Matching Flow
The architecture separates the high-frequency location stream from the transactional trip management logic.
```mermaid
graph TD
    Driver -->|GPS 5s| LS[Location Service]
    LS -->|GEOADD| Redis[(Redis Geo Index)]
    Rider -->|Request Ride| MS[Matching Service]
    MS -->|GEORADIUS| Redis
    MS -->|Lock Driver| TS[Trip Service]
    TS --> DB[(Trip DB: Postgres)]
    TS -->|Match Event| NS[Notification Service]
    NS -->|Push| Driver
    NS -->|Push| Rider
    LS -->|Stream| WS[WebSocket Gateway]
    WS -->|Live Update| Rider
```
Explanation of the Architecture: The architecture utilizes a Write-Heavy Location ingestion path. The Location Service receives 5-second GPS pings from drivers and updates an in-memory Redis Geospatial Index. When a rider requests a trip, the Matching Service performs a radius search on Redis. To ensure a driver is only matched once, the Trip Service handles the transactional "handshake" using Postgres. Real-time visual updates are provided by a WebSocket Gateway that streams driver positions directly to the rider's map, bypassing the database entirely for coordinates.
🔌 API Design: The Service Contract
The communication between the mobile apps and the backend requires efficient, lightweight endpoints.
| Endpoint | Method | Payload | Description |
| --- | --- | --- | --- |
| `/v1/locations` | POST | `{"lat": 37.7, "lng": -122.4}` | Driver sends GPS ping. Highly optimized for speed. |
| `/v1/trips` | POST | `{"rider_id": "r1", "pickup": "..."}` | Rider requests a ride. Triggers matching. |
| `/v1/trips/{id}/accept` | POST | `{"driver_id": "d1"}` | Driver accepts the matched request. |
| `/v1/trips/{id}/status` | GET | N/A | Polling fallback for trip state updates. |
🗄️ Data Model: Schema Definitions
Trip Store (PostgreSQL)
We use a relational database for trips because we need strong ACID guarantees for billing and history.
| Table | Column | Type | Notes |
| --- | --- | --- | --- |
| trips | id | UUID (PK) | Unique trip identifier. |
| trips | rider_id | UUID | Indexed for rider history. |
| trips | driver_id | UUID | Indexed for driver payouts. |
| trips | status | ENUM | REQUESTED, MATCHED, IN_PROGRESS, COMPLETED. |
| trips | fare | DECIMAL | Calculated final price. |
Active Driver Index (Redis)
This is not a traditional table but a Geospatial Sorted Set.
- Key: `drivers:active:{city_id}`
- Member: `driver_id`
- Score: 52-bit Geohash (managed by Redis `GEOADD`).
🧰 Tech Stack & Design Choices
| Component | Choice | Rationale |
| --- | --- | --- |
| Location Store | Redis (Geo Commands) | Native support for radius queries; extreme write throughput. |
| Transactional DB | PostgreSQL | Reliability and complex relationship support for payments. |
| Real-time Comms | WebSockets (Netty) | Bi-directional low-latency map updates. |
| Messaging | Apache Kafka | Decouples location updates from analytics and logging. |
| Distributed Lock | Redlock (Redis) | Essential for atomic matching in a distributed environment. |
🧠 Design Deep Dive
🛡️ Internals: The Trip State Machine
Consistency is maintained by a strict State Machine. A trip cannot skip states; for example, a trip cannot go from REQUESTED to IN_PROGRESS without being MATCHED.
- The Request: Rider creates a trip record.
- The Match: Matching Service finds Driver A. It attempts to acquire a Distributed Lock in Redis: `SET driver:lock:A trip_123 NX EX 15`.
- The Acceptance: If the lock is acquired, Driver A gets 15 seconds to accept. If they accept, the state moves to `MATCHED` and the driver is removed from the "Available" Redis set.
- The Pickup: Driver clicks "Start Trip". State moves to `IN_PROGRESS`.
This ensures that even if two Matching Service nodes try to match Driver A at the exact same millisecond, only one will succeed.
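The legal transitions can be encoded directly as data, so any illegal jump fails at a single choke point. A minimal sketch using only the four states named above (production systems would add branches like CANCELLED):

```java
import java.util.Map;
import java.util.Set;

public enum TripState {
    REQUESTED, MATCHED, IN_PROGRESS, COMPLETED;

    // Each state lists the only states it may legally move to next.
    private static final Map<TripState, Set<TripState>> ALLOWED = Map.of(
        REQUESTED,   Set.of(MATCHED),
        MATCHED,     Set.of(IN_PROGRESS),
        IN_PROGRESS, Set.of(COMPLETED),
        COMPLETED,   Set.of());

    public boolean canTransitionTo(TripState next) {
        return ALLOWED.get(this).contains(next);
    }

    public static void main(String[] args) {
        System.out.println(REQUESTED.canTransitionTo(MATCHED));     // true
        System.out.println(REQUESTED.canTransitionTo(IN_PROGRESS)); // false: must be MATCHED first
    }
}
```

The Trip Service checks `canTransitionTo` inside the same database transaction that writes the new status, so a stale or duplicate event simply fails the check.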
📈 Performance Analysis: Geohashing and Radius Search
Redis performs spatial search by mapping Geohash cells to a 1D Sorted Set.
- Complexity: `GEORADIUS` is $O(N + \log M)$, where $N$ is the number of elements inside the bounding box and $M$ is the number of items in the index. For a city with 20,000 drivers, the query takes $< 5$ ms.
- Bottleneck: The real bottleneck is the network stack. 100k GPS pings/sec requires a massive fleet of Location Service nodes and optimized TCP/UDP handling.
- SLO: Driver discovery time $< 1s$. GPS-to-Map latency $< 2s$.
🌍 Real-World Applications
Ride-sharing architecture is a template for the "On-Demand Economy":
- Food Delivery: DoorDash and UberEats use similar matching logic for couriers.
- Logistics: Trucking companies use real-time tracking for fleet management.
- Micro-mobility: Bird and Lime use geospatial indexing to show available scooters to users.
- Local Services: TaskRabbit or Handy matching service providers to customers.
⚖️ Trade-offs & Failure Modes
- Availability vs. Consistency: We trade off a tiny bit of availability (locking a driver for 15 seconds) to ensure strict consistency (no double booking).
- Ping Frequency vs. Battery: We could ping every 1 second for perfect accuracy, but that would kill driver phone batteries. We trade off 5-second intervals for battery health.
- Failure Mode: Hot Keys. A massive event in a small area (e.g., Coachella) can overwhelm a single Redis shard. We use Geohash Sharding to distribute load.
- Failure Mode: Dead Reckoning. If a driver enters a tunnel, their GPS stops. We use Kalman Filtering on the client-side to "predict" the driver's movement until the next signal.
🏗️ Advanced Concepts for Production Evolution
- Geohash Sharding: We shard Redis by Geohash prefixes (e.g., all `9q8...` keys go to Shard A). If a rider is on a border, the Matching Service queries neighbors in parallel.
- Surge Pricing Engine: Monitors the "Supply/Demand" ratio. If 1,000 riders are in a cell with 10 drivers, the multiplier increases.
- Map Matching: Raw GPS is jumpy. We use a service to snap coordinates to valid road segments.
- WebSocket Load Balancing: Maintaining 1M+ concurrent WebSockets requires specialized gateways that can handle connection draining during deployments.
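Geohash sharding from the list above can be sketched as a prefix-based router. The prefix length and shard count here are hypothetical; real deployments tune the prefix to cell size and traffic, and often use consistent hashing instead of a fixed modulus:

```java
public class GeoShardRouter {
    private static final int PREFIX_LEN = 3; // e.g., "9q8" covers roughly a metro area
    private static final int NUM_SHARDS = 16;

    // All keys sharing a geohash prefix land on the same shard, so a radius
    // search inside that cell never fans out across the cluster. Border
    // queries hit at most the shards of the 8 neighboring cells.
    public static int shardFor(String geohash) {
        String prefix = geohash.substring(0, Math.min(PREFIX_LEN, geohash.length()));
        return Math.floorMod(prefix.hashCode(), NUM_SHARDS);
    }

    public static void main(String[] args) {
        // Neighboring San Francisco cells share the "9q8" prefix and route together
        System.out.println(shardFor("9q8yy") == shardFor("9q8yz")); // true
    }
}
```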
🧭 Decision Guide
| Situation | Recommendation |
| --- | --- |
| Low QPS (<100/sec) | Use Postgres with PostGIS. It's simpler and more robust. |
| High QPS (>1k/sec) | Use Redis Geospatial or Quadtrees in memory. |
| Global Scale | Shard your in-memory index by City or Geohash prefix. |
| Accuracy Critical | Use Map Matching and HMM algorithms to smooth GPS noise. |
🧪 Practical Example: Interview Delivery
If asked to design Uber in 20 minutes, focus on these three beats:
- The Write-Volatile Location Store: Explain why GPS updates go to Redis, not Postgres. Use the "100k writes/sec" math.
- The Matching Race Condition: Explain the "Double Booking" problem and the Redis `SET NX` lock solution.
- Geospatial Efficiency: Explain how Geohashing turns a 2D search into a 1D range query.
Standard Interview Closer: "I chose a Redis-backed geospatial architecture to handle the extreme velocity of location updates while ensuring sub-second matching latency. By decoupling the real-time tracking via WebSockets and using a transactional Postgres store for the trip lifecycle, we achieve the perfect balance between real-time performance and financial reliability."
🛠️ Redis Geospatial: How It Works in Practice
Redis is the de-facto standard for this problem.
Update Driver Location (Java Snippet)
```java
// Using Jedis to update a driver's location in a city-specific geo set
public void updateDriverLocation(String cityId, String driverId, double lat, double lng) {
    // GEOADD key longitude latitude member
    jedis.geoadd("drivers:active:" + cityId, lng, lat, driverId);
    // Redis sorted sets have no per-member TTL (an EXPIRE on the set key
    // would drop the whole city), so track liveness with a per-driver
    // heartbeat key. Matching treats a driver with no ping for 60s as offline.
    jedis.setex("driver:heartbeat:" + driverId, 60, "1");
}
```
Find Nearest Drivers (CLI Example)
```shell
# Find 10 drivers within 2km of the rider's lat/lng in San Francisco
GEORADIUS drivers:active:sf -122.41 37.77 2 km WITHDIST ASC COUNT 10
```
Transactional Matching (Redlock)
```java
// Redisson provides a Redlock-style distributed lock
RLock lock = redisson.getLock("driver_lock:" + driverId);
// Wait up to 10s to acquire; auto-release after 15s (the matching window)
if (lock.tryLock(10, 15, TimeUnit.SECONDS)) {
    try {
        // Create trip in Postgres and notify driver via push
    } finally {
        lock.unlock();
    }
}
```
🎓 Lessons Learned
- Architecture is about Frequency: High-frequency data (GPS) belongs in RAM; low-frequency data (Billing) belongs on Disk.
- Maps are Noisy: Never show raw GPS to a user; always smooth it or snap it to the road.
- Fail Fast: If a driver doesn't accept a match in 15 seconds, release the lock and find the next candidate immediately to keep the rider happy.
📌 Summary & Key Takeaways
- Redis Geospatial: The industry standard for high-velocity spatial indexing.
- Geohashing: Converts a 2D coordinate into a 1D search problem.
- Trip State Machine: The "Source of Truth" that prevents illegal trip transitions.
- WebSocket Gateway: The bridge that enables the "Real-time" feel of the application.
📝 Practice Quiz
Why is Geohashing better than a simple radius search in a relational database?
- A) It is more accurate.
- B) It allows $O(\log N)$ searches by converting 2D coordinates into 1D strings/integers.
- C) It is easier to code. Correct Answer: B
What is the purpose of the Redis `SET NX` lock in the matching process?
- A) To encrypt the driver's data.
- B) To ensure a driver isn't matched to two riders simultaneously.
- C) To speed up the database. Correct Answer: B
What is "Map Matching"?
- A) Finding the best route for a driver.
- B) Snapping jumpy GPS coordinates to the nearest road on a map.
- C) Comparing two maps for differences. Correct Answer: B
[Open-ended] How would you handle a driver who loses their internet connection mid-trip? Discuss the impact on the State Machine and the User Experience.
Written by Abstract Algorithms (@abstractalgorithms)