All Posts

System Design HLD Example: Proximity Service (Yelp/Google Places)

A practical interview-ready HLD for finding nearby restaurants, hotels, and points of interest.

Abstract AlgorithmsAbstract Algorithms
Β·Β·14 min read
Share
AI Share on X / Twitter
AI Share on LinkedIn
Copy link

TLDR: A proximity service (Yelp/Google Places) solves the 2D search problem by encoding locations into Geohash strings, which are indexed in a standard B-tree. To guarantee results near grid boundaries, the system queries the center cell plus its 8 neighbors and then filters the results using the Haversine formula in the application layer. This architecture provides sub-100ms latency for 200M+ businesses while maintaining the simplicity of a relational database.

πŸ“ The Manhattan Hot-Cell Problem

Imagine you’re in downtown Manhattan. Within a 500-meter radius of your location, there are over 2,000 restaurants, shops, and landmarks. You search for "Coffee," and you expect the results in milliseconds.

If your system uses a standard SQL query like SELECT * FROM businesses WHERE lat BETWEEN x1 AND x2 AND lon BETWEEN y1 AND y2, you're in trouble. Even with indexes on lat and lon separately, the database must perform an "index merge" or scan thousands of rows in memory to find the intersection. At 200 million total businesses, this bounding-box query is slow, and adding a ORDER BY distance filter makes it a CPU-killer.

The core problem is that two-dimensional distance is not B-tree friendly. A B-tree is a one-dimensional sorted structure. To find "nearby" things efficiently, we need a way to map 2D space onto a 1D line while keeping physically close points close to each other on that line. Without a specialized spatial index, your proximity service will either be too slow for users or too expensive for your infrastructure.

πŸ“– Proximity Service: Use Cases & Requirements

Actors & Journeys

  • User: Searches for businesses by category (e.g., "Sushi") or name within a specific radius. They expect a list sorted by distance and rating.
  • Business Owner: Registers their business location, updates operating hours, and uploads photos.
  • Reviewer: Provides feedback and ratings, which are aggregated to influence search ranking.

Functional Requirements

  • Nearby Search: Given a user's (lat, lon) and a radius, return businesses within that circle.
  • Business Management: CRUD operations for business profiles and locations.
  • Ranking: Sort results by a combination of proximity, relevance, and average rating.
  • Filtering: Support for category, price range, and status (e.g., "Open Now").

Non-Functional Requirements (NFRs)

  • Low Latency: Search results must return in $< 100$ ms globally.
  • High Read Availability: Users check for nearby places constantly; downtime is not an option.
  • Scalability: Support 200 million businesses and handle 50,000 search QPS at peak.
  • Eventual Consistency: It is acceptable if a newly added business takes 1-2 minutes to appear in search results.

πŸ” Basics: How Geolocation Maps to Data

At its core, a proximity service must solve the 2D Range Search problem. Traditional databases are optimized for 1D searches (e.g., "find all users with age between 20 and 30"). A B-tree index can handle this with $O(\log N)$ efficiency.

However, geography is two-dimensional (Latitude and Longitude). A simple index on latitude doesn't help with longitude, and vice-versa. If you search for businesses near $(34.05, -118.24)$, a latitude index will return every business in a horizontal strip across the entire globe at that latitude.

The baseline architecture for solving this involves Spatial Indexing. This process transforms 2D coordinates into a 1D value that "preserves locality"β€”meaning points that are close together in the real world stay close together in the database index. Without this mapping, every search would require a full table scan, making the system unusable at scale.

βš™οΈ Mechanics: Turning Coordinates into Searchable Strings

The primary mechanism for spatial indexing is Grid Partitioning. We divide the world into smaller and smaller boxes.

  1. The Grid Approach: Divide the map into $N \times N$ squares. Each square gets a unique ID. To find nearby places, you simply find which square the user is in and query all businesses with that Square ID.
  2. Geohashing: This is a specific type of grid partitioning where the world is recursively divided into 32 quadrants. Each quadrant is assigned a character (0-9, b-z).
    • A 1-character Geohash covers the whole world.
    • A 6-character Geohash covers a small neighborhood ($\approx 1.2 \text{km} \times 0.6 \text{km}$).
    • Crucially: Geohashes share prefixes. All businesses in the same neighborhood will have Geohashes that start with the same 5 or 6 characters. This allows the database to use a simple LIKE '9q8yy%' prefix search on a standard string index.

πŸ“ Estimations & Design Goals

The Math of Global Proximity

  • Total Listings: 200 Million businesses.
  • Average Record Size: $\approx 1$ KB (ID, Name, Lat/Lon, Geohash, Category, Rating, Hours).
  • Metadata Storage: 200M $\times$ 1 KB = 200 GB. This fits easily on a single modern server, but for availability, we'll shard it across a cluster.
  • Search QPS: 50,000 peak.
  • Read-to-Write Ratio: 100:1 (Users search much more than owners update listings).

Design Goals

  • Read-Optimized Indexing: Favor indexing strategies that speed up GET /search at the cost of slightly slower POST /business.
  • Stateless Search: Scale the search service horizontally by keeping no local state.

πŸ“Š High-Level Design: The 9-Cell Expansion

The architecture leverages Geohashing to turn 2D spatial search into a 1D range query.

graph TD
    User -->|lat, lon| AG[API Gateway]
    AG -->|Search| SS[Search Service]
    SS -->|Check Cache| RC[(Redis Search Cache)]
    RC -->|Miss| DB[(Postgres: Geohash Index)]

    Owner -->|Update| BS[Business Service]
    BS -->|Write| DB
    BS -->|Invalidate| RC

    SS -->|Filter & Rank| App[App Layer]
    App --> User

Explanation of the Architecture: The Search Service converts the user's coordinates into a Geohash. To solve the "Boundary Problem" (where a business is 10m away but in a different grid cell), it queries the center cell plus all 8 adjacent cells. It checks Redis for cached results of these 9 cells. On a miss, it queries Postgres using a WHERE geohash LIKE 'prefix%' scan on a B-tree index. The results are then filtered using the Haversine formula in the application layer to calculate exact distances before returning the final ranked list to the User.

πŸ”Œ API Design: The Discovery Contract

The proximity service requires a search-centric API that can handle high volumes of concurrent requests.

EndpointMethodPayloadDescription
/v1/searchGET?lat=37.7&lng=-122.4&radius=2000&cat=coffeeSearch for nearby businesses.
/v1/businesses/{id}GETN/AGet detailed profile of a business.
/v1/businessesPOST{"name": "...", "lat": ..., "lng": ...}Register a new business location.

Example Search Response:

{
  "results": [
    {
      "id": "b123",
      "name": "The Daily Grind",
      "distance_meters": 450,
      "rating": 4.8,
      "is_open": true
    }
  ],
  "next_page_token": "geohash_cursor_456"
}

πŸ—„οΈ Data Model: Schema Definitions

Business Store (PostgreSQL)

We use Postgres because business data is highly structured and requires transactional integrity for ownership and billing.

TableColumnTypeNotes
businessesidUUID (PK)Unique identifier.
businessesnameTEXTBusiness name.
businesseslocationPOINTNative Postgres point for exact coordinates.
businessesgeohashVARCHAR(12)Indexed B-tree column for proximity search.
businessescategoryTEXTIndexed for category filtering.

Search Cache (Redis)

  • Key: search:{geohash_prefix}:{category}
  • Value: JSON list of the top 50 businesses in that cell.
  • TTL: 10 minutes.

πŸ”§ Tech Stack & Design Choices

ComponentChoiceRationale
Primary DatabasePostgreSQL + PostGISBest-in-class spatial indexing and relational stability.
Proximity LogicGeohash (Level 6)Maps a $1.2 \text{km} \times 0.6 \text{km}$ area, perfect for city-level search.
Search CacheRedisOffloads 90% of repeat searches from the primary DB.
Search EngineElasticsearch(Optional) Used for fuzzy-text search ("Cofee" $\rightarrow$ "Coffee").
RankingPython/Java ServiceFlexible for injecting business rules (e.g., promoted listings).

🧠 Design Deep Dive

πŸ›‘οΈ Internals: Geohash vs. Quadtree

There are two primary ways to index spatial data, each with distinct trade-offs:

  1. Geohash (Static Grid): Divides the world into fixed-size buckets. A 6-character geohash like 9q8yyp represents a specific rectangle on Earth.
    • Pros: Simple to implement in any SQL DB; just a string prefix match.
    • Cons: Cell size is fixed. A cell in the Sahara Desert is the same size as one in Times Square, leading to "Hot Cells" in dense cities.
  2. Quadtree (Density-Aware): A tree structure where each node has exactly four children. If a node (geographic area) contains too many businesses, it splits into four smaller quadrants.
    • Pros: Adaptive. It uses small cells for Manhattan and huge cells for the ocean.
    • Cons: Harder to persist in a relational DB; usually lives entirely in RAM.

The Hybrid Choice: For a Yelp-style service with static businesses, Geohash is usually better because business locations don't move. For an Uber-style service with moving drivers, Quadtrees are superior.

πŸ“Š Performance Analysis: The Haversine Filter

Since Geohash cells are rectangular and a search radius is circular, the "9-cell query" will return a "candidate set" that includes businesses outside the true radius (especially near the corners).

  • Algorithm:
    1. Identify the 6-character Geohash of the user.
    2. Query DB for all businesses matching that geohash and its 8 neighbors.
    3. In the application layer, for each candidate, calculate: $dist = \text{haversine}(\text{user_pos}, \text{biz_pos})$.
    4. Discard if $dist > \text{radius}$.
  • Efficiency: Calculating the Haversine distance for 500 candidates in Java/Go takes $< 1$ ms. Doing this inside a SQL query without an index would take seconds.

🌍 Real-World Applications: From Yelp to Uber

Proximity services power some of the most critical apps on our phones today:

  • Local Discovery: Yelp and Google Maps find businesses near you.
  • On-Demand Transport: Uber and Lyft find drivers closest to your pickup location.
  • Hyper-Local Logistics: DoorDash and Instacart find restaurants or stores near your home to optimize delivery speed.
  • Social Networking: Tinder and Bumble use proximity to suggest potential matches nearby.

βš–οΈ Trade-offs & Failure Modes: Balancing Accuracy and Speed

  1. Precision vs. Latency: Using a longer Geohash (e.g., 8 characters) gives you smaller grid cells and more precise candidate sets, but it increases the number of searches needed to cover a large radius.
  2. Boundary Misses: The "center-only" query is the most common failure mode. A business could be 1 meter away from a user but across a Geohash grid line. Without the 9-cell query, this business disappears from the user's view.
  3. Hot Cells: In dense cities like New York, a single Geohash cell might contain 10,000 businesses. This can create "Hot Keys" in your Redis cache. We mitigate this by further partitioning hot cells using a secondary index like category or price range.
  4. Data Staleness: If a business changes its name or rating, it might take several minutes for the cache to invalidate. This is an acceptable trade-off for the 10x performance boost of caching.

πŸ—οΈ Advanced Concepts for Production Evolution

  1. Precision Scaling: If a search in a 6-character geohash (neighborhood) returns zero results, the system should "Zoom Out" by querying the 5-character prefix (city level). This ensures the user never sees a "No Results Found" screen in rural areas.
  2. S2 Geometry: Developed by Google, S2 maps the Earth onto a Hilbert Curve. Unlike Geohash, S2 cells have almost no distortion near the poles and support complex polygon searches (e.g., "Find coffee shops within this specific park's boundaries").
  3. Multi-Region Replication: Business data is replicated globally, but the "Proximity Cache" is regional. A user in Tokyo shouldn't wait for a cache in US-East to tell them where the nearest Ramen shop is.
  4. Ranking by Personalized Relevance: Integrate a machine learning model that boosts businesses the user has visited before or those currently trending in their social circle.

🧭 Decision Guide: Choosing Your Spatial Strategy

SituationRecommendation
Static Points (Yelp)Use Geohash with a relational DB (B-tree). It's simple, stable, and easy to index.
Moving Points (Uber)Use an in-memory Quadtree or Redis Geo. Redis uses Geohashing internally but is optimized for high-write location updates.
Complex Shapes (Park Maps)Use PostGIS with GIST indexes or Google S2. These handle non-circular boundaries better.
Ultra-High DensityUse Elasticsearch for its ability to handle complex filters and spatial queries simultaneously.

πŸ§ͺ Practical Example: Interview Delivery

If asked to design Yelp in 20 minutes, focus on these three beats:

  1. The Spatial Indexing Dilemma: Start by explaining why indexing Latitude and Longitude as separate columns fails at scale.
  2. The 9-Cell Solution: Explain the "Boundary Problem" and how querying adjacent cells ensures you don't miss a business just because it's across an arbitrary grid line.
  3. The Haversine Filter: Explain why the "Expensive Math" happens in the app layer, not the database. It shows you understand performance bottlenecks.

Standard Interview Closer: "I chose a Geohash-based indexing strategy to leverage the reliability of B-tree prefix scans in PostgreSQL. By implementing a mid-tier Redis cache keyed by geohash-cells and performing the final distance filtration in the application service, we achieve sub-50ms latency for a dataset of 200 million businesses while maintaining horizontal scalability."

πŸ› οΈ PostgreSQL PostGIS: How It Works in Practice

In a real-world production environment, you would use PostGIS, a specialized spatial extension for PostgreSQL.

Example: Proximity Search with PostGIS

PostGIS uses a GIST (Generalized Search Tree) index, which acts like an R-tree to group objects into bounding boxes.

-- Find businesses within 2,000 meters of a user point
-- ST_DWithin uses the index to prune candidates efficiently
SELECT name, rating, 
       ST_Distance(geom, ST_MakePoint(-122.41, 37.77)::geography) AS dist_meters
FROM businesses
WHERE ST_DWithin(geom, ST_MakePoint(-122.41, 37.77)::geography, 2000)
ORDER BY dist_meters ASC
LIMIT 15;

In a Java-based Search Service, you would use a library like Hibernate Spatial to generate these queries without writing raw SQL, ensuring your application remains clean and maintainable.

πŸ“š Lessons Learned

  • Static vs. Dynamic: If the objects don't move (restaurants), Geohash is king. If they do move (drivers), use an in-memory Quadtree.
  • The "Sahara" Problem: Always have a fallback for low-density areas; fixed grids are inefficient at the edges of civilization.
  • Don't calculate in SQL: Modern databases are powerful, but calculating trig functions (sine/cosine) for 1M rows will kill your QPS. Use the index to get candidates, then do the math in your code.

πŸ“Œ Summary & Key Takeaways

  • Geohash: The standard for mapping 2D coordinates to 1D strings for efficient indexing.
  • 9-Cell Query: The mandatory expansion to solve the grid boundary problem.
  • Haversine Formula: The mathematical foundation for spherical distance calculation.
  • PostGIS: The most robust tool for production-grade spatial applications.

πŸ“ Practice Quiz

  1. Why is a 1D Geohash better for a database than separate Latitude and Longitude columns?

    • A) It uses less storage space.
    • B) It allows the database to use a single B-tree index to find points that are close in 2D space.
    • C) It is more accurate than floating-point numbers. Correct Answer: B
  2. What is the "Boundary Problem" in Geohashing?

    • A) Geohashes don't work in the ocean.
    • B) Two points can be 1 meter apart but have completely different Geohash strings if they fall on different sides of a grid line.
    • C) Geohashes are too long to fit in a database. Correct Answer: B
  3. In the Proximity Service architecture, where does the most computationally expensive part (Haversine distance) occur?

    • A) Inside the Redis cache.
    • B) In the SQL query's WHERE clause.
    • C) In the Application Layer (Search Service) on a small set of candidates. Correct Answer: C
  4. [Open-ended] How would you handle a "Global Search" (e.g., "Find all Starbucks in the world") versus a "Local Search" (e.g., "Find Starbucks near me")? How does the indexing strategy change?

Abstract Algorithms

Written by

Abstract Algorithms

@abstractalgorithms