
System Design HLD Example: News Feed (Home Timeline)

Interview-focused HLD for a scalable social feed with fan-out and ranking trade-offs.

Abstract Algorithms · 18 min read

AI-assisted content.

TLDR: A news feed system builds personalized timelines by combining content publishing, graph relationships, and ranking. The scalability crux is the fan-out-amplified write path: a single celebrity post can trigger 100M timeline writes. A hybrid fan-out strategy (push for normal users, pull for celebrities) is the industry standard for 99.99% availability.

📖 The Katy Perry Problem

Imagine it's 2013. Twitter is growing at a breakneck pace. Most users have a few hundred followers, and the system handles them easily by "pushing" new tweets into their home timelines at write-time. Then, Katy Perry posts.

She has 100 million followers. If the system sticks to its standard "push" model, a single tweet triggers 100 million database writes simultaneously. The message queues back up, the database primary hits 100% CPU, and for the next three hours, nobody on Twitter can see new posts.

This is the Write Amplification trap. In a news feed, the challenge isn't just storing data; it's the massive disparity between a single post and its global consumption. If you design for the average user, you fail at the edges. If you design for the edge cases, you might over-engineer the core.

📖 News Feed: Use Cases, Actors, and Scale Requirements

Actors

  • Publisher / Author: Creates and posts content (text, media).
  • Reader / Follower: Consumes a ranked home feed of followed users' posts.
  • System: Handles fan-out, ranking, and timeline materialization.

Functional Requirements

  • Post Creation: Users can POST content with text/media.
  • Timeline Read: Users see a chronological or ranked feed of people they follow.
  • Follow Graph: Users can follow/unfollow others, immediately affecting their feed.
  • Feed Freshness: New posts should appear in follower feeds within 5 seconds.

Non-Functional Requirements

  • High Read Availability: 99.99% (Users check feeds constantly).
  • Low Latency: Timeline reads should be < 100ms.
  • Scalability: Handle peaks of 100k post creates/sec and 1M reads/sec (well above the average rates estimated below).
  • Eventual Consistency: A 5-second delay in post visibility is acceptable to ensure write availability.

πŸ” Basics of News Feed Architecture

At its heart, a news feed is a many-to-many relationship pipeline. Unlike a simple blog where one post is read by everyone, a feed is a personalized "inbox" for every user.

The baseline architecture involves three main steps:

  1. Ingestion: Taking the author's post and making it durable.
  2. Fan-out: Distributing that post to every follower's list.
  3. Hydration: Combining the list of post IDs with the actual content (text, images) to show to the user.

Without these basics, the system would require a massive SQL JOIN between the Posts and Follows tables for every single user refresh, a process that would collapse under the load of even a small social network.

βš™οΈ Core Mechanics: Push vs. Pull

The mechanism of distribution is called Fan-out.

  • Fan-out on Write (Push): When a post is created, we immediately write it into the pre-computed timelines of all followers. Pros: Reads are incredibly fast. Cons: Writes are expensive if the author has millions of followers.
  • Fan-out on Read (Pull): We don't do anything at write-time. When a user requests their feed, we pull the most recent posts from everyone they follow and sort them on the fly. Pros: No write amplification. Cons: Reads are very slow.

Modern systems use a Hybrid Mechanic to get the best of both worlds.
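A minimal sketch of that hybrid decision in Python (an in-memory toy, not a production service; the threshold value, dictionaries, and function name are all illustrative):

```python
# Minimal in-memory sketch of hybrid fan-out (illustrative names and threshold).
CELEBRITY_THRESHOLD = 1_000_000  # follower count above which push fan-out is skipped

timelines: dict[str, list[str]] = {}   # follower_id -> post_ids, newest last
followers: dict[str, list[str]] = {}   # author_id -> follower_ids
follower_count: dict[str, int] = {}

def on_post_created(author_id: str, post_id: str) -> str:
    """Push to follower timelines for normal authors; defer celebrities to read time."""
    if follower_count.get(author_id, 0) >= CELEBRITY_THRESHOLD:
        return "pull"  # served by pull fan-out when followers request their feeds
    for f in followers.get(author_id, []):
        timelines.setdefault(f, []).append(post_id)
    return "push"

# Example: a normal author's post is pushed to both followers at write time.
followers["alice"] = ["bob", "carol"]
follower_count["alice"] = 2
on_post_created("alice", "post-1")
```

At read time, a follower's feed is the pre-computed timeline, merged with any posts pulled on demand from the celebrities they follow.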

πŸ“ Estimations & Design Goals

The Math of Fan-out

  • Daily Active Users (DAU): 500 Million.
  • Average Follower Count: 200.
  • Post Volume: 500M posts/day (~5,800 writes/sec).
  • Read Volume: 50B views/day (~580k reads/sec).
  • Write Amplification: 5,800 posts/sec × 200 followers = 1.16M timeline writes/sec.

Key Goal: Decouple the "Write to Post Store" from the "Fan-out to Timelines." The user should get a "Success" response as soon as the post is durable, even if the fan-out takes a few more seconds.
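The estimates above can be sanity-checked with a few lines of arithmetic (same numbers as the bullets):

```python
# Back-of-envelope check of the fan-out estimates above.
DAU = 500_000_000
posts_per_day = 500_000_000          # ~1 post per DAU per day
views_per_day = 50_000_000_000
avg_followers = 200
SECONDS_PER_DAY = 86_400

write_qps = posts_per_day / SECONDS_PER_DAY      # ~5,800 posts/sec
read_qps = views_per_day / SECONDS_PER_DAY       # ~580k reads/sec
timeline_writes_qps = write_qps * avg_followers  # ~1.16M timeline writes/sec

print(round(write_qps), round(read_qps), round(timeline_writes_qps))
```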

📊 High-Level Design: The Hybrid Fan-out Architecture

The following diagram illustrates the separation between the synchronous write path and the asynchronous materialization pipeline.

graph TD
    User((User)) --> LB[Load Balancer]
    LB --> AG[API Gateway]

    subgraph Write_Path
        AG --> PS[Post Service]
        PS --> PDB[(Post DB: Postgres)]
        PS --> MQ[Message Queue: Kafka]
    end

    subgraph FanOut_Pipeline
        MQ --> FW[Fan-out Workers]
        FW --> GS[Graph Service]
        FW --> RC[(Timeline Cache: Redis)]
    end

    subgraph Read_Path
        AG --> TS[Timeline Service]
        TS --> RC
        TS -.->|Fallback| PDB
    end

The diagram maps the full lifecycle of a post from creation to timeline display. On the Write Path, the Post Service makes the post durable in Postgres and immediately emits a PostCreated event to Kafka; the user sees a "Success" response without waiting for fan-out to complete. On the Fan-out Pipeline, workers consume from Kafka, query the Graph Service for the author's follower list, and write post IDs into each follower's pre-computed timeline in Redis as a sorted set ordered by timestamp. On the Read Path, the Timeline Service retrieves pre-computed post IDs from Redis and hydrates them with content from the Post Store, delivering sub-100ms feed loads regardless of how many accounts a user follows.

🧠 Deep Dive: The Hybrid Fan-out Strategy and Its Data Structures

Internals: How the Fan-out Worker Uses the Social Graph to Route Timeline Writes

When a PostCreated event arrives on the posts.created Kafka topic, the Fan-out Worker executes a precise sequence. First, it reads the author_id from the event payload and looks up the celebrity flag in Redis: GET celebrity:{author_id}. If the flag is set, the worker writes nothing to timelines; the post will be served via pull fan-out at read time. If the flag is absent, the worker fetches the author's follower list from the Graph Service.

The follower list retrieval is the I/O bottleneck in the fan-out pipeline. For an author with 500,000 followers, the Graph Service returns the list as a paginated stream of follower user IDs, sorted by last-active timestamp descending. The Fan-out Worker processes followers in priority order: it writes to the timelines of the 50,000 most recently active followers first (immediate priority), then processes the remaining followers asynchronously. This active-follower-first approach ensures that the users most likely to open the app immediately after a post is published see it in their feed within seconds, even if the tail of inactive followers receives the update minutes later.

For each follower, the worker executes ZADD timeline:{follower_user_id} {post_timestamp_ms} {post_id} followed by ZREMRANGEBYRANK timeline:{follower_user_id} 0 -801 to trim the sorted set to the 800 most recent entries. Both Redis commands are pipelined in batches of 1,000 followers, reducing the number of Redis round-trips from O(followers) to O(followers/1000).
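A compact sketch of the write-and-trim step, using a plain dict as a stand-in for a Redis sorted set (in production these would be pipelined ZADD and ZREMRANGEBYRANK calls against Redis; the function names here are illustrative):

```python
TIMELINE_CAP = 800  # keep only the 800 most recent entries per user

# In-memory stand-in for Redis sorted sets: user_id -> {post_id: timestamp_ms}.
timelines: dict[str, dict[str, int]] = {}

def zadd_and_trim(follower_id: str, post_id: str, ts_ms: int) -> None:
    """Equivalent of ZADD timeline:{id} + ZREMRANGEBYRANK timeline:{id} 0 -801."""
    tl = timelines.setdefault(follower_id, {})
    tl[post_id] = ts_ms
    if len(tl) > TIMELINE_CAP:
        # Drop the lowest-scored (oldest) entries until only the newest 800 remain.
        for old in sorted(tl, key=tl.get)[: len(tl) - TIMELINE_CAP]:
            del tl[old]

def fanout_batch(follower_ids: list[str], post_id: str, ts_ms: int, batch: int = 1000):
    """Process followers in pipeline-sized chunks of 1,000, as described above."""
    for i in range(0, len(follower_ids), batch):
        for f in follower_ids[i : i + batch]:
            zadd_and_trim(f, post_id, ts_ms)

# Example: fan one post out to 2,500 followers, then show the cap on one hot timeline.
fanout_batch([f"u{i}" for i in range(2500)], "post-42", 1_000)
for t in range(900):
    zadd_and_trim("hot-reader", f"p{t}", t)
```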

Performance Analysis: Write Amplification Math and Redis Memory Budget

The fan-out pipeline's performance characteristics are dominated by the write amplification factor:

| Metric | Value | Calculation |
| --- | --- | --- |
| Post creation rate | 5,800 posts/sec | 500M posts/day ÷ 86,400 sec |
| Average follower count | 200 | Platform average (not celebrity-skewed) |
| Fan-out ZADD operations/sec | 1.16M/sec | 5,800 × 200 |
| Celebrity posts (skip fan-out) | ~0.1% of posts | Authors above threshold |
| Peak fan-out (viral event, 1M-follower author) | 1M writes in < 60 sec | Requires 200+ parallel worker shards |
| Redis memory per user timeline | ~32 KB | 800 entries × 40 bytes (score + post_id) |
| Total Redis memory for 500M users | ~16 TB | Requires Redis Cluster with tiered eviction |

The 16 TB figure reveals why production systems do not store every user's timeline in Redis. Instead, they store timelines only for active users, i.e. users who have been active within the last 30 days. For a platform with 500M total users but 150M monthly active users, the Redis timeline storage requirement drops to approximately 4.8 TB, manageable with a 10-node Redis Cluster at 512 GB RAM per node. Inactive users' timelines are rebuilt from Postgres on their next app open.
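The memory budget works out as follows (the same arithmetic as the table above, expressed as a quick check):

```python
# Redis timeline memory budget, matching the figures in the table above.
ENTRY_BYTES = 40          # approximate score + post_id cost per sorted-set entry
TIMELINE_CAP = 800

per_user_bytes = ENTRY_BYTES * TIMELINE_CAP            # 32,000 B ~= 32 KB
all_users_tb = per_user_bytes * 500_000_000 / 1e12     # ~16 TB for every user
active_users_tb = per_user_bytes * 150_000_000 / 1e12  # ~4.8 TB for monthly actives

print(per_user_bytes, all_users_tb, active_users_tb)
```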

The core architectural challenge in a news feed is write amplification. When Katy Perry (100 million followers) posts a tweet, a naive push fan-out generates 100 million writes to follower timelines simultaneously. This is the Celebrity Problem, and it is why every production feed system uses a hybrid fan-out strategy rather than a pure push or pure pull model.

The Three Fan-out Models Compared

| Model | Write-Time Work | Read-Time Work | Best For |
| --- | --- | --- | --- |
| Fan-out on Write (Push) | Write post ID into every follower's timeline Redis sorted set immediately | Read pre-computed timeline: O(1) Redis lookup | Authors with small to medium follower counts (< 5,000 followers) |
| Fan-out on Read (Pull) | Write only to Post Store; no fan-out | Fetch recent posts from all followed authors and merge: O(follows) DB queries | Celebrity accounts with millions of followers |
| Hybrid (Production Standard) | Push to timelines of normal users only; skip celebrities | Merge pre-computed timeline with on-demand celebrity posts at read time | All production social feeds; optimal for both tails |

The threshold for "celebrity" varies by platform: Twitter used approximately 1 million followers, and Instagram uses a similar cutoff. Accounts below the threshold get full push fan-out; accounts above it get pull fan-out at read time.

Timeline Data Structure in Redis

Each user's pre-computed timeline is stored as a Redis Sorted Set (ZADD) keyed by user ID. The score is the post's publish timestamp (Unix milliseconds), which gives natural chronological ordering and supports efficient range queries.

| Redis Key | Structure | Score | Member | TTL |
| --- | --- | --- | --- | --- |
| timeline:{user_id} | Sorted Set | Post timestamp (Unix ms) | {post_id} | 7 days |
| post:{post_id} | Hash | - | {author_id, body, media_url, like_count} | 30 days |
| follow:{user_id}:count | Integer | - | Follower count | No TTL |
| celebrity:{user_id} | Boolean flag | - | 1 if above threshold | No TTL |

The timeline sorted set is capped at the most recent 800 post IDs per user. When a fan-out worker writes post ID 801, it trims the oldest entry (ZREMRANGEBYRANK timeline:{user_id} 0 0). This bounds memory usage regardless of how long a user stays away and how many people they follow.

Post Store Data Model

| Column | Type | Constraint | Purpose |
| --- | --- | --- | --- |
| post_id | UUID | PRIMARY KEY | Unique post identifier |
| author_id | UUID | NOT NULL, FK → users | Author reference |
| body | TEXT | NOT NULL, max 2KB | Post text content |
| media_urls | TEXT[] | nullable | Array of image/video CDN URLs |
| post_type | ENUM | NOT NULL | text / image / video / share |
| created_at | TIMESTAMPTZ | DEFAULT NOW() | Post timestamp; used as timeline score |
| like_count | BIGINT | DEFAULT 0 | Denormalized counter; updated via async aggregation |
| comment_count | BIGINT | DEFAULT 0 | Denormalized counter |
| is_deleted | BOOLEAN | DEFAULT FALSE | Soft-delete for content moderation |
| visibility | ENUM | DEFAULT public | public / followers_only / private |

The like_count and comment_count columns are intentionally denormalized. Computing them via COUNT(*) joins on every timeline hydration would be prohibitively expensive. They are updated by a dedicated aggregation service that batches like/comment events from Kafka and periodically flushes counts to Postgres.
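A sketch of that batched aggregation, assuming like events arrive as a simple list of post IDs (the function names and in-memory store are illustrative; production would consume from Kafka and flush via SQL UPDATE statements):

```python
from collections import Counter

def aggregate_like_events(events: list[str]) -> Counter:
    """events: one post_id per like. Returns per-post increments to flush in one batch."""
    return Counter(events)

def flush(deltas: Counter, store: dict[str, int]) -> None:
    """Apply batched deltas; in production, an UPDATE ... SET like_count = like_count + n per post."""
    for post_id, n in deltas.items():
        store[post_id] = store.get(post_id, 0) + n

# Example: three like events collapse into two writes against the store.
like_counts: dict[str, int] = {"p1": 10}
flush(aggregate_like_events(["p1", "p1", "p2"]), like_counts)
```

Batching turns N like events into at most one write per distinct post per flush interval, which is what makes the denormalized counters affordable.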

Fan-out Worker Decision Flow

graph TD
    A[Fan-out Worker reads PostCreated event from Kafka] --> B{Is author a celebrity?}
    B -->|No β€” followers below threshold| C[Fetch all followers from Graph Service]
    C --> D[Write post_id to each follower timeline in Redis ZADD]
    D --> E[Trim timeline to 800 entries ZREMRANGEBYRANK]
    B -->|Yes β€” followers above threshold| F[Skip push fan-out entirely]
    F --> G[Post available for pull fan-out at read time]
    E --> H[Fan-out complete β€” post visible in follower feeds]
    G --> H

The decision flow shows exactly where the hybrid threshold determines whether a post is pushed or pulled. For non-celebrity authors, the worker writes the post ID directly into every follower's timeline sorted set and trims the oldest entry to maintain the 800-post cap. For celebrity authors, the worker writes nothing to follower timelines; the post is retrieved on-demand at read time from the Post Store and merged with the pre-computed timeline.

🌍 Real-World Applications: How Twitter, Instagram, and LinkedIn Handle Feed Fan-out

Twitter pioneered the hybrid fan-out approach that the entire industry now follows. Twitter's 2013 architecture blog post introduced the concept of separating "normal user" push fan-out from "celebrity" pull fan-out using a follower-count threshold. Twitter's Graph Service (backed by FlockDB, a distributed adjacency list database) can return the 100K most active followers of an author in under 100ms, enabling targeted push fan-out to the most engaged followers even for large accounts. The less-active long tail of followers receives eventual fan-out as capacity allows.

Instagram handles 100 million posts per day with a fan-out system built on Apache Kafka and a custom Redis sharding layer. Instagram's key innovation is the ranked feed: rather than a purely chronological timeline, posts are ranked by a machine learning model that scores each post for relevance to the specific viewer. This ranking computation happens at read time for each user's feed request, combining the pre-computed timeline of post IDs with real-time engagement signals. Instagram found that ranked feeds increased per-user session length by over 40% compared to chronological feeds.

LinkedIn operates a feed system with unique constraints: professional content has a much longer relevance window than social content. A job posting or professional article is still relevant 7 days after publication, unlike a tweet that is stale within hours. LinkedIn's feed system extends the timeline TTL to 30 days (versus Twitter's 7 days) and weighs engagement-recency signals more heavily to surface still-relevant older content alongside fresh posts.

βš–οΈ Trade-offs and Failure Modes in News Feed Architecture

Write Amplification at Celebrity Scale

The most cited failure mode in news feed systems is write amplification from celebrity posts. A single post by an author with 50 million followers triggers 50 million Redis ZADD operations. At 10ms per ZADD (including network I/O), 50 million operations would take 500,000 server-seconds, clearly impossible in real time. Production systems handle this by:

  1. Async fan-out with backpressure: Kafka allows fan-out workers to process at their own pace. Follower timelines for less-active users may lag by minutes during a celebrity spike, an acceptable trade-off given the eventual-consistency SLA.
  2. Parallel worker shards: Fan-out workers are sharded by follower ID range, so the 50 million followers are processed in parallel across hundreds of worker instances.
  3. Active follower prioritization: Only the most recently active followers (e.g., active in the last 24 hours) receive immediate push fan-out. Inactive followers' timelines are populated lazily when they next open the app.

Timeline Cache Miss on New User or Long-Absence Return

When a user returns to the app after a long absence, their Redis timeline sorted set may have expired (TTL elapsed) or been evicted under memory pressure. The Timeline Service must handle this gracefully: fall back to the Postgres Post Store and rebuild the timeline by joining recent posts from all followed authors. This is an expensive query, O(follows × posts per author), that should be served from a read replica and cached aggressively after the first rebuild.
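The rebuild itself is essentially a k-way merge of each followed author's recent posts. A sketch, assuming each author's post list is already sorted newest-first (the function name and data shapes are illustrative):

```python
import heapq

def rebuild_timeline(recent_by_author: dict[str, list[tuple[int, str]]],
                     limit: int = 800) -> list[str]:
    """recent_by_author: author -> [(ts_ms, post_id), ...], sorted newest-first.
    Merges all lists by timestamp descending and keeps the newest `limit` post IDs."""
    merged = heapq.merge(*recent_by_author.values(),
                         key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in list(merged)[:limit]]

# Example: two followed authors, three posts total, merged into one feed.
feed = rebuild_timeline({
    "author_a": [(30, "a2"), (10, "a1")],
    "author_b": [(20, "b1")],
})
```

Because each input list is pre-sorted, heapq.merge does the combine in O(total posts × log follows) without materializing a full sort.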

Graph Service as a Single Point of Failure

The Fan-out Worker depends on the Graph Service to look up follower lists. If the Graph Service is slow or unavailable, fan-out workers stall and timelines become stale. Mitigation: cache follower lists in a local cache with a 5-minute TTL on each Fan-out Worker instance. On Graph Service failure, workers use the stale cached list, accepting up to 5 minutes of follower-list staleness rather than stopping fan-out entirely.

🧭 Decision Guide: Fan-out Strategy Selection for Your Scale

| Scenario | Recommended Strategy | Rationale |
| --- | --- | --- |
| Small social app, < 1M users, max 10K followers per author | Pure Fan-out on Write (Push) | Simple implementation; write amplification is manageable |
| Mid-scale platform, max 500K followers, > 10M DAU | Hybrid (push for < 10K followers, pull for celebrities) | Celebrity problem becomes significant above this threshold |
| Large-scale platform with influencers, > 100M DAU | Hybrid with active-follower prioritization | Only push to recently active followers to reduce amplification |
| Ranked feed (ML-scored, not chronological) | Fan-out on Write for IDs + read-time ML ranking | Pre-computing rankings at write time is impractical; score at read time |
| Real-time feeds (< 1 second freshness required) | Fan-out on Write only, no celebrity exemption | Pull fan-out at read time adds latency that violates the real-time SLA |
| Long content lifespan (articles, jobs) | Fan-out on Write + extended Redis TTL (30 days) | Content remains relevant longer; timeline expiry must match content lifecycle |

🧪 Interview Delivery Example: Walking Through a News Feed in 45 Minutes

Minute 1-5: Requirements scoping. Ask: "Is the feed chronological or ranked by relevance? What is the maximum acceptable staleness for new posts appearing in follower feeds? What is the expected follower count distribution, and are there celebrity accounts?" These questions signal that you understand the architectural implications of fan-out strategy selection.

Minute 6-15: Write path. Establish the decoupling pattern: "When an author posts, the Post Service writes to Postgres and publishes a PostCreated event to Kafka immediately; the author receives their success response without waiting for fan-out. Fan-out is asynchronous and can lag by seconds without affecting the author's experience."

Minute 16-30: Fan-out strategy. Introduce the celebrity problem before the interviewer can ask: "If every author gets push fan-out, a single post by an account with 50 million followers generates 50 million Redis writes. This is the write amplification problem. The industry-standard solution is a hybrid model: push fan-out for authors with fewer than N followers, and pull fan-out at read time for celebrities. The Timeline Service merges both at query time."

Minute 31-40: Data model and read path. Present the Redis sorted set structure for timeline storage. Explain the 800-post cap and why it bounds memory. Walk through the hydration step: "The timeline contains only post IDs. The Timeline Service fetches the actual post content from the Post Store in a parallel batch fetch, then returns the merged list to the client."

Minute 41-45: Failure modes. Address three scenarios: (1) a celebrity post causing a write spike: answer with Kafka backpressure and active-follower prioritization; (2) a timeline cache miss on user return: answer with Post Store fallback and rebuild; (3) Graph Service failure during fan-out: answer with a local follower-list cache and staleness tolerance.

πŸ› οΈ Redis, Kafka, and the Graph Store: How Production Feed Systems Are Built

Redis Cluster stores the pre-computed timelines as sorted sets. In production, user IDs are hashed across Redis nodes, distributing timeline storage evenly. A timeline sorted set consumes approximately 40 bytes per post ID entry (score + member). The 800-post cap means each user's timeline uses at most 32 KB of Redis memory, allowing hundreds of millions of user timelines to fit within a reasonably sized Redis cluster.

Apache Kafka is the decoupling mechanism between post creation and fan-out. The posts.created topic is partitioned by author_id, ensuring that all posts from one author are processed by the same fan-out worker partition and arrive in creation order. Kafka's configurable retention (7-30 days) allows fan-out workers to replay events after recovery from a worker failure, meaning no posts are permanently lost from timelines due to worker crashes.

Graph Services (Twitter's FlockDB, Meta's TAO, LinkedIn's Leo) store the social graph as distributed adjacency lists. In a simplified architecture, the Graph Service is a Redis cluster where follow:{user_id}:followers is a Sorted Set of follower user IDs scored by follow recency. This enables efficient retrieval of the most recently active followers (highest scores) for optimized fan-out prioritization during celebrity post events.

📚 Lessons Learned from Production News Feed Systems

The Celebrity Threshold Requires Continuous Calibration. Setting the celebrity threshold too low pushes too many accounts into pull fan-out, degrading read performance for mid-tier influencers whose follower lists still take seconds to query. Setting it too high causes write amplification storms during viral moments. Twitter's team tuned their threshold multiple times as the platform grew, and built tooling to temporarily adjust the threshold during scheduled events like the Super Bowl or election nights when multiple high-follower accounts post simultaneously.

Timeline Hydration Is the Real Latency Bottleneck. The Redis sorted set lookup is fast (1-2 ms). The bottleneck is the subsequent batch fetch of post content from the Post Store. Production systems mitigate this by caching individual post records in a separate Redis hash (post:{post_id}), so that 95%+ of timeline hydration is served entirely from Redis without touching Postgres.

Eventual Consistency Windows Must Be Documented as Product Decisions. A 5-second delay in a new post appearing in follower feeds is an engineering constraint, not a bug. But if the product team is not aligned on this, they will treat it as a critical defect whenever they notice it. Document the fan-out latency window explicitly as a product design choice and establish a per-tier SLA: normal users see posts within 5 seconds, celebrity posts may take up to 60 seconds to appear in all follower feeds.

Content Deletion from Timelines Is Harder Than Creation. When a post is deleted (moderation, user request, DMCA), it must be removed from potentially millions of pre-computed timelines in Redis sorted sets. This is the reverse fan-out problem. Production systems handle deletion differently from creation: rather than removing the post ID from every timeline immediately, a deletion flag is set in the Post Store, and the Timeline Service filters out deleted post IDs during hydration. This avoids the reverse fan-out write amplification at the cost of storing soft-deleted post IDs in timelines temporarily.

📌 Key Takeaways: News Feed System Design

  • A news feed is a read-heavy, write-amplified system. The core design tension is between write-time cost (fan-out to millions of followers) and read-time cost (merging posts from all followed authors on every page load).
  • Hybrid fan-out is the production-standard solution: push to normal users' timelines at write time, pull celebrity posts at read time, and merge both at the Timeline Service layer.
  • Redis Sorted Sets with timestamp scores are the standard data structure for pre-computed timelines. An 800-post cap per user bounds memory usage regardless of how many accounts a user follows.
  • Kafka decouples post creation from fan-out processing, ensuring that viral posts and write amplification never block the author's POST /post response path.
  • The Graph Service is a critical dependency for fan-out workers. Cache follower lists locally with a short TTL to protect against Graph Service outages.
  • Post deletion is the reverse fan-out problem. Use soft-deletion flags in the Post Store and filter at hydration time rather than attempting to remove post IDs from millions of timeline sorted sets simultaneously.

Written by Abstract Algorithms (@abstractalgorithms)