System Design HLD Example: News Feed (Home Timeline)
Interview-focused HLD for a scalable social feed with fan-out and ranking trade-offs.
TLDR: A news feed system builds personalized timelines by combining content publishing, graph relationships, and ranking. The scalability crux is the fan-out-amplified write path: a single celebrity post can trigger 100M writes. A hybrid fan-out strategy (push for normal users, pull for celebrities) is the industry standard for 99.99% availability.
The Katy Perry Problem
Imagine it's 2013. Twitter is growing at a breakneck pace. Most users have a few hundred followers, and the system handles them easily by "pushing" new tweets into their home timelines at write-time. Then, Katy Perry posts.
She has 100 million followers. If the system sticks to its standard "push" model, a single tweet triggers 100 million database writes simultaneously. The message queues back up, the database primary hits 100% CPU, and for the next three hours, nobody on Twitter can see new posts.
This is the Write Amplification trap. In a news feed, the challenge isn't just storing data; it's the massive disparity between a single post and its global consumption. If you design for the average user, you fail at the edges. If you design for the edge cases, you might over-engineer the core.
News Feed: Use Cases, Actors, and Scale Requirements
Actors
- Publisher / Author: Creates and posts content (text, media).
- Reader / Follower: Consumes a ranked home feed of followed users' posts.
- System: Handles fan-out, ranking, and timeline materialization.
Functional Requirements
- Post Creation: Users can POST content with text/media.
- Timeline Read: Users see a chronological or ranked feed of people they follow.
- Follow Graph: Users can follow/unfollow others, immediately affecting their feed.
- Feed Freshness: New posts should appear in follower feeds within 5 seconds.
Non-Functional Requirements
- High Read Availability: 99.99% (Users check feeds constantly).
- Low Latency: Timeline reads should be < 100ms.
- Scalability: Handle 100k post creates/sec and 1M reads/sec.
- Eventual Consistency: A 5-second delay in post visibility is acceptable to ensure write availability.
Basics of News Feed Architecture
At its heart, a news feed is a many-to-many relationship pipeline. Unlike a simple blog where one post is read by everyone, a feed is a personalized "inbox" for every user.
The baseline architecture involves three main steps:
- Ingestion: Taking the author's post and making it durable.
- Fan-out: Distributing that post to every follower's list.
- Hydration: Combining the list of post IDs with the actual content (text, images) to show to the user.
Without these basics, the system would require a massive SQL JOIN between the Posts and Follows tables for every single user refresh, a process that would collapse under the load of even a small social network.
Core Mechanics: Push vs. Pull
The mechanism of distribution is called Fan-out.
- Fan-out on Write (Push): When a post is created, we immediately write it into the pre-computed timelines of all followers. Pros: Reads are incredibly fast. Cons: Writes are expensive if the author has millions of followers.
- Fan-out on Read (Pull): We don't do anything at write-time. When a user requests their feed, we pull the most recent posts from everyone they follow and sort them on the fly. Pros: No write amplification. Cons: Reads are very slow.
Modern systems use a hybrid approach to get the best of both worlds.
Estimations & Design Goals
The Math of Fan-out
- Daily Active Users (DAU): 500 Million.
- Average Follower Count: 200.
- Post Volume: 500M posts/day (~5,800 writes/sec).
- Read Volume: 50B views/day (~580k reads/sec).
- Write Amplification: 5,800 posts/sec × 200 followers ≈ 1.16M timeline writes/sec.
Key Goal: Decouple the "Write to Post Store" from the "Fan-out to Timelines." The user should get a "Success" response as soon as the post is durable, even if the fan-out takes a few more seconds.
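The arithmetic behind these estimates is simple enough to script. The Python sketch below just restates the numbers above; they are the round assumptions used in this post, not measured values.

```python
# Back-of-envelope fan-out math, using the assumed round numbers from this post.
DAU = 500_000_000              # daily active users
POSTS_PER_DAY = 500_000_000    # ~1 post per DAU per day
READS_PER_DAY = 50_000_000_000
AVG_FOLLOWERS = 200
SECONDS_PER_DAY = 86_400

post_writes_per_sec = POSTS_PER_DAY / SECONDS_PER_DAY        # ~5,800
timeline_reads_per_sec = READS_PER_DAY / SECONDS_PER_DAY     # ~580,000
fanout_writes_per_sec = post_writes_per_sec * AVG_FOLLOWERS  # ~1.16M

print(f"post writes/sec:    {post_writes_per_sec:,.0f}")
print(f"timeline reads/sec: {timeline_reads_per_sec:,.0f}")
print(f"fan-out writes/sec: {fanout_writes_per_sec:,.0f}")
```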
High-Level Design: The Hybrid Fan-out Architecture
The following diagram illustrates the separation between the synchronous write path and the asynchronous materialization pipeline.
```mermaid
graph TD
    User((User)) --> LB[Load Balancer]
    LB --> AG[API Gateway]
    subgraph Write_Path
        AG --> PS[Post Service]
        PS --> PDB[(Post DB: Postgres)]
        PS --> MQ[Message Queue: Kafka]
    end
    subgraph FanOut_Pipeline
        MQ --> FW[Fan-out Workers]
        FW --> GS[Graph Service]
        FW --> RC[(Timeline Cache: Redis)]
    end
    subgraph Read_Path
        AG --> TS[Timeline Service]
        TS --> RC
        TS -.->|Fallback| PDB
    end
```
The diagram maps the full lifecycle of a post from creation to timeline display. On the Write Path, the Post Service makes the post durable in Postgres and immediately emits a PostCreated event to Kafka; the user sees a "Success" response without waiting for fan-out to complete. In the Fan-out Pipeline, workers consume from Kafka, query the Graph Service for the author's follower list, and write post IDs into each follower's pre-computed timeline in Redis as a sorted set ordered by timestamp. On the Read Path, the Timeline Service retrieves pre-computed post IDs from Redis and hydrates them with content from the Post Store, delivering sub-100ms feed loads regardless of how many accounts a user follows.
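A minimal sketch of that synchronous write path is shown below, assuming a Python Post Service built on psycopg2 and confluent-kafka. The connection strings, table shape, and function name are illustrative; only the posts.created topic and the keying by author_id come from the design in this post.

```python
import json
import time
import uuid

import psycopg2                       # assumed Postgres client
from confluent_kafka import Producer  # assumed Kafka client

pg = psycopg2.connect("dbname=feed user=feed")             # hypothetical DSN
producer = Producer({"bootstrap.servers": "kafka:9092"})   # hypothetical broker

def create_post(author_id: str, body: str) -> str:
    """Synchronous write path: make the post durable, emit the event, return."""
    post_id = str(uuid.uuid4())
    with pg, pg.cursor() as cur:
        cur.execute(
            "INSERT INTO posts (post_id, author_id, body) VALUES (%s, %s, %s)",
            (post_id, author_id, body),
        )
    # Keyed by author_id so one author's posts stay in order on one partition.
    event = {"post_id": post_id, "author_id": author_id,
             "created_at_ms": int(time.time() * 1000)}
    producer.produce("posts.created", key=author_id, value=json.dumps(event))
    producer.poll(0)  # serve delivery callbacks; fan-out happens asynchronously
    return post_id    # caller returns "Success" without waiting for fan-out
```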
Deep Dive: The Hybrid Fan-out Strategy and Its Data Structures
Internals: How the Fan-out Worker Uses the Social Graph to Route Timeline Writes
When a PostCreated event arrives on the posts.created Kafka topic, the Fan-out Worker executes a precise sequence. First, it reads the author_id from the event payload and looks up the celebrity flag in Redis: GET celebrity:{author_id}. If the flag is set, the worker writes nothing to timelines; the post will be served via pull fan-out at read time. If the flag is absent, the worker fetches the author's follower list from the Graph Service.
The follower list retrieval is the I/O bottleneck in the fan-out pipeline. For an author with 500,000 followers, the Graph Service returns the list as a paginated stream of follower user IDs, sorted by last-active timestamp descending. The Fan-out Worker processes followers in priority order: it writes to the timelines of the 50,000 most recently active followers first (immediate priority), then processes the remaining followers asynchronously. This active-follower-first approach ensures that the users most likely to open the app immediately after a post is published see it in their feed within seconds, even if the tail of inactive followers receives the update minutes later.
For each follower, the worker executes ZADD timeline:{follower_user_id} {post_timestamp_ms} {post_id} followed by ZREMRANGEBYRANK timeline:{follower_user_id} 0 -801 to trim the sorted set to the 800 most recent entries. Both Redis commands are pipelined in batches of 1,000 followers, reducing the number of Redis round-trips from O(followers) to O(followers/1000).
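A condensed sketch of that worker loop, using redis-py and the keys described above (celebrity:{author_id}, timeline:{follower_user_id}), might look like the following. The follower list is assumed to arrive already paginated from the Graph Service; the hostname and batch size are illustrative.

```python
import redis  # assumed redis-py client

r = redis.Redis(host="timeline-cache", port=6379)  # hypothetical host
TIMELINE_CAP = 800
BATCH = 1_000

def fan_out(post_id: str, post_ts_ms: int, author_id: str,
            followers: list[str]) -> None:
    """Push fan-out for a non-celebrity author, pipelined in batches of 1,000."""
    if r.get(f"celebrity:{author_id}"):   # celebrity flag set: skip push;
        return                            # the post is pulled at read time
    for i in range(0, len(followers), BATCH):
        pipe = r.pipeline(transaction=False)
        for follower_id in followers[i:i + BATCH]:
            key = f"timeline:{follower_id}"
            pipe.zadd(key, {post_id: post_ts_ms})              # append to sorted set
            pipe.zremrangebyrank(key, 0, -(TIMELINE_CAP + 1))  # keep newest 800
        pipe.execute()  # one round-trip per 1,000 followers
```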
Performance Analysis: Write Amplification Math and Redis Memory Budget
The fan-out pipeline's performance characteristics are dominated by the write amplification factor:
| Metric | Value | Calculation |
| --- | --- | --- |
| Post creation rate | 5,800 posts/sec | 500M posts/day ÷ 86,400 s |
| Average follower count | 200 | Platform average (not celebrity-skewed) |
| Fan-out ZADD operations/sec | 1.16M/sec | 5,800 × 200 |
| Celebrity posts (skip fan-out) | ~0.1% of posts | Authors above the follower threshold |
| Peak fan-out (viral event, 1M-follower author) | 1M writes in < 60 sec | Requires 200+ parallel worker shards |
| Redis memory per user timeline | ~32 KB | 800 entries × 40 bytes (score + post_id) |
| Total Redis memory for 500M users | ~16 TB | Requires Redis Cluster with tiered eviction |
The 16 TB figure reveals why production systems do not store every user's timeline in Redis. Instead, they store timelines only for active users (those who have been active within the last 30 days). For a platform with 500M total users but 150M monthly active users, the Redis timeline storage requirement drops to approximately 4.8 TB, manageable with a 10-node Redis Cluster at 512 GB RAM per node. Inactive users' timelines are rebuilt from Postgres on their next app open.
The core architectural challenge in a news feed is write amplification. When Katy Perry (100 million followers) posts a tweet, a naive push fan-out generates 100 million writes to follower timelines simultaneously. This is the Celebrity Problem, and it is why every production feed system uses a hybrid fan-out strategy rather than a pure push or pure pull model.
The Three Fan-out Models Compared
| Model | Write-Time Work | Read-Time Work | Best For |
| --- | --- | --- | --- |
| Fan-out on Write (Push) | Write post ID into every follower's timeline Redis sorted set immediately | Read the pre-computed timeline: O(1) Redis lookup | Authors with small to medium follower counts (< 5,000 followers) |
| Fan-out on Read (Pull) | Write only to the Post Store; no fan-out | Fetch recent posts from all followed authors and merge: O(follows) DB queries | Celebrity accounts with millions of followers |
| Hybrid (Production Standard) | Push to timelines of normal users only; skip celebrities | Merge pre-computed timeline with on-demand celebrity posts at read time | All production social feeds; optimal for both tails |
The threshold for "celebrity" varies by platform: Twitter used approximately 1 million followers, and Instagram uses a similar cutoff. Accounts below the threshold get full push fan-out; accounts above it get pull fan-out at read time.
Timeline Data Structure in Redis
Each user's pre-computed timeline is stored as a Redis Sorted Set keyed by user ID and written with ZADD. The score is the post's publish timestamp (Unix milliseconds), which gives natural chronological ordering and supports efficient range queries.
| Redis Key | Structure | Score | Member | TTL |
| --- | --- | --- | --- | --- |
| timeline:{user_id} | Sorted Set | Post timestamp (Unix ms) | {post_id} | 7 days |
| post:{post_id} | Hash | N/A | {author_id, body, media_url, like_count} | 30 days |
| follow:{user_id}:count | Integer | N/A | Follower count | No TTL |
| celebrity:{user_id} | Boolean flag | N/A | 1 if above threshold | No TTL |
The timeline sorted set is capped at the most recent 800 post IDs per user. When a fan-out worker writes post ID 801, it trims the oldest entry (ZREMRANGEBYRANK timeline:{user_id} 0 0). This bounds memory usage regardless of how long a user stays away and how many people they follow.
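Putting the two key families together, a hedged sketch of the read-side hydration could look like this: fetch the newest post IDs from the sorted set, then pipeline HGETALL calls against the post:{post_id} hashes. The is_deleted field on the hash is an assumption added here to illustrate the soft-delete filtering discussed later; the hostname and page size are illustrative.

```python
import redis  # assumed redis-py client

r = redis.Redis(host="timeline-cache", port=6379)  # hypothetical host

def read_timeline(user_id: str, limit: int = 50) -> list[dict]:
    """Fetch the newest post IDs, then hydrate them from post:{post_id} hashes."""
    post_ids = r.zrevrange(f"timeline:{user_id}", 0, limit - 1)  # newest first
    pipe = r.pipeline(transaction=False)
    for pid in post_ids:
        pipe.hgetall(f"post:{pid.decode()}")
    posts = pipe.execute()
    # Soft-deleted posts are filtered here instead of being reverse-fanned-out;
    # the is_deleted hash field is assumed for this sketch.
    return [p for p in posts if p and p.get(b"is_deleted") != b"1"]
```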
Post Store Data Model
| Column | Type | Constraint | Purpose |
| --- | --- | --- | --- |
| post_id | UUID | PRIMARY KEY | Unique post identifier |
| author_id | UUID | NOT NULL, FK → users | Author reference |
| body | TEXT | NOT NULL, max 2KB | Post text content |
| media_urls | TEXT[] | nullable | Array of image/video CDN URLs |
| post_type | ENUM | NOT NULL | text / image / video / share |
| created_at | TIMESTAMPTZ | DEFAULT NOW() | Post timestamp; used as timeline score |
| like_count | BIGINT | DEFAULT 0 | Denormalized counter; updated via async aggregation |
| comment_count | BIGINT | DEFAULT 0 | Denormalized counter |
| is_deleted | BOOLEAN | DEFAULT FALSE | Soft-delete for content moderation |
| visibility | ENUM | DEFAULT public | public / followers_only / private |
The like_count and comment_count columns are intentionally denormalized. Computing them via COUNT(*) joins on every timeline hydration would be prohibitively expensive. They are updated by a dedicated aggregation service that batches like/comment events from Kafka and periodically flushes counts to Postgres.
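A rough sketch of such an aggregation service is shown below, assuming a confluent-kafka consumer and psycopg2. The posts.liked topic, group ID, and batch size are assumptions for illustration; the point is that many like events collapse into a single UPDATE per post.

```python
import json
from collections import Counter

import psycopg2
from confluent_kafka import Consumer  # assumed Kafka client

consumer = Consumer({
    "bootstrap.servers": "kafka:9092",   # hypothetical broker
    "group.id": "like-count-aggregator",  # hypothetical group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["posts.liked"])       # hypothetical topic name
pg = psycopg2.connect("dbname=feed user=feed")

def flush_like_counts(batch_size: int = 10_000) -> None:
    """Consume like events, aggregate in memory, flush one UPDATE per post."""
    deltas = Counter()
    for _ in range(batch_size):
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            break
        deltas[json.loads(msg.value())["post_id"]] += 1
    with pg, pg.cursor() as cur:
        for post_id, delta in deltas.items():
            cur.execute(
                "UPDATE posts SET like_count = like_count + %s WHERE post_id = %s",
                (delta, post_id),
            )
    consumer.commit()
```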
Fan-out Worker Decision Flow
```mermaid
graph TD
    A[Fan-out Worker reads PostCreated event from Kafka] --> B{Is author a celebrity?}
    B -->|No: followers below threshold| C[Fetch all followers from Graph Service]
    C --> D[Write post_id to each follower timeline in Redis ZADD]
    D --> E[Trim timeline to 800 entries ZREMRANGEBYRANK]
    B -->|Yes: followers above threshold| F[Skip push fan-out entirely]
    F --> G[Post available for pull fan-out at read time]
    E --> H[Fan-out complete, post visible in follower feeds]
    G --> H
```
The decision flow shows exactly where the hybrid threshold determines whether a post is pushed or pulled. For non-celebrity authors, the worker writes the post ID directly into every follower's timeline sorted set and trims the oldest entry to maintain the 800-post cap. For celebrity authors, the worker writes nothing to follower timelines; the post is retrieved on demand at read time from the Post Store and merged with the pre-computed timeline.
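To make the merge concrete, here is a small sketch of the read-time combination the Timeline Service performs under this hybrid model. The celebrity_posts argument stands in for the pull half: (timestamp, post_id) pairs fetched from the Post Store for celebrities the user follows. How that list is produced is out of scope for this sketch, and the hostname is illustrative.

```python
import heapq
import redis  # assumed redis-py client

r = redis.Redis(host="timeline-cache", port=6379)  # hypothetical host

def merged_feed(user_id: str,
                celebrity_posts: list[tuple[int, str]],
                limit: int = 50) -> list[str]:
    """Merge the pushed timeline with celebrity posts pulled at read time."""
    pushed = [
        (int(score), pid.decode())
        for pid, score in r.zrevrange(f"timeline:{user_id}", 0, limit - 1,
                                      withscores=True)
    ]
    # Keep the newest `limit` posts across both the push and pull halves.
    merged = heapq.nlargest(limit, pushed + celebrity_posts, key=lambda t: t[0])
    return [pid for _, pid in merged]
```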
Real-World Applications: How Twitter, Instagram, and LinkedIn Handle Feed Fan-out
Twitter pioneered the hybrid fan-out approach that the entire industry now follows. Twitter's 2013 architecture blog post introduced the concept of separating "normal user" push fan-out from "celebrity" pull fan-out using a follower-count threshold. Twitter's Graph Service (backed by FlockDB, a distributed adjacency list database) can return the 100K most active followers of an author in under 100ms, enabling targeted push fan-out to the most engaged followers even for large accounts. The less-active long tail of followers receives eventual fan-out as capacity allows.
Instagram handles 100 million posts per day with a fan-out system built on Apache Kafka and a custom Redis sharding layer. Instagram's key innovation is the ranked feed: rather than a purely chronological timeline, posts are ranked by a machine learning model that scores each post for relevance to the specific viewer. This ranking computation happens at read time for each user's feed request, combining the pre-computed timeline of post IDs with real-time engagement signals. Instagram found that ranked feeds increased per-user session length by over 40% compared to chronological feeds.
LinkedIn operates a feed system with unique constraints: professional content has a much longer relevance window than social content. A job posting or professional article is still relevant 7 days after publication, unlike a tweet that is stale within hours. LinkedIn's feed system extends the timeline TTL to 30 days (versus Twitter's 7 days) and weighs engagement recency signals more heavily to surface still-relevant older content alongside fresh posts.
Trade-offs and Failure Modes in News Feed Architecture
Write Amplification at Celebrity Scale
The most cited failure mode in news feed systems is write amplification from celebrity posts. A single post by an author with 50 million followers triggers 50 million Redis ZADD operations. At 10ms per ZADD (including network I/O), 50 million operations would take 500,000 server-seconds, which is clearly impossible in real time. Production systems handle this by:
- Async fan-out with backpressure: Kafka allows fan-out workers to process at their own pace. Follower timelines for less-active users may lag by minutes during a celebrity spike, an acceptable trade-off given the eventual-consistency SLA.
- Parallel worker shards: Fan-out workers are sharded by follower ID range, so the 50 million followers are processed in parallel across hundreds of worker instances.
- Active follower prioritization: Only the most recently active followers (e.g., active in the last 24 hours) receive immediate push fan-out. Inactive followers' timelines are populated lazily when they next open the app.
Timeline Cache Miss on New User or Long-Absence Return
When a user returns to the app after a long absence, their Redis timeline sorted set may have expired (TTL elapsed) or been evicted under memory pressure. The Timeline Service must handle this gracefully: fall back to the Postgres Post Store and rebuild the timeline by joining recent posts from all followed authors. This is an expensive query, O(follows × posts per author), that should be served from a read replica and cached aggressively after the first rebuild.
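A sketch of that rebuild query, run against a read replica, is shown below. The follows table with follower_id/followee_id columns is an assumption (the post only names a "Follows" table), and the posts columns follow the data model above.

```python
import psycopg2  # assumed Postgres client; tables match the data model above

replica = psycopg2.connect("dbname=feed user=feed host=read-replica")  # hypothetical

def rebuild_timeline(user_id: str, limit: int = 800) -> list[tuple[str, int]]:
    """Cache-miss fallback: rebuild a timeline from followed authors' recent posts."""
    with replica.cursor() as cur:
        cur.execute(
            """
            SELECT p.post_id,
                   (EXTRACT(EPOCH FROM p.created_at) * 1000)::bigint AS ts_ms
            FROM posts p
            JOIN follows f ON f.followee_id = p.author_id  -- follows table assumed
            WHERE f.follower_id = %s
              AND p.is_deleted = FALSE
            ORDER BY p.created_at DESC
            LIMIT %s
            """,
            (user_id, limit),
        )
        return cur.fetchall()  # re-insert into Redis with ZADD before serving
```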
Graph Service as a Single Point of Failure
The Fan-out Worker depends on the Graph Service to look up follower lists. If the Graph Service is slow or unavailable, fan-out workers stall and timelines become stale. Mitigation: cache follower lists in a local cache with a 5-minute TTL on each Fan-out Worker instance. On Graph Service failure, workers use the stale cached list, accepting up to 5 minutes of follower-list staleness rather than stopping fan-out entirely.
Decision Guide: Fan-out Strategy Selection for Your Scale
| Scenario | Recommended Strategy | Rationale |
| --- | --- | --- |
| Small social app, < 1M users, max 10K followers per author | Pure Fan-out on Write (Push) | Simple implementation; write amplification is manageable |
| Mid-scale platform, max 500K followers, > 10M DAU | Hybrid (push for < 10K followers, pull for celebrities) | Celebrity problem becomes significant above this threshold |
| Large-scale platform with influencers, > 100M DAU | Hybrid with active-follower prioritization | Only push to recently active followers to reduce amplification |
| Ranked feed (ML-scored, not chronological) | Fan-out on Write for IDs + read-time ML ranking | Pre-computing rankings at write time is impractical; score at read time |
| Real-time feeds (< 1 second freshness required) | Fan-out on Write only, no celebrity exemption | Pull fan-out at read time adds latency that violates real-time SLA |
| Long content lifespan (articles, jobs) | Fan-out on Write + extended Redis TTL (30 days) | Content remains relevant longer; timeline expiry must match content lifecycle |
Interview Delivery Example: Walking Through a News Feed in 45 Minutes
Minute 1-5: Requirements scoping. Ask: "Is the feed chronological or ranked by relevance? What is the maximum acceptable staleness for new posts appearing in follower feeds? What is the expected follower count distribution, and are there celebrity accounts?" These questions signal that you understand the architectural implications of fan-out strategy selection.
Minute 6-15: Write path. Establish the decoupling pattern: "When an author posts, the Post Service writes to Postgres and publishes a PostCreated event to Kafka immediately; the author receives their success response without waiting for fan-out. Fan-out is asynchronous and can lag by seconds without affecting the author's experience."
Minute 16-30: Fan-out strategy. Introduce the celebrity problem before the interviewer can ask: "If every author gets push fan-out, a single post by an account with 50 million followers generates 50 million Redis writes. This is the write amplification problem. The industry-standard solution is a hybrid model: push fan-out for authors with fewer than N followers, and pull fan-out at read time for celebrities. The Timeline Service merges both at query time."
Minute 31-40: Data model and read path. Present the Redis sorted set structure for timeline storage. Explain the 800-post cap and why it bounds memory. Walk through the hydration step: "The timeline contains only post IDs. The Timeline Service fetches the actual post content from the Post Store in a parallel batch fetch, then returns the merged list to the client."
Minute 41-45: Failure modes. Address three scenarios: (1) a celebrity post causing a write spike: answer with Kafka backpressure and active-follower prioritization; (2) a timeline cache miss on user return: answer with Post Store fallback and rebuild; (3) Graph Service failure during fan-out: answer with a local follower-list cache and staleness tolerance.
Redis, Kafka, and the Graph Store: How Production Feed Systems Are Built
Redis Cluster stores the pre-computed timelines as sorted sets. In production, user IDs are hashed across Redis nodes, distributing timeline storage evenly. A timeline sorted set consumes approximately 40 bytes per post ID entry (score + member). The 800-post cap means each user's timeline uses at most 32 KB of Redis memory, allowing hundreds of millions of user timelines to fit within a reasonably sized Redis cluster.
Apache Kafka is the decoupling mechanism between post creation and fan-out. The posts.created topic is partitioned by author_id, ensuring that all posts from one author are processed by the same fan-out worker partition and arrive in creation order. Kafka's configurable retention (7-30 days) allows fan-out workers to replay events after recovering from a worker failure, meaning no posts are permanently lost from timelines due to worker crashes.
Graph Services (Twitter's FlockDB, Meta's TAO, LinkedIn's Leo) store the social graph as distributed adjacency lists. In a simplified architecture, the Graph Service is a Redis cluster where follow:{user_id}:followers is a Sorted Set of follower user IDs scored by follow recency. This enables efficient retrieval of the most recently active followers (highest scores) for optimized fan-out prioritization during celebrity post events.
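In that simplified Redis-backed form, the graph operations reduce to a recency-scored sorted set per author, as in the sketch below. The follow:{user_id}:followers key comes from the paragraph above; scoring by follow time versus last-active time is a detail the post glosses over, so the score here is simply "last touched", which is what lets a fan-out worker grab the most recently active followers first.

```python
import time
import redis  # assumed redis-py client

graph = redis.Redis(host="graph-cache", port=6379)  # hypothetical host

def record_follow(follower_id: str, followee_id: str) -> None:
    """Store followers in a sorted set scored by recency, as described above."""
    graph.zadd(f"follow:{followee_id}:followers", {follower_id: time.time()})

def most_active_followers(author_id: str, n: int = 50_000) -> list[str]:
    """Highest scores first: the followers to push to during priority fan-out."""
    members = graph.zrevrange(f"follow:{author_id}:followers", 0, n - 1)
    return [m.decode() for m in members]
```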
Lessons Learned from Production News Feed Systems
The Celebrity Threshold Requires Continuous Calibration. Setting the celebrity threshold too low pushes too many accounts into pull fan-out, degrading read performance for mid-tier influencers whose follower lists still take seconds to query. Setting it too high causes write amplification storms during viral moments. Twitter's team tuned their threshold multiple times as the platform grew, and built tooling to temporarily adjust the threshold during scheduled events like the Super Bowl or election nights when multiple high-follower accounts post simultaneously.
Timeline Hydration Is the Real Latency Bottleneck. The Redis sorted set lookup is fast (1-2 ms). The bottleneck is the subsequent batch fetch of post content from the Post Store. Production systems mitigate this by caching individual post records in a separate Redis hash (post:{post_id}), so that 95%+ of timeline hydration is served entirely from Redis without touching Postgres.
Eventual Consistency Windows Must Be Documented as Product Decisions. A 5-second delay in a new post appearing in follower feeds is an engineering constraint, not a bug. But if the product team is not aligned on this, they will treat it as a critical defect whenever they notice it. Document the fan-out latency window explicitly as a product design choice and establish a per-tier SLA: normal users see posts within 5 seconds, celebrity posts may take up to 60 seconds to appear in all follower feeds.
Content Deletion from Timelines Is Harder Than Creation. When a post is deleted (moderation, user request, DMCA), it must be removed from potentially millions of pre-computed timelines in Redis sorted sets. This is the reverse fan-out problem. Production systems handle deletion differently from creation: rather than removing the post ID from every timeline immediately, a deletion flag is set in the Post Store, and the Timeline Service filters out deleted post IDs during hydration. This avoids the reverse fan-out write amplification at the cost of storing soft-deleted post IDs in timelines temporarily.
Key Takeaways: News Feed System Design
- A news feed is a read-heavy, write-amplified system. The core design tension is between write-time cost (fan-out to millions of followers) and read-time cost (merging posts from all followed authors on every page load).
- Hybrid fan-out is the production-standard solution: push to normal users' timelines at write time, pull celebrity posts at read time, and merge both at the Timeline Service layer.
- Redis Sorted Sets with timestamp scores are the standard data structure for pre-computed timelines. An 800-post cap per user bounds memory usage regardless of how many accounts a user follows.
- Kafka decouples post creation from fan-out processing, ensuring that viral posts and write amplification never block the author's POST /post response path.
- The Graph Service is a critical dependency for fan-out workers. Cache follower lists locally with a short TTL to protect against Graph Service outages.
- Post deletion is the reverse fan-out problem. Use soft-deletion flags in the Post Store and filter at hydration time rather than attempting to remove post IDs from millions of timeline sorted sets simultaneously.