System Design HLD Example: News Feed (Home Timeline)
Interview-focused HLD for a scalable social feed with fan-out and ranking trade-offs.
TLDR: Design a news feed for a social platform. A news feed system builds personalized timelines by combining content publishing, graph relationships, and ranking. This article follows a standard system design interview flow: use cases, requirements, estimations, design goals, HLD, and design deep dive.
Twitter struggled with Barack Obama's 2009 inauguration: 456 tweets per second overwhelmed a system that computed every follower's timeline on read. Engineers shifted to a fan-out-on-write model that pre-pushed each tweet into follower inboxes. That solved the read latency problem until Katy Perry accumulated 100 million followers, making synchronous write fan-out prohibitively expensive and forcing a hybrid model where posts from high-follower accounts are injected at read time.
Designing a news feed teaches you one of system design's most instructive trade-offs: fan-out on write (fast reads, expensive writes) versus fan-out on read (cheap writes, expensive reads), and when a hybrid of both is the only practical answer.
By the end of this walkthrough you'll know why the fan-out threshold sits at roughly 1,000 followers (above that, write-time pre-computation dominates on p95 read latency), why timeline stores use sorted sets scored by recency, and why ranking must run outside the synchronous read path so feed read latency isn't coupled to ML inference time.
Use Cases
Actors
- End users consuming the primary product surface.
- Producer entities that create or update domain content.
- Platform services enforcing policy, routing, and reliability controls.
Use Cases
- Primary interview prompt: Design a news feed for a social platform.
- Core user journeys: Create post, follow graph updates, read timeline, and keep feed freshness under high write fan-out.
- Read and write paths are explained separately so bottlenecks and consistency boundaries are explicit.
This template starts with actors and use cases because architecture only makes sense when user behavior and workload shape are clear. In interviews, this section prevents random tool selection and keeps the answer grounded in business outcomes.
Functional Requirements
In Scope
- Support the core product flow end-to-end with clear API contracts.
- Preserve business correctness for critical operations.
- Expose reliable read and write interfaces with predictable behavior.
- Support an incremental scaling path instead of requiring a redesign.
Out of Scope (v1 boundary)
- Full global active-active writes across every region.
- Heavy analytical workloads mixed into latency-critical request paths.
- Complex personalization experiments in the first architecture version.
Functional Breakdown
- Prompt: Design a news feed for a social platform.
- Focus: Create post, follow graph updates, read timeline, and keep feed freshness under high write fan-out.
- Initial building-block perspective: Post service, graph service, fan-out workers, timeline store, cache layer, and ranking service.
A strong answer names non-goals explicitly. Interviewers use this to judge prioritization quality and architectural maturity under time constraints.
Non-Functional Requirements
| Dimension | Target | Why it matters |
| --- | --- | --- |
| Scalability | Horizontal scale across services and workers | Handles growth without rewriting core flows |
| Availability | 99.9% baseline with path to 99.99% | Reduces user-visible downtime |
| Performance | Clear p95 and p99 latency SLOs | Avoids average-latency blind spots |
| Consistency | Explicit strong vs eventual boundaries | Prevents hidden correctness defects |
| Operability | Metrics, logs, traces, and runbooks | Speeds incident isolation and recovery |
Non-functional requirements are where many designs fail in practice. Naming measurable targets and coupling architecture decisions to those targets is far more useful than listing technologies.
Deep Dive: Estimations and Design Goals
The Internals
- Service boundaries should align with ownership and deployment isolation.
- Data model choices should follow access patterns, not default preferences.
- Retries, idempotency, and timeout budgets must be explicit before scale.
- Dependency failure behavior should be defined before incidents happen.
Estimations
Use structured rough-order numbers in interviews:
- Read and write throughput (steady and peak).
- Read/write ratio and burst amplification factor.
- Typical payload size and large-object edge cases.
- Daily storage growth and retention horizon.
- Cache memory for hot keys and frequently accessed entities.
| Estimation axis | Question to answer early |
| --- | --- |
| Read QPS | Which read path saturates first at 10x? |
| Write QPS | Which state mutation becomes the first bottleneck? |
| Storage growth | When does repartitioning become mandatory? |
| Memory envelope | What hot set must remain in memory? |
| Network profile | Which hops create the highest latency variance? |
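The estimation axes above can be turned into concrete numbers quickly. Below is a minimal sketch; the inputs used in the usage note (10M daily active users, 0.5 posts per user per day, ~1 KB per post, 200 average followers) are illustrative assumptions, not measured values:

```java
// Back-of-envelope feed estimations; every input number is an assumption.
class FeedEstimates {

    // Steady-state post-write QPS from daily volume
    static long writeQps(long dailyActiveUsers, double postsPerUserPerDay) {
        return (long) (dailyActiveUsers * postsPerUserPerDay / 86_400);
    }

    // Inbox writes per second under write-time fan-out
    static long fanoutWritesPerSecond(long writeQps, long avgFollowers) {
        return writeQps * avgFollowers;
    }

    // New post storage per day, before replication
    static long dailyStorageBytes(long dailyActiveUsers, double postsPerUserPerDay,
                                  long bytesPerPost) {
        return (long) (dailyActiveUsers * postsPerUserPerDay * bytesPerPost);
    }
}
```

With those assumptions, 10M DAU at 0.5 posts/day is only ~57 post writes per second, but a 200-follower average amplifies that to ~11,400 inbox writes per second. That amplification factor is why the fan-out path, not the post store, typically saturates first.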
Design Goals
- Keep synchronous user-facing paths short and deterministic.
- Shift heavy side effects and fan-out work to asynchronous channels.
- Minimize coupling between control-plane and data-plane components.
- Introduce complexity in phases tied to measurable bottlenecks.
Performance Analysis
| Pressure point | Symptom | First response | Second response |
| --- | --- | --- | --- |
| Hot partitions | Tail latency spikes | Key redesign | Repartition by load |
| Cache churn | Miss storms | TTL and key tuning | Multi-layer caching |
| Async backlog | Delayed downstream work | Worker scale-out | Priority queues |
| Dependency instability | Timeout cascades | Fail-fast budgets | Degraded fallback mode |
Metrics that should drive architecture evolution:
- p95 and p99 latency by operation.
- Error-budget burn by service and endpoint.
- Queue lag, retry volume, and dead-letter trends.
- Cache hit ratio by key family.
- Partition or shard utilization skew.
High-Level Design: Architecture for Functional Requirements
Building Blocks
- Post service, graph service, fan-out workers, timeline store, cache layer, and ranking service.
- API edge layer for authentication, authorization, and policy checks.
- Domain services for read and write responsibilities.
- Durable storage plus cache for fast retrieval and controlled consistency.
- Async event path for secondary processing and integrations.
Design the APIs
- Keep contracts explicit and version-friendly.
- Use idempotency keys for retriable writes.
- Return actionable error metadata for clients and retries.
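The idempotency-key point can be sketched with an in-memory dedup map. This is a sketch only: in production the key-to-result mapping would live in a shared store (e.g. Redis SET NX with a TTL), and the `IdempotentWriter` name is illustrative, not part of any framework:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Idempotent write handling keyed by a client-supplied idempotency key.
// A ConcurrentHashMap stands in for a shared store with TTLs.
class IdempotentWriter {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Executes the write at most once per key; retries of the same key
    // return the stored result of the first successful execution.
    String execute(String idempotencyKey, Supplier<String> write) {
        return results.computeIfAbsent(idempotencyKey, k -> write.get());
    }
}
```

A client that times out and retries with the same key gets the original result back instead of creating a duplicate post.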
Communication Between Components
- Synchronous path for user-visible confirmation.
- Asynchronous path for fan-out, indexing, notifications, and analytics.
Data Flow
- Post created -> fan-out queue -> timeline materialization -> cache read -> ranking -> response.
```mermaid
flowchart TD
    A[Client or Producer] --> B[API and Policy Layer]
    B --> C[Core Domain Service]
    C --> D[Primary Data Store and Cache]
    C --> E[Async Event or Job Queue]
    D --> F[User-Facing Response]
    E --> G[Workers and Integrations]
    G --> H[State Update and Telemetry]
```
Real-World Applications and API Mapping
This architecture pattern appears in real production systems because traffic is bursty, dependencies fail partially, and correctness requirements vary by operation type.
Practical API mapping examples:
- POST /resources for write operations with idempotency support.
- GET /resources/{id} for low-latency object retrieval.
- GET /resources?cursor= for scalable pagination and stable traversal.
- Async event emissions for indexing, notifications, and reporting.
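The cursor-based pagination item deserves a concrete shape. A minimal sketch of an opaque cursor, where the field layout (a score plus the last-seen id) is an illustrative assumption; encoding both keeps traversal stable even as new posts are prepended to the feed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Opaque cursor for GET /resources?cursor= style pagination.
// Encodes (sort score, last-seen post id) so the next page resumes
// exactly after the last item the client saw.
class FeedCursor {
    static String encode(long score, String lastPostId) {
        String raw = score + ":" + lastPostId;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static long decodeScore(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
        return Long.parseLong(raw.substring(0, raw.indexOf(':')));
    }

    static String decodeLastPostId(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
        return raw.substring(raw.indexOf(':') + 1);
    }
}
```

Because the cursor is opaque to clients, the server can later change its contents (e.g. add a shard hint) without breaking the API contract.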
Real-world system behavior is defined during failure, not normal operation. Good designs clearly specify what can be stale, what must be exact, and what should fail fast to preserve reliability.
Trade-offs & Failure Modes (Design Deep Dive for Non-Functional Requirements)
Scaling Strategy
- Scale stateless services horizontally behind load balancing.
- Partition stateful data by access-pattern-aware keys.
- Add queue-based buffering where write bursts exceed synchronous capacity.
Availability and Resilience
- Multi-instance deployment across failure domains.
- Replication and failover planning for stateful systems.
- Circuit breakers, retries with backoff, and bounded timeouts.
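Retries with backoff and bounded delays can be sketched in a few lines. The base delay and cap here are illustrative assumptions, and production code would typically add random jitter to avoid synchronized retry storms:

```java
// Bounded exponential backoff for retrying a flaky dependency.
// Base delay and cap are illustrative assumptions.
class BackoffPolicy {
    static final long BASE_MILLIS = 100;
    static final long CAP_MILLIS = 5_000;

    // Delay before the given retry attempt (0-based), doubling up to a cap.
    static long delayMillis(int attempt) {
        long delay = BASE_MILLIS << Math.min(attempt, 30); // clamp shift to avoid overflow
        return Math.min(delay, CAP_MILLIS);
    }
}
```

The cap matters as much as the doubling: without it, a long outage pushes retry delays so high that recovery is needlessly slow once the dependency heals.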
Storage and Caching
- Cache-aside for read-heavy access paths.
- Explicit invalidation and refresh policy.
- Tiered storage for hot, warm, and cold access profiles.
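The cache-aside read path above can be sketched with plain maps standing in for Redis and the database; the class name and the exposed miss counter are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Cache-aside read path: check the cache, fall back to the backing store
// on a miss, then populate the cache for subsequent reads.
class CacheAsideReader {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, Optional<String>> backingStore;
    int missCount = 0; // exposed here only to illustrate hit/miss behavior

    CacheAsideReader(Function<String, Optional<String>> backingStore) {
        this.backingStore = backingStore;
    }

    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) return Optional.of(cached);
        missCount++;
        Optional<String> loaded = backingStore.apply(key);
        loaded.ifPresent(v -> cache.put(key, v)); // populate on miss
        return loaded;
    }
}
```

The write side must pair this with explicit invalidation (delete the cache key on update), otherwise reads serve stale values for the full TTL.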
Consistency, Security, and Monitoring
- Clear strong vs eventual consistency contracts per operation.
- Authentication, authorization, and encryption in transit and at rest.
- Monitoring stack with metrics, logs, traces, SLO dashboards, and alerting.
This section is the architecture-for-NFRs view from your template. It explains how the system remains stable under scale, failures, and incident pressure.
Decision Guide
| Situation | Recommendation |
| --- | --- |
| Early stage with moderate traffic | Keep architecture minimal and highly observable |
| Read-heavy workload dominates | Optimize cache and read model before complex rewrites |
| Write hotspots appear | Rework key strategy and partitioning plan |
| Incident frequency increases | Strengthen SLOs, runbooks, and fallback controls |
Practical Example for Interview Delivery
A repeatable way to deliver this design in interviews:
- Start with actors, use cases, and scope boundaries.
- State estimation assumptions (QPS, payload size, storage growth).
- Draw HLD and explain each component responsibility.
- Walk through one failure cascade and mitigation strategy.
- Describe phase-based evolution for 10x traffic.
Question-specific practical note:
- Use async fan-out where possible, keep recency cache hot, and isolate ranking from write-critical paths.
A concise closing sentence that works well: "I would launch with this minimal architecture, monitor p95 latency, error-budget burn, and queue lag, then scale the first saturated component before adding further complexity."
Advanced Concepts for Production Evolution
When interviewers ask follow-up scaling questions, use a phased approach:
- Stabilize critical path dependencies with better observability.
- Increase throughput by isolating heavy side effects asynchronously.
- Reduce hotspot pressure through key redesign and repartitioning.
- Improve resilience using automated failover and tested runbooks.
- Expand to multi-region only when latency, compliance, or reliability targets require it.
This framing demonstrates that architecture decisions are tied to measurable outcomes, not architecture fashion trends.
Spring Data Redis and Kafka: Fan-out Feed in Practice
Spring Data Redis provides a RedisTemplate with sorted set operations that map directly to the timeline store model: ZADD with a recency score builds a per-user inbox, and ZREVRANGE returns the top-N posts for feed reads.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.ZSetOperations;
import org.springframework.stereotype.Service;

@Service
public class FeedFanoutService {

    private static final int MAX_FEED_SIZE = 800;       // cap inbox size per user
    private static final int FANOUT_THRESHOLD = 1_000;  // hybrid fan-out cutoff

    private final RedisTemplate<String, String> redisTemplate;
    private final FollowerGraphService graphService;

    public FeedFanoutService(RedisTemplate<String, String> redisTemplate,
                             FollowerGraphService graphService) {
        this.redisTemplate = redisTemplate;
        this.graphService = graphService;
    }

    // Called by a Kafka consumer after a new post is published
    public void fanoutToFollowers(Post post) {
        ZSetOperations<String, String> zset = redisTemplate.opsForZSet();
        double score = post.createdAt().toEpochMilli(); // recency score
        List<String> followers = graphService.getFollowers(post.authorId());

        if (followers.size() <= FANOUT_THRESHOLD) {
            // Accounts under the threshold: write-time fan-out into each inbox
            for (String followerId : followers) {
                String timelineKey = "feed:" + followerId;
                zset.add(timelineKey, post.id(), score);
                // Trim inbox to MAX_FEED_SIZE to bound memory growth
                zset.removeRange(timelineKey, 0, -(MAX_FEED_SIZE + 1));
            }
        } else {
            // Celebrity accounts: mark for read-time injection instead
            redisTemplate.opsForSet().add("celebrity-posts", post.id());
        }
    }

    // Feed read: serve the pre-computed inbox; celebrity posts are merged at read time
    public List<String> getTimeline(String userId, int limit) {
        String timelineKey = "feed:" + userId;
        Set<String> inbox = redisTemplate.opsForZSet()
                .reverseRange(timelineKey, 0, limit - 1);
        // Celebrity post injection omitted for brevity
        return inbox == null ? List.of() : new ArrayList<>(inbox);
    }
}
```
ZADD feed:{userId} <epochMs> <postId> writes the post into the follower's sorted set inbox with a millisecond-resolution recency score. ZREVRANGE returns the most recent posts in O(log N + M) time. The hybrid threshold (1000 followers) separates write-time fan-out (fast reads, bounded fan-out cost) from read-time injection for celebrity accounts (cheap writes, slightly more complex read path).
Kafka handles the async fan-out pipeline: a PostCreatedEvent is produced to Kafka, and a consumer group of fan-out workers processes events in parallel, one worker per Kafka partition, each writing to the Redis sorted sets of its assigned follower slice.
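The ordering property that makes this pipeline work is that all events for one author land on one partition, so a single worker processes that author's posts in order. Below is a simplified stand-in for key-based partition assignment; Kafka's real default partitioner uses murmur2 hashing, and `hashCode()` here is an illustrative simplification:

```java
// Simplified key-based partition assignment: keying PostCreatedEvent by
// authorId keeps all of one author's posts on the same partition, so one
// fan-out worker owns them and processes them in publish order.
class EventPartitioner {
    static int partitionFor(String authorId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(authorId.hashCode(), numPartitions);
    }
}
```

The same invariant is why repartitioning a topic is disruptive: changing `numPartitions` remaps existing keys to different workers.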
For a full deep-dive on hybrid fan-out strategies and Redis sorted set timeline modeling, a dedicated follow-up post is planned.
Lessons Learned
- Start with actors and use cases before drawing any diagram.
- Define in-scope and out-of-scope boundaries to prevent architecture sprawl.
- Convert NFRs into measurable SLO-style targets.
- Separate functional HLD from non-functional deep dive reasoning.
- Scale the first measured bottleneck, not the most visible component.
TLDR: Summary & Key Takeaways
- Template-aligned answers are clearer, faster to evaluate, and easier to communicate.
- Good HLDs explain both request flow and state update flow.
- Non-functional architecture determines reliability under pressure.
- Phase-based evolution outperforms one-shot overengineering.
- Theory-linked reasoning improves consistency across different interview prompts.
Practice Quiz
- Why should system design answers begin with actors and use cases?
A) To avoid architecture work entirely
B) To anchor architecture decisions to workload and user behavior
C) To skip non-functional requirements
Correct Answer: B
- Which section should define p95 and p99 targets?
A) Non Functional Requirements
B) Only the quiz section
C) Only the related posts section
Correct Answer: A
- What is the primary benefit of separating synchronous and asynchronous paths?
A) It removes all consistency trade-offs
B) It isolates latency-critical user flows from heavy side effects
C) It eliminates monitoring needs
Correct Answer: B
- Open-ended challenge: for this design, which component would you scale first at 10x traffic and which metric would you use to justify that decision?