System Design HLD Example: News Feed (Home Timeline)
Interview-focused HLD for a scalable social feed with fan-out and ranking trade-offs.
TLDR: Design a news feed for a social platform. A news feed system builds personalized timelines by combining content publishing, graph relationships, and ranking. This article follows a standard system design interview flow: use cases, requirements, estimations, design goals, HLD, and design deep dive.
Twitter struggled with Barack Obama's 2009 inauguration: 456 tweets per second overwhelmed a system that computed every follower's timeline on read. Engineers shifted to a fan-out-on-write model that pre-pushed each tweet into follower inboxes. That solved the read latency problem until Katy Perry accumulated 100 million followers, making synchronous write fan-out prohibitively expensive and forcing a hybrid model where posts from high-follower accounts are injected at read time.
Designing a news feed teaches you one of system design's most instructive trade-offs: fan-out on write (fast reads, expensive writes) versus fan-out on read (cheap writes, expensive reads), and when a hybrid of both is the only practical answer.
By the end of this walkthrough you'll know why the fan-out threshold sits at roughly 1,000 followers (above that, write-time pre-computation dominates on p95 read latency), why timeline stores use sorted sets scored by recency, and why ranking must run outside the synchronous read path so feed read latency isn't coupled to ML inference time.
Use Cases
Actors
- End users consuming the primary product surface.
- Producer entities that create or update domain content.
- Platform services enforcing policy, routing, and reliability controls.
Use Cases
- Primary interview prompt: Design a news feed for a social platform.
- Core user journeys: Create post, follow graph updates, read timeline, and keep feed freshness under high write fan-out.
- Read and write paths are explained separately so bottlenecks and consistency boundaries are explicit.
This template starts with actors and use cases because architecture only makes sense when user behavior and workload shape are clear. In interviews, this section prevents random tool selection and keeps the answer grounded in business outcomes.
Functional Requirements
In Scope
- Support the core product flow end-to-end with clear API contracts.
- Preserve business correctness for critical operations.
- Expose reliable read and write interfaces with predictable behavior.
- Support an incremental scaling path instead of requiring a redesign.
Out of Scope (v1 boundary)
- Full global active-active writes across every region.
- Heavy analytical workloads mixed into latency-critical request paths.
- Complex personalization experiments in the first architecture version.
Functional Breakdown
- Prompt: Design a news feed for a social platform.
- Focus: Create post, follow graph updates, read timeline, and keep feed freshness under high write fan-out.
- Initial building-block perspective: Post service, graph service, fan-out workers, timeline store, cache layer, and ranking service.
A strong answer names non-goals explicitly. Interviewers use this to judge prioritization quality and architectural maturity under time constraints.
Non-Functional Requirements
| Dimension | Target | Why it matters |
| --- | --- | --- |
| Scalability | Horizontal scale across services and workers | Handles growth without rewriting core flows |
| Availability | 99.9% baseline with path to 99.99% | Reduces user-visible downtime |
| Performance | Clear p95 and p99 latency SLOs | Avoids average-latency blind spots |
| Consistency | Explicit strong vs eventual boundaries | Prevents hidden correctness defects |
| Operability | Metrics, logs, traces, and runbooks | Speeds incident isolation and recovery |
Non-functional requirements are where many designs fail in practice. Naming measurable targets and coupling architecture decisions to those targets is far more useful than listing technologies.
Deep Dive: Estimations and Design Goals
The Internals
- Service boundaries should align with ownership and deployment isolation.
- Data model choices should follow access patterns, not default preferences.
- Retries, idempotency, and timeout budgets must be explicit before scale.
- Dependency failure behavior should be defined before incidents happen.
Estimations
Use structured rough-order numbers in interviews:
- Read and write throughput (steady and peak).
- Read/write ratio and burst amplification factor.
- Typical payload size and large-object edge cases.
- Daily storage growth and retention horizon.
- Cache memory for hot keys and frequently accessed entities.
| Estimation axis | Question to answer early |
| --- | --- |
| Read QPS | Which read path saturates first at 10x? |
| Write QPS | Which state mutation becomes the first bottleneck? |
| Storage growth | When does repartitioning become mandatory? |
| Memory envelope | What hot set must remain in memory? |
| Network profile | Which hops create the highest latency variance? |
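The estimation axes above can be turned into concrete numbers quickly. Below is a minimal sketch; the inputs used in the usage note (10M daily active users, 0.5 posts per user per day, ~1 KB per post, 200 average followers) are illustrative assumptions, not measured values:

```java
// Back-of-envelope feed estimations; every input number is an assumption.
class FeedEstimates {

    // Steady-state post-write QPS from daily volume
    static long writeQps(long dailyActiveUsers, double postsPerUserPerDay) {
        return (long) (dailyActiveUsers * postsPerUserPerDay / 86_400);
    }

    // Inbox writes per second under write-time fan-out
    static long fanoutWritesPerSecond(long writeQps, long avgFollowers) {
        return writeQps * avgFollowers;
    }

    // New post storage per day, before replication
    static long dailyStorageBytes(long dailyActiveUsers, double postsPerUserPerDay,
                                  long bytesPerPost) {
        return (long) (dailyActiveUsers * postsPerUserPerDay * bytesPerPost);
    }
}
```

With those assumptions, 10M DAU at 0.5 posts/day is only ~57 post writes per second, but a 200-follower average amplifies that to ~11,400 inbox writes per second. That amplification factor is why the fan-out path, not the post store, typically saturates first.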
Design Goals
- Keep synchronous user-facing paths short and deterministic.
- Shift heavy side effects and fan-out work to asynchronous channels.
- Minimize coupling between control-plane and data-plane components.
- Introduce complexity in phases tied to measurable bottlenecks.
Performance Analysis
| Pressure point | Symptom | First response | Second response |
| --- | --- | --- | --- |
| Hot partitions | Tail latency spikes | Key redesign | Repartition by load |
| Cache churn | Miss storms | TTL and key tuning | Multi-layer caching |
| Async backlog | Delayed downstream work | Worker scale-out | Priority queues |
| Dependency instability | Timeout cascades | Fail-fast budgets | Degraded fallback mode |
Metrics that should drive architecture evolution:
- p95 and p99 latency by operation.
- Error-budget burn by service and endpoint.
- Queue lag, retry volume, and dead-letter trends.
- Cache hit ratio by key family.
- Partition or shard utilization skew.
High-Level Design: Architecture for Functional Requirements
Building Blocks
- Post service, graph service, fan-out workers, timeline store, cache layer, and ranking service.
- API edge layer for authentication, authorization, and policy checks.
- Domain services for read and write responsibilities.
- Durable storage plus cache for fast retrieval and controlled consistency.
- Async event path for secondary processing and integrations.
Design the APIs
- Keep contracts explicit and version-friendly.
- Use idempotency keys for retriable writes.
- Return actionable error metadata for clients and retries.
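The idempotency-key point can be sketched with an in-memory dedup map. This is a sketch only: in production the key-to-result mapping would live in a shared store (e.g. Redis SET NX with a TTL), and the `IdempotentWriter` name is illustrative, not part of any framework:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Idempotent write handling keyed by a client-supplied idempotency key.
// A ConcurrentHashMap stands in for a shared store with TTLs.
class IdempotentWriter {
    private final Map<String, String> results = new ConcurrentHashMap<>();

    // Executes the write at most once per key; retries of the same key
    // return the stored result of the first successful execution.
    String execute(String idempotencyKey, Supplier<String> write) {
        return results.computeIfAbsent(idempotencyKey, k -> write.get());
    }
}
```

A client that times out and retries with the same key gets the original result back instead of creating a duplicate post.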
Communication Between Components
- Synchronous path for user-visible confirmation.
- Asynchronous path for fan-out, indexing, notifications, and analytics.
Data Flow
- Post created -> fan-out queue -> timeline materialization -> cache read -> ranking -> response.
```mermaid
flowchart TD
    A[Client or Producer] --> B[API and Policy Layer]
    B --> C[Core Domain Service]
    C --> D[Primary Data Store and Cache]
    C --> E[Async Event or Job Queue]
    D --> F[User-Facing Response]
    E --> G[Workers and Integrations]
    G --> H[State Update and Telemetry]
```
Real-World Applications and API Mapping
This architecture pattern appears in real production systems because traffic is bursty, dependencies fail partially, and correctness requirements vary by operation type.
Practical API mapping examples:
- POST /resources for write operations with idempotency support.
- GET /resources/{id} for low-latency object retrieval.
- GET /resources?cursor= for scalable pagination and stable traversal.
- Async event emissions for indexing, notifications, and reporting.
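The cursor-based pagination item deserves a concrete shape. A minimal sketch of an opaque cursor, where the field layout (a score plus the last-seen id) is an illustrative assumption; encoding both keeps traversal stable even as new posts are prepended to the feed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Opaque cursor for GET /resources?cursor= style pagination.
// Encodes (sort score, last-seen post id) so the next page resumes
// exactly after the last item the client saw.
class FeedCursor {
    static String encode(long score, String lastPostId) {
        String raw = score + ":" + lastPostId;
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static long decodeScore(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
        return Long.parseLong(raw.substring(0, raw.indexOf(':')));
    }

    static String decodeLastPostId(String cursor) {
        String raw = new String(Base64.getUrlDecoder().decode(cursor), StandardCharsets.UTF_8);
        return raw.substring(raw.indexOf(':') + 1);
    }
}
```

Because the cursor is opaque to clients, the server can later change its contents (e.g. add a shard hint) without breaking the API contract.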
Real-world system behavior is defined during failure, not normal operation. Good designs clearly specify what can be stale, what must be exact, and what should fail fast to preserve reliability.
Trade-offs & Failure Modes (Design Deep Dive for Non-Functional Requirements)
Scaling Strategy
- Scale stateless services horizontally behind load balancing.
- Partition stateful data by access-pattern-aware keys.
- Add queue-based buffering where write bursts exceed synchronous capacity.
Availability and Resilience
- Multi-instance deployment across failure domains.
- Replication and failover planning for stateful systems.
- Circuit breakers, retries with backoff, and bounded timeouts.
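Retries with backoff and bounded delays can be sketched in a few lines. The base delay and cap here are illustrative assumptions, and production code would typically add random jitter to avoid synchronized retry storms:

```java
// Bounded exponential backoff for retrying a flaky dependency.
// Base delay and cap are illustrative assumptions.
class BackoffPolicy {
    static final long BASE_MILLIS = 100;
    static final long CAP_MILLIS = 5_000;

    // Delay before the given retry attempt (0-based), doubling up to a cap.
    static long delayMillis(int attempt) {
        long delay = BASE_MILLIS << Math.min(attempt, 30); // clamp shift to avoid overflow
        return Math.min(delay, CAP_MILLIS);
    }
}
```

The cap matters as much as the doubling: without it, a long outage pushes retry delays so high that recovery is needlessly slow once the dependency heals.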
Storage and Caching
- Cache-aside for read-heavy access paths.
- Explicit invalidation and refresh policy.
- Tiered storage for hot, warm, and cold access profiles.
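The cache-aside read path above can be sketched with plain maps standing in for Redis and the database; the class name and the exposed miss counter are illustrative only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Cache-aside read path: check the cache, fall back to the backing store
// on a miss, then populate the cache for subsequent reads.
class CacheAsideReader {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, Optional<String>> backingStore;
    int missCount = 0; // exposed here only to illustrate hit/miss behavior

    CacheAsideReader(Function<String, Optional<String>> backingStore) {
        this.backingStore = backingStore;
    }

    Optional<String> get(String key) {
        String cached = cache.get(key);
        if (cached != null) return Optional.of(cached);
        missCount++;
        Optional<String> loaded = backingStore.apply(key);
        loaded.ifPresent(v -> cache.put(key, v)); // populate on miss
        return loaded;
    }
}
```

The write side must pair this with explicit invalidation (delete the cache key on update), otherwise reads serve stale values for the full TTL.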
Consistency, Security, and Monitoring
- Clear strong vs eventual consistency contracts per operation.
- Authentication, authorization, and encryption in transit and at rest.
- Monitoring stack with metrics, logs, traces, SLO dashboards, and alerting.
This section is the architecture-for-NFRs view from your template. It explains how the system remains stable under scale, failures, and incident pressure.
Decision Guide
| Situation | Recommendation |
| --- | --- |
| Early stage with moderate traffic | Keep architecture minimal and highly observable |
| Read-heavy workload dominates | Optimize cache and read model before complex rewrites |
| Write hotspots appear | Rework key strategy and partitioning plan |
| Incident frequency increases | Strengthen SLOs, runbooks, and fallback controls |
Practical Example for Interview Delivery
A repeatable way to deliver this design in interviews:
- Start with actors, use cases, and scope boundaries.
- State estimation assumptions (QPS, payload size, storage growth).
- Draw HLD and explain each component responsibility.
- Walk through one failure cascade and mitigation strategy.
- Describe phase-based evolution for 10x traffic.
Question-specific practical note:
- Use async fan-out where possible, keep recency cache hot, and isolate ranking from write-critical paths.
A concise closing sentence that works well: "I would launch with this minimal architecture, monitor p95 latency, error-budget burn, and queue lag, then scale the first saturated component before adding further complexity."
Advanced Concepts for Production Evolution
When interviewers ask follow-up scaling questions, use a phased approach:
- Stabilize critical path dependencies with better observability.
- Increase throughput by isolating heavy side effects asynchronously.
- Reduce hotspot pressure through key redesign and repartitioning.
- Improve resilience using automated failover and tested runbooks.
- Expand to multi-region only when latency, compliance, or reliability targets require it.
This framing demonstrates that architecture decisions are tied to measurable outcomes, not architecture fashion trends.
Spring Data Redis and Kafka: Fan-out Feed in Practice
Spring Data Redis provides a RedisTemplate with sorted set operations that map directly to the timeline store model: ZADD with a recency score builds a per-user inbox, and ZREVRANGE returns the top-N posts for feed reads.
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.data.redis.core.ZSetOperations;
import org.springframework.stereotype.Service;

@Service
public class FeedFanoutService {

    private static final int MAX_FEED_SIZE = 800;       // cap inbox size per user
    private static final int FANOUT_THRESHOLD = 1_000;  // hybrid fan-out cutoff

    private final RedisTemplate<String, String> redisTemplate;
    private final FollowerGraphService graphService;

    public FeedFanoutService(RedisTemplate<String, String> redisTemplate,
                             FollowerGraphService graphService) {
        this.redisTemplate = redisTemplate;
        this.graphService = graphService;
    }

    // Called by a Kafka consumer after a new post is published
    public void fanoutToFollowers(Post post) {
        ZSetOperations<String, String> zset = redisTemplate.opsForZSet();
        double score = post.createdAt().toEpochMilli(); // recency score
        List<String> followers = graphService.getFollowers(post.authorId());

        if (followers.size() <= FANOUT_THRESHOLD) {
            // Accounts under the threshold: write-time fan-out into each inbox
            for (String followerId : followers) {
                String timelineKey = "feed:" + followerId;
                zset.add(timelineKey, post.id(), score);
                // Trim inbox to MAX_FEED_SIZE to bound memory growth
                zset.removeRange(timelineKey, 0, -(MAX_FEED_SIZE + 1));
            }
        } else {
            // Celebrity accounts: mark for read-time injection instead
            redisTemplate.opsForSet().add("celebrity-posts", post.id());
        }
    }

    // Feed read: serve the pre-computed inbox; celebrity posts are merged at read time
    public List<String> getTimeline(String userId, int limit) {
        String timelineKey = "feed:" + userId;
        Set<String> inbox = redisTemplate.opsForZSet()
                .reverseRange(timelineKey, 0, limit - 1);
        // Celebrity post injection omitted for brevity
        return inbox == null ? List.of() : new ArrayList<>(inbox);
    }
}
```
ZADD feed:{userId} <epochMs> <postId> writes the post into the follower's sorted set inbox with a millisecond-resolution recency score. ZREVRANGE returns the most recent posts in O(log N + M) time. The hybrid threshold (1000 followers) separates write-time fan-out (fast reads, bounded fan-out cost) from read-time injection for celebrity accounts (cheap writes, slightly more complex read path).
Kafka handles the async fan-out pipeline: a PostCreatedEvent is produced to Kafka, and a consumer group of fan-out workers processes events in parallel, one worker per Kafka partition, each writing to the Redis sorted sets of its assigned follower slice.
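The ordering property that makes this pipeline work is that all events for one author land on one partition, so a single worker processes that author's posts in order. Below is a simplified stand-in for key-based partition assignment; Kafka's real default partitioner uses murmur2 hashing, and `hashCode()` here is an illustrative simplification:

```java
// Simplified key-based partition assignment: keying PostCreatedEvent by
// authorId keeps all of one author's posts on the same partition, so one
// fan-out worker owns them and processes them in publish order.
class EventPartitioner {
    static int partitionFor(String authorId, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes
        return Math.floorMod(authorId.hashCode(), numPartitions);
    }
}
```

The same invariant is why repartitioning a topic is disruptive: changing `numPartitions` remaps existing keys to different workers.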
For a full deep-dive on hybrid fan-out strategies and Redis sorted set timeline modeling, a dedicated follow-up post is planned.
Lessons Learned
- Start with actors and use cases before drawing any diagram.
- Define in-scope and out-of-scope boundaries to prevent architecture sprawl.
- Convert NFRs into measurable SLO-style targets.
- Separate functional HLD from non-functional deep dive reasoning.
- Scale the first measured bottleneck, not the most visible component.
TLDR: Summary & Key Takeaways
- Template-aligned answers are clearer, faster to evaluate, and easier to communicate.
- Good HLDs explain both request flow and state update flow.
- Non-functional architecture determines reliability under pressure.
- Phase-based evolution outperforms one-shot overengineering.
- Theory-linked reasoning improves consistency across different interview prompts.
Practice Quiz
- Why should system design answers begin with actors and use cases?
A) To avoid architecture work entirely
B) To anchor architecture decisions to workload and user behavior
C) To skip non-functional requirements
Correct Answer: B
- Which section should define p95 and p99 targets?
A) Non Functional Requirements
B) Only the quiz section
C) Only the related posts section
Correct Answer: A
- What is the primary benefit of separating synchronous and asynchronous paths?
A) It removes all consistency trade-offs
B) It isolates latency-critical user flows from heavy side effects
C) It eliminates monitoring needs
Correct Answer: B
- Open-ended challenge: for this design, which component would you scale first at 10x traffic and which metric would you use to justify that decision?