System Design HLD Example: Video Streaming (YouTube/Netflix)
A practical interview-ready HLD for a video streaming platform with adaptive bitrate and CDN delivery.
TLDR: A video streaming platform is a two-sided architectural beast: a batch-oriented transcoding pipeline that converts raw uploads into multi-resolution segments, and a real-time global delivery network that serves those segments via CDNs. The technical linchpin is Adaptive Bitrate Streaming (ABR), which enables the client player to seamlessly switch quality based on network fluctuations, ensuring a buffer-free experience for millions of concurrent users.
The "Game of Thrones" Crash
It's Sunday night at 9:00 PM. Millions of people simultaneously hit "Play" on the season finale of the world's most popular show. For the first five minutes, everything is fine. Then, the Twitter complaints start: "Buffering...", "Pixelated mess!", "Server error 500."
Behind the scenes, the origin servers are melting. Every edge node in the Content Delivery Network (CDN) is trying to fetch the same 4K video segment from the central storage at once. This is the Cache Thundering Herd problem. If your system is designed to serve a few thousand users, it will crumble when a viral event spikes traffic by 100x in seconds.
A video platform isn't just a website; it's a global distribution engine. The challenge isn't just storing the bytes; it's moving those bytes across the world's oceans and through congested ISP networks to a smartphone on a shaky 3G connection, all without a single "Buffering" spinner. At the scale of Netflix or YouTube, you aren't just optimizing code; you are optimizing the physics of data movement.
Video Streaming: Use Cases & Requirements
Actors & Journeys
- Content Creator: Uploads high-quality raw video files (often $>100$ GB). They require a reliable, resumable upload path.
- Viewer: Consumes content across diverse devices (4K TV, 720p Laptop, 360p Smartphone). They require "instant-on" playback and no buffering.
- Platform Admin: Manages content moderation, transcoding priorities, and CDN cost-efficiency.
In/Out Scope
- In-Scope: Video ingestion (upload), distributed transcoding, segment storage, global delivery via CDN, and adaptive bitrate logic.
- Out-of-Scope: Content recommendation engines (AI/ML), complex copyright management (DMCA takedown workflows), and live interactive chat.
Functional Requirements
- Multipart Upload: Support for uploading large files with resume capability.
- Automated Transcoding: Convert raw video into multiple resolutions (360p, 720p, 1080p, 4K) and streaming formats (HLS, DASH).
- Adaptive Playback: Seamlessly serve the best possible quality based on the user's real-time bandwidth.
- Metadata Management: Searchable titles, descriptions, thumbnails, and view counts.
Non-Functional Requirements (NFRs)
- High Availability: 99.99% for the playback path (viewers shouldn't know if the uploader is down).
- Ultra-Low Latency: Playback startup should be $< 2$ seconds globally.
- Massive Scalability: Handle 500 hours of video uploaded per minute and 1 billion views per day.
- Cost Efficiency: Optimize storage and egress bandwidth (the single largest expense).
Foundations: How Video Streaming Actually Works
Unlike a simple file download where you wait for the whole file to arrive, modern streaming uses Segmented Delivery.
The baseline architecture involves three main pillars:
- The Bitrate Ladder: A single video is converted into multiple files with different resolutions and bitrates.
- Segmentation: Each of these files is sliced into 2-10 second "chunks" or "segments."
- The Manifest: A text file (like .m3u8 for HLS) that tells the player where to find these segments.
When you press "Play," you aren't downloading movie.mp4. You are downloading manifest.m3u8, which points to segment_1_1080p.ts, then segment_2_1080p.ts, and so on. This architecture allows the player to jump to any part of the video instantly by just requesting the relevant segment, and it's the foundation for adaptive quality.
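For concreteness, here is an illustrative sketch of those two file types. The master playlist and the media playlist are separate files; the bitrates, resolutions, and paths below are made up for this example:

```
#EXTM3U
# Master playlist (manifest.m3u8): one entry per rung of the bitrate ladder
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
720p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080
1080p/index.m3u8
```

```
#EXTM3U
# Media playlist (720p/index.m3u8): the actual segment list
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:4
#EXTINF:4.0,
seg_000.ts
#EXTINF:4.0,
seg_001.ts
#EXT-X-ENDLIST
```

The player first fetches the master playlist, picks a rung, then walks the media playlist segment by segment.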
The Mechanics of Adaptive Bitrate (ABR)
The mechanism of quality switching happens entirely on the Client Side.
- HLS (HTTP Live Streaming): Developed by Apple, it uses .ts segments and is the standard for iOS/Safari.
- DASH (Dynamic Adaptive Streaming over HTTP): An international standard that is more flexible and widely used on Android and Smart TVs.
- The Switching Logic: The player maintains a "Buffer Health" counter (e.g., 20 seconds of video pre-downloaded). If the download speed of the last segment was slower than the segment's duration, the player switches to a lower bitrate rendition in the bitrate ladder for the next segment to prevent the buffer from hitting zero.
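The switching logic above can be sketched in a few lines of Java. The ladder values, the 5-second panic threshold, and the 20% headroom factor are illustrative tuning knobs, not a real player's defaults:

```java
// Minimal buffer-aware ABR selection sketch (names and numbers are illustrative).
class AbrSelector {
    // Bitrate ladder in kbps, sorted ascending (e.g. 360p .. 4K).
    static final int[] LADDER_KBPS = {800, 1500, 2500, 5000, 12000};

    /**
     * Pick the highest rendition whose bitrate fits within a safety
     * fraction of the measured throughput. If the buffer is nearly
     * empty, drop straight to the lowest rung to avoid a stall.
     */
    static int pickBitrateKbps(double throughputKbps, double bufferSeconds) {
        if (bufferSeconds < 5.0) {
            return LADDER_KBPS[0]; // panic mode: protect the buffer
        }
        double budget = throughputKbps * 0.8; // keep 20% headroom for variance
        int choice = LADDER_KBPS[0];
        for (int kbps : LADDER_KBPS) {
            if (kbps <= budget) choice = kbps;
        }
        return choice;
    }

    public static void main(String[] args) {
        System.out.println(pickBitrateKbps(6000, 20)); // healthy buffer: 2500
        System.out.println(pickBitrateKbps(6000, 2));  // buffer nearly empty: 800
    }
}
```

Real players (hls.js, ExoPlayer) blend throughput estimates over several segments, but the core decision is this simple comparison.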
Estimations & Design Goals
The Math of YouTube-Scale
- Ingest Volume: 500 hours/min = 30,000 hours/hour.
- Raw Storage: 30,000 hours $\times$ 5 GB/hour (high-bitrate 1080p) = 150 TB/hour.
- Transcoding Expansion: Each video is transcoded into $\approx 6$ resolutions. Total storage after processing $\approx 3\times$ the raw size.
- Egress Bandwidth: 1B views/day. If average view is 10 mins at 2 Mbps:
- $1B \times 600s \times 2Mbps / 8 \text{ (bits to bytes)} = \mathbf{150 \text{ Petabytes/day}}$ of egress traffic.
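A quick sanity check of that egress number (using decimal petabytes, $10^{15}$ bytes):

```java
// Back-of-envelope egress check: 1B views/day x 10 min x 2 Mbps.
class EgressEstimate {
    static double egressPetabytesPerDay(long viewsPerDay, int secondsPerView, double mbps) {
        double bits = viewsPerDay * (double) secondsPerView * mbps * 1_000_000.0;
        double bytes = bits / 8.0;       // bits -> bytes
        return bytes / 1e15;             // bytes -> petabytes (decimal)
    }

    public static void main(String[] args) {
        System.out.println(egressPetabytesPerDay(1_000_000_000L, 600, 2.0)); // 150.0
    }
}
```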
Design Goals
- 99% Cache Hit Rate: The origin server should almost never see a request from a user; the CDN must handle the load.
- Parallel Processing: Transcoding must be chunked to allow a 1-hour video to be processed in $< 10$ minutes.
- Cold vs. Hot Storage: Move rarely watched videos to S3 Glacier to save costs.
High-Level Design: The Twin Pipeline Architecture
The architecture is split into a Write Path (Ingestion & Processing) and a Read Path (Discovery & Delivery).
```mermaid
graph TD
    Creator -->|Upload| LB[Load Balancer]
    LB --> US[Upload Service]
    US --> Raw[(Raw Store: S3)]
    Raw --> MQ[Message Queue: Kafka]
    MQ --> TW[Transcoding Workers]
    TW --> Segs[(Segment Store: S3)]
    Segs --> OS[Origin Shield]
    OS --> CDN[Global CDN Nodes]
    Viewer -->|Metadata| API[API Gateway]
    Viewer -->|Stream| CDN
    API --> DB[(Metadata DB: Postgres)]
    API --> Cache[(Redis Cache)]
```
Explanation of the Architecture: The architecture uses a Decoupled Ingestion Pipeline. The Upload Service receives raw video and puts it into an S3 Raw Store. A Kafka event then triggers a pool of Transcoding Workers that perform the heavy computation of encoding and segmenting. The results are stored in a secondary S3 bucket. To prevent a "Thundering Herd" on S3, an Origin Shield acts as a mid-tier cache between the Global CDNs and the S3 storage. Viewers interact with a lightweight API for metadata but pull the heavy video bytes directly from the CDN edge.
API Design: The Playback Contract
While the streaming itself is handled via manifest files, the orchestration requires a robust API.
| Endpoint | Method | Payload | Description |
| --- | --- | --- | --- |
| /v1/videos/upload-session | POST | {"file_name": "vid.mp4", "size": 5000000} | Initialize a multipart upload. Returns session_id. |
| /v1/videos/{id}/metadata | GET | N/A | Get video title, owner, and the Manifest URL (.m3u8). |
| /v1/videos/{id}/stats | POST | {"view_time": 45, "device": "mobile"} | Heartbeat to update view counts and analytics. |
| /v1/videos/{id}/thumbnail | GET | N/A | Fetch the poster image for the video player. |
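A hypothetical exchange for the upload-session endpoint. Only file_name, size, and session_id come from the table above; the part-size and part-count fields are assumptions about how a multipart flow might respond:

```
POST /v1/videos/upload-session
{"file_name": "vid.mp4", "size": 5000000}

HTTP/1.1 200 OK
{"session_id": "sess-8842", "part_size_bytes": 10485760, "parts": 1}
```

The client then uploads each part against the session, and the server can resume from the last acknowledged part after a disconnect.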
Data Model: Schema Definitions
Metadata Store (PostgreSQL)
Used for structured data requiring ACID properties, like video ownership, permissions, and status.
| Table | Column | Type | Notes |
| --- | --- | --- | --- |
| videos | id | UUID (PK) | Unique identifier. |
| videos | uploader_id | UUID | FK to Users table. |
| videos | manifest_path | TEXT | Path to the HLS master playlist in S3. |
| videos | status | ENUM | PENDING, TRANSCODING, READY, ERROR. |
| videos | duration | INT | Duration in seconds. |
Transcoding State (Apache Cassandra)
We use Cassandra to track the millions of tiny "Chunk" jobs in the transcoding DAG because of its high write throughput.
| Table | Partition Key | Clustering Key |
| --- | --- | --- |
| job_status | video_id | chunk_id + resolution |
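A sketch of that table in CQL; the state and updated_at columns are assumptions beyond the keys listed above:

```sql
-- Illustrative schema for per-chunk transcoding state.
-- Partitioning by video_id keeps all chunks of one video on one partition,
-- so "is this video done?" is a single-partition read.
CREATE TABLE job_status (
    video_id   uuid,
    chunk_id   int,
    resolution text,
    state      text,          -- PENDING / RUNNING / COMPLETED / FAILED
    updated_at timestamp,
    PRIMARY KEY ((video_id), chunk_id, resolution)
);
```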
Tech Stack & Design Choices
| Component | Choice | Rationale |
| --- | --- | --- |
| Object Storage | AWS S3 | Unrivaled durability and horizontal scale for petabytes of blobs. |
| Message Queue | Apache Kafka | Handles high-volume event streams for transcoding orchestration. |
| Transcoding Engine | FFmpeg | The industry standard for low-level video/audio manipulation. |
| Metadata DB | PostgreSQL (RDS) | Strong consistency for user-facing metadata and billing. |
| CDN | Akamai / CloudFront | Global edge network with origin shield capabilities. |
| In-Memory Cache | Redis | Caches manifests and metadata to hit sub-100ms startup times. |
Design Deep Dive
Internals: The Parallel Transcoding Pipeline
Transcoding a 2-hour 4K movie as one contiguous file would take 10+ hours and be prone to failure. Instead, we use a Chunk-based DAG:
- Splitting: The raw file is sliced into 10-second GOP (Group of Pictures) aligned chunks.
- Parallel Fan-out: Each chunk is sent to a pool of workers. Worker A processes Chunk 1 at 1080p, while Worker B processes Chunk 1 at 720p.
- Stitching & Manifesting: Once all chunks for a rendition are finished, a manifest generator creates the .m3u8 file.
This allows us to process a 2-hour movie in roughly 5 minutes by throwing 500 workers at it. If one worker fails, we only re-process a 10-second chunk, not the whole movie.
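The fan-out math is worth internalizing: chunking multiplies one upload into thousands of small, independently retryable tasks. A trivial sketch:

```java
// Sketch: how many parallel transcode tasks one upload produces,
// assuming 10-second chunks and a 6-rung bitrate ladder.
class FanOut {
    static int totalTasks(int durationSeconds, int chunkSeconds, int renditions) {
        int chunks = (durationSeconds + chunkSeconds - 1) / chunkSeconds; // ceiling division
        return chunks * renditions;
    }

    public static void main(String[] args) {
        // A 2-hour movie: 7200 s / 10 s = 720 chunks x 6 renditions
        System.out.println(totalTasks(7200, 10, 6)); // 4320 independent tasks
    }
}
```

With a worker pool of a few hundred machines, those 4,320 tasks complete in minutes, and a single failure only costs one 10-second re-encode.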
Performance Analysis: The CDN Origin Shield
When a video goes viral, thousands of CDN edge nodes will simultaneously miss their cache and try to fetch the same segment from S3. This is the Thundering Herd.
- The Solution: We implement an Origin Shield (a mid-tier cache).
- The Flow: Edge Nodes $\rightarrow$ Regional Shield $\rightarrow$ S3.
- The Impact: If 1,000 edge nodes in Europe need segment_42.ts, they all hit the London Origin Shield. The Shield fetches from S3 once and serves the other 999 requests from its own cache. This reduces origin egress costs by $95\%+$ and prevents S3 API rate-limiting.
- SLO: Median playback startup $< 1s$; re-buffering rate $< 0.5\%$.
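The core trick the shield performs is request coalescing (sometimes called "single flight"): concurrent misses for the same key share one origin fetch instead of each hitting S3. A minimal in-process sketch, with illustrative names:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Single-flight sketch: concurrent requests for the same segment key
// share one in-flight origin fetch. Names here are illustrative.
class SingleFlight {
    private final ConcurrentHashMap<String, CompletableFuture<byte[]>> inFlight =
            new ConcurrentHashMap<>();

    CompletableFuture<byte[]> fetch(String key, Function<String, byte[]> origin) {
        return inFlight.computeIfAbsent(key, k ->
                CompletableFuture.supplyAsync(() -> origin.apply(k))
                        .whenComplete((v, e) -> inFlight.remove(k)));
    }

    public static void main(String[] args) {
        SingleFlight sf = new SingleFlight();
        CountDownLatch gate = new CountDownLatch(1);
        AtomicInteger originCalls = new AtomicInteger();
        Function<String, byte[]> origin = k -> {
            originCalls.incrementAndGet();
            try { gate.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return new byte[]{42};
        };
        CompletableFuture<byte[]> a = sf.fetch("seg_42.ts", origin);
        CompletableFuture<byte[]> b = sf.fetch("seg_42.ts", origin); // joins the in-flight fetch
        gate.countDown();
        a.join();
        b.join();
        System.out.println("origin calls: " + originCalls.get()); // 1
    }
}
```

A real shield would add a result cache and TTLs on top; this shows only the herd-collapsing step.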
Real-World Applications
Video streaming technology powers more than just entertainment:
- Streaming Platforms: YouTube, Netflix, Disney+, and Twitch.
- Security Systems: Nest and Ring use segmented storage for cloud DVR.
- Corporate Training: Platforms like Coursera and internal company town halls.
- Social Media: TikTok and Instagram Reels (optimized for ultra-short segments and fast looping).
Trade-offs & Failure Modes
- Storage vs. Compute: We could transcode into 20 different renditions to save user bandwidth, but every extra rendition multiplies storage costs. In practice, we choose between per-title encoding and a one-size-fits-all ladder based on video popularity.
- Latency vs. Quality: In live streaming (Twitch), we use smaller 2-second segments to reduce latency, but this increases the risk of buffering.
- Failure Mode: Transcoding Backlog. If a viral event causes a massive upload spike, the Kafka queue grows. We prioritize "Premium" users or "Trending" videos in the transcoding queue.
- Failure Mode: CDN Outage. We use a Multi-CDN strategy. If Akamai fails, the manifest URL points to CloudFront as a fallback.
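The transcoding-backlog mitigation above amounts to a priority queue in front of the worker pool. A minimal sketch; the Job fields and priority values are illustrative:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch of backlog prioritization: premium/trending videos jump the
// transcoding queue when uploads spike. Field names are illustrative.
class TranscodeQueue {
    record Job(String videoId, int priority) {} // higher = more urgent

    private final PriorityQueue<Job> queue =
            new PriorityQueue<>(Comparator.comparingInt(Job::priority).reversed());

    void submit(Job j) { queue.add(j); }
    Job next() { return queue.poll(); }

    public static void main(String[] args) {
        TranscodeQueue q = new TranscodeQueue();
        q.submit(new Job("cat_video", 1));
        q.submit(new Job("trending_finale", 10));
        q.submit(new Job("premium_upload", 5));
        System.out.println(q.next().videoId()); // trending_finale
    }
}
```

In production this would be a priority-aware broker setup (e.g. separate Kafka topics per tier) rather than an in-memory heap, but the ordering policy is the same.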
Advanced Concepts for Production Evolution
- Adaptive Bitrate Streaming (ABR): The player monitors the "Buffer Health." If the download speed drops (e.g., user enters an elevator), the player automatically requests the next 4-second segment from the 480p folder instead of 1080p.
- Per-Title Encoding: Not all videos are equal. An animation (flat colors) needs a lower bitrate than an action movie. Platforms like Netflix analyze each video to create a custom "Bitrate Ladder."
- Pre-warming CDNs: Based on user behavior and "Recommended" feeds, the system proactively pushes the first 10 seconds of likely-to-be-watched videos to the edge.
- DRM (Digital Rights Management): For premium content, segments are encrypted. The player must fetch a temporary decryption key from a License Service using Widevine or FairPlay.
Decision Guide
| Situation | Recommendation |
| --- | --- |
| Low Latency (Live) | Use 2-second segments and LL-HLS. Accept lower quality. |
| High Quality (VOD) | Use 10-second segments and multi-pass encoding. |
| Startup (MVP) | Use a 3rd-party service like Mux or AWS Elemental to avoid building the transcoding engine. |
| Global Scale | Implement an Origin Shield and Multi-CDN immediately. |
Practical Example: Interview Delivery
If asked to design this in an interview, focus on these three beats:
- Define the Ingest vs. Delivery split: Explain that they have completely different scaling requirements (Write-heavy vs. Read-heavy).
- The "Chunking" Strategy: Explain why you don't transcode the whole file at once. It shows you understand distributed processing and fault tolerance.
- The CDN is the MVP: Discuss how to prevent your S3 bucket from being a bottleneck using Origin Shields.
Standard Interview Closer: "I designed this system with a parallelized transcoding pipeline to ensure fast time-to-market for creators. On the delivery side, I prioritized availability and latency by using a multi-tiered CDN architecture with ABR support, ensuring that even under viral load, the origin remains protected and the user experience remains smooth."
FFmpeg: How It Works in Practice
FFmpeg is the core engine for most video platforms.
Example: Generating an HLS Stream
A typical transcoding worker would run a command like this to generate 4-second segments for a 720p rendition:
```bash
# Generate HLS segments and manifest
ffmpeg -i input_raw.mp4 \
  -c:v libx264 -b:v 2500k \
  -s 1280x720 -g 48 \
  -f hls -hls_time 4 -hls_playlist_type vod \
  -hls_segment_filename "output/720p/seg_%03d.ts" \
  output/720p/index.m3u8
```
In a Java-based worker environment, you would use a process builder or a cloud wrapper:
```java
public void startTranscodingJob(VideoJob job) throws Exception {
    // 1. Download the raw chunk from S3 (SDK call omitted)
    // 2. Run FFmpeg for this rendition (VideoJob accessors are illustrative)
    Process ffmpeg = new ProcessBuilder("ffmpeg", "-i", job.inputPath(),
            "-c:v", "libx264", "-b:v", job.bitrate(), "-s", job.resolution(),
            "-f", "hls", "-hls_time", "4", job.outputManifestPath())
            .inheritIO().start();
    if (ffmpeg.waitFor() != 0) throw new IllegalStateException("FFmpeg failed");
    // 3. Upload the .ts segments and .m3u8 manifest back to S3
    // 4. Update the job_status row in Cassandra to COMPLETED
}
```
Lessons Learned
- Large Files are Liabilities: Never try to process a large video as a single unit. Chunking is mandatory for reliability.
- Egress is Expensive: Bandwidth costs more than storage. Compute is cheap; use it to optimize your bitrates.
- View Counts are "Fuzzy": Don't try to get 100% accurate view counts in real-time. Use Redis-based buffering and accept eventual consistency to keep the playback path fast.
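The fuzzy view-count idea can be sketched with an in-process buffer standing in for Redis; a periodic job drains it to the database in batches:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of "fuzzy" view counting: increments land in a cheap in-memory
// buffer (standing in for Redis here) and are flushed to the database in
// batches. Real-time accuracy is traded for playback-path speed.
class ViewCounter {
    private final ConcurrentHashMap<String, LongAdder> buffer = new ConcurrentHashMap<>();

    void recordView(String videoId) {
        buffer.computeIfAbsent(videoId, k -> new LongAdder()).increment();
    }

    /** Drain buffered counts; a periodic job would write these to Postgres. */
    Map<String, Long> flush() {
        Map<String, Long> batch = new HashMap<>();
        buffer.forEach((id, adder) -> batch.put(id, adder.sumThenReset()));
        return batch;
    }

    public static void main(String[] args) {
        ViewCounter vc = new ViewCounter();
        for (int i = 0; i < 3; i++) vc.recordView("vid_123");
        System.out.println(vc.flush()); // {vid_123=3}
    }
}
```

Between flushes the displayed count lags slightly, which is exactly the eventual consistency the lesson above accepts.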
Summary & Key Takeaways
- Transcoding: The process of turning one raw file into many web-friendly versions.
- ABR: The client-side logic that picks the best quality on-the-fly based on network health.
- CDN Origin Shield: Mandatory for protecting your backend from viral traffic spikes.
- Segments over Files: Modern streaming is about downloading thousands of small files, not one big one.
Practice Quiz
What is the primary purpose of an "Origin Shield" in a video platform?
- A) To encrypt the video content for DRM.
- B) To act as a mid-tier cache and protect S3 from "Thundering Herd" requests.
- C) To transcode videos into lower resolutions.

Correct Answer: B
In HLS streaming, what is the role of the .m3u8 file?
- A) It is the actual video data.
- B) It is a manifest file that lists the locations of the video segments.
- C) It is the encryption key for the video.

Correct Answer: B
Why do we transcode a single video into multiple resolutions?
- A) To make the video look better on 4K TVs only.
- B) To support Adaptive Bitrate (ABR) so users with slow internet can still watch at lower quality.
- C) Because browsers cannot play 1080p video directly.

Correct Answer: B
[Open-ended] Describe how you would implement a "Resume Upload" feature for a 100GB video file. What metadata do you need to track on the server?
Written by Abstract Algorithms (@abstractalgorithms)