System Design Advanced: Security, Rate Limiting, and Reliability
How do you protect your API from hackers and traffic spikes? We cover Rate Limiting algorithms (T...
Abstract Algorithms
TLDR: Three reliability tools every backend system needs: Rate Limiting curbs API spam and helps blunt DDoS attacks, Circuit Breakers stop cascading failures when downstream services degrade, and Bulkheads limit the blast radius of a failure. Knowing when and how to combine them separates junior from senior system design.
The Electrical Panel Analogy
A house's electrical panel has three layers of protection:
- Fuse/breaker per circuit: no single appliance can knock out the house.
- Main breaker: kills everything if total load is too dangerous.
- Surge protector: absorbs voltage spikes before they reach appliances.
Distributed systems need the same layered defense: at the API gateway, at service-to-service calls, and at the level of individual thread pools.
Rate Limiting: Controlling Inbound Traffic
Rate limiting is typically enforced at the API gateway or reverse proxy layer, before requests reach your application.
Token Bucket Algorithm
Each client gets a "bucket" of tokens. One token = one request. Tokens refill at a fixed rate.
```
Bucket capacity = 100 requests
Refill rate     = 10 tokens/second
If tokens > 0:  allow request, decrement token
If tokens == 0: return HTTP 429 Too Many Requests
```
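A minimal in-memory sketch of the token bucket above, assuming a single process and one bucket per client (a real gateway would typically keep this state in a shared store such as Redis). The optional now argument is a hypothetical testing hook; tokens are refilled lazily from elapsed time rather than by a background timer.

```python
import time
from typing import Optional


class TokenBucket:
    """Per-client token bucket: capacity caps the burst, refill_rate sets the sustained rate."""

    def __init__(self, capacity: float, refill_rate: float, now: Optional[float] = None):
        self.capacity = capacity          # e.g. 100 requests
        self.refill_rate = refill_rate    # e.g. 10 tokens per second
        self.tokens = capacity            # start full so an idle client can burst
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        """Return True to serve the request, False to answer HTTP 429."""
        now = time.monotonic() if now is None else now
        # Lazily add the tokens earned since the last call, never exceeding capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1              # one token = one request
            return True
        return False


bucket = TokenBucket(capacity=100, refill_rate=10, now=0.0)
burst = [bucket.allow(now=0.0) for _ in range(101)]
print(burst.count(True))        # 100: the full bucket absorbs the burst, request 101 gets 429
print(bucket.allow(now=0.05))   # False: only 0.5 tokens earned after 50 ms
print(bucket.allow(now=0.2))    # True: enough tokens have refilled by t = 0.2 s
```

Starting the bucket full is a design choice: a client that has been idle can spend its whole burst allowance at once, while the refill rate still bounds its long-run average.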
| Algorithm | Burst Handling | Use Case |
|---|---|---|
| Token Bucket | Allows bursts up to the bucket size | Per-user API rate limits |
| Leaky Bucket | No bursts: constant output rate | Traffic smoothing, QoS |
| Fixed Window | Up to 2× the limit possible around a window boundary | Simple, low-overhead admin limits |
| Sliding Window | Smooth rate, no boundary spikes | Common in production API gateways |
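To make the boundary problem from the table concrete, here is a sketch comparing a fixed-window counter with a sliding-window log, assuming in-memory state and caller-supplied timestamps in seconds (the class names are illustrative). A sliding-window log stores one timestamp per accepted request; many gateways use an approximate sliding-window counter instead to save memory.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per aligned window [k*window, (k+1)*window)."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.current_window = None
        self.count = 0

    def allow(self, now: float) -> bool:
        window_id = int(now // self.window)
        if window_id != self.current_window:   # a new window resets the counter
            self.current_window, self.count = window_id, 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLogLimiter:
    """Allows at most `limit` requests in any interval of length `window`."""

    def __init__(self, limit: int, window: float):
        self.limit, self.window = limit, window
        self.timestamps = deque()              # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        # Evict timestamps that have slid out of the window (now - window, now].
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# 10 requests just before a window boundary and 10 just after it (limit: 10 per 60 s).
times = [59.5] * 10 + [60.5] * 10
fixed = FixedWindowLimiter(limit=10, window=60)
sliding = SlidingWindowLogLimiter(limit=10, window=60)
print(sum(fixed.allow(t) for t in times))    # 20: double the limit within one second
print(sum(sliding.allow(t) for t in times))  # 10: the limit holds across the boundary
```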
DDoS Defense: Layered Response
```mermaid
flowchart LR
    Internet["Internet Traffic"] --> CDN["CDN\n(absorb volumetric attacks)"]
    CDN --> WAF["WAF\n(block malicious patterns)"]
    WAF --> RL["Rate Limiter\n(per-IP / per-token limits)"]
    RL --> App["Application Servers"]
    RL -->|IP repeatedly violates| BH["Blackholing\n(drop to /dev/null)"]
```
Blackholing routes the attacker's traffic to a null interface: it is dropped with no response and minimal server overhead. ISPs and CDN providers use it against volumetric attacks.
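The "IP repeatedly violates" edge in the diagram can be sketched as a strike counter in front of the rate limiter. This is a hypothetical illustration: the EscalatingGuard name, the strike threshold, and the in-memory blocklist are assumptions, and real blackholing happens at the network layer (routers, CDN edge) rather than in application code. A return value of None stands in for "drop without replying".

```python
from collections import defaultdict, deque
from typing import Optional


class EscalatingGuard:
    """Hypothetical per-IP guard: 429 for bursts, silent drop for repeat offenders."""

    def __init__(self, limit: int, window: float, max_strikes: int):
        self.limit, self.window, self.max_strikes = limit, window, max_strikes
        self.history = defaultdict(deque)   # ip -> timestamps of accepted requests
        self.strikes = defaultdict(int)     # ip -> number of rejected requests
        self.blackholed = set()

    def handle(self, ip: str, now: float) -> Optional[int]:
        """Return an HTTP status code, or None when the request is silently dropped."""
        if ip in self.blackholed:
            return None                      # no response at all, minimal work
        log = self.history[ip]
        while log and log[0] <= now - self.window:
            log.popleft()                    # sliding-window eviction
        if len(log) < self.limit:
            log.append(now)
            return 200
        self.strikes[ip] += 1
        if self.strikes[ip] >= self.max_strikes:
            self.blackholed.add(ip)          # escalate: stop answering this IP
        return 429


guard = EscalatingGuard(limit=5, window=1.0, max_strikes=3)
responses = [guard.handle("203.0.113.7", now=0.0) for _ in range(10)]
print(responses)                              # [200, 200, 200, 200, 200, 429, 429, 429, None, None]
print(guard.handle("198.51.100.2", now=0.0))  # 200: other clients are unaffected
```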
Circuit Breaker: Stopping Cascading Failures
Without a circuit breaker:
- Your API calls Service B. Service B is slow (DB overloaded).
- All your threads block waiting for B.
- Your thread pool fills up.
- Your service is now also slow.
- Services that call your service now slow down too.
This is a cascading failure: one slow database takes down a chain of services.
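The cascade can be reproduced with a small simulation, assuming Service A serves requests from one fixed pool of 4 threads and Service B is modeled by a hypothetical call_service_b that simply sleeps. Once slow calls to B occupy every worker, even a request that never touches B has to wait.

```python
import time
from concurrent.futures import ThreadPoolExecutor

B_LATENCY = 0.5  # hypothetical: Service B takes 500 ms because its DB is overloaded


def call_service_b() -> str:
    time.sleep(B_LATENCY)           # the calling thread is blocked the whole time
    return "b-result"


def health_check() -> str:
    return "ok"                     # needs no downstream call at all


pool = ThreadPoolExecutor(max_workers=4)                  # Service A's only request pool
slow = [pool.submit(call_service_b) for _ in range(4)]    # every worker now waits on B

start = time.monotonic()
fast = pool.submit(health_check)    # queued behind the blocked workers
fast.result()
waited = time.monotonic() - start
print(f"health check waited {waited:.2f}s")   # about B_LATENCY, despite doing no work
pool.shutdown(wait=True)
```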
Circuit Breaker states:
```mermaid
stateDiagram-v2
    [*] --> CLOSED : System healthy
    CLOSED --> OPEN : Error rate > threshold (e.g., 50% in 10s)
    OPEN --> HALF_OPEN : After timeout (e.g., 30s)
    HALF_OPEN --> CLOSED : Probe request succeeds
    HALF_OPEN --> OPEN : Probe request fails
```
| State | Behavior | When |
|---|---|---|
| CLOSED | All requests pass through | Normal operation |
| OPEN | All requests fail fast (no actual call) | After too many failures |
| HALF-OPEN | One probe request allowed | After recovery timeout |
Implementation (Python, using the circuitbreaker package):
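```python
import requests
from circuitbreaker import circuit
PAYMENT_URL = "https://payments.example.internal/charge"  # hypothetical endpoint
@circuit(failure_threshold=5, recovery_timeout=30)
def call_payment_service(order_id: str) -> requests.Response:
    resp = requests.post(PAYMENT_URL, json={"order_id": order_id}, timeout=2)
    resp.raise_for_status()  # count HTTP error statuses as failures, not only network errors
    return resp
```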
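Once call_payment_service() has failed 5 times in a row (a success resets the count), subsequent calls raise CircuitBreakerError immediately for the next 30 seconds (recovery_timeout): no actual network call, no blocked threads. After that, the next call is let through as a probe.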
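To make the CLOSED → OPEN → HALF-OPEN transitions explicit, here is a simplified hand-rolled breaker. It is an illustrative sketch, not the circuitbreaker package's implementation: it trips on consecutive failures rather than an error rate such as 50% in 10 s, a failed probe re-opens the circuit immediately, and the injectable clock and the CircuitOpenError name are assumptions that let the transitions be checked without waiting 30 seconds. It is not thread-safe; concurrent callers would need a lock around the state changes.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("failing fast; dependency not called")
            self.state = "HALF_OPEN"          # recovery timeout elapsed: allow a probe
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        # A successful call (including a probe) closes the circuit and resets the count.
        self.state, self.failures, self.opened_at = "CLOSED", 0, None

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed probe re-opens immediately; in CLOSED we wait for the threshold.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "OPEN", self.clock()


fake_now = [0.0]
breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30, clock=lambda: fake_now[0])


def flaky() -> str:
    raise ConnectionError("payment service timed out")


for _ in range(3):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass
print(breaker.state)                                # OPEN
try:
    breaker.call(lambda: "ok")
except CircuitOpenError:
    print("fast fail")                              # the lambda was never invoked
fake_now[0] = 31.0
print(breaker.call(lambda: "ok"), breaker.state)    # ok CLOSED (the probe succeeded)
```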
Bulkhead Pattern: Isolating Failure Blast Radius
Named after the watertight compartments in a ship's hull: if one compartment floods, the rest stay dry.
In software: give different traffic types separate thread pools and separate connection pools.
```
Critical Payments Thread Pool: 20 threads (isolated)
Non-Critical Analytics Pool:   5 threads (isolated)
Background Job Pool:           10 threads (isolated)
```
If the analytics pool saturates, the payment pool is unaffected. Without bulkheads, all work shares one pool and one slow operation starves everything else.
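A sketch of the pool layout above using Python's concurrent.futures.ThreadPoolExecutor, assuming each traffic type is routed to its own executor. The pool sizes come from the text; the submit_work router and the sleep-based workloads are hypothetical. Flooding the analytics pool leaves payment workers free.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One isolated executor per traffic type (sizes from the example above).
POOLS = {
    "payments": ThreadPoolExecutor(max_workers=20, thread_name_prefix="payments"),
    "analytics": ThreadPoolExecutor(max_workers=5, thread_name_prefix="analytics"),
    "background": ThreadPoolExecutor(max_workers=10, thread_name_prefix="background"),
}


def submit_work(kind: str, fn, *args):
    """Hypothetical router: work can only occupy threads from its own compartment."""
    return POOLS[kind].submit(fn, *args)


def slow_report() -> str:
    time.sleep(1.0)                 # a long-running analytics query
    return "report"


def charge(order_id: str) -> str:
    return f"charged {order_id}"


# Flood analytics: 50 slow reports queue up behind its 5 threads.
reports = [submit_work("analytics", slow_report) for _ in range(50)]

start = time.monotonic()
receipt = submit_work("payments", charge, "order-42").result()
elapsed = time.monotonic() - start
print(receipt, f"in {elapsed:.3f}s")   # returns right away; the analytics backlog is irrelevant

for pool in POOLS.values():
    pool.shutdown(wait=False, cancel_futures=True)   # drop queued reports (Python 3.9+)
```

With a single shared pool of the same total size, those 50 reports would sit in the same queue as the payment request; separate executors are what keep the compartments watertight.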
When to Apply Each Pattern
| Scenario | Pattern |
|---|---|
| Public API with free and paid tiers | Rate Limiting (sliding window, per API key) |
| Microservice calling an unreliable external API | Circuit Breaker |
| High-value transaction isolation | Bulkhead (dedicated thread + connection pool) |
| Protecting origin from DDoS | Layered CDN + WAF + Rate Limiter |
| Service-to-service timeout cascade | Circuit Breaker + aggressive timeout (e.g., 500 ms) |
| Queue consumer falling behind | Backpressure (consumer signals producer to slow down) |
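The backpressure row can be sketched with a bounded queue.Queue: when the consumer falls behind and the queue is full, put() blocks (or raises queue.Full after its timeout), which is the signal for the producer to slow down. The queue size, timings, and the drop-on-Full policy are assumptions for illustration; message brokers and streaming frameworks provide their own flow-control mechanisms.

```python
import queue
import threading
import time

events = queue.Queue(maxsize=5)   # the bound is what creates backpressure
consumed = []


def consumer() -> None:
    while True:
        item = events.get()
        if item is None:              # sentinel: the producer is done
            break
        time.sleep(0.01)              # slow consumer: at most ~100 events/s
        consumed.append(item)


def produce(n: int, put_timeout: float) -> int:
    """Offer n events; return how many were shed because the queue stayed full."""
    shed = 0
    for i in range(n):
        try:
            # Blocks while the queue is full, pacing the producer to the consumer.
            events.put(i, timeout=put_timeout)
        except queue.Full:
            shed += 1                 # hypothetical policy: drop (could also retry or return 503)
    return shed


worker = threading.Thread(target=consumer)
worker.start()
start = time.monotonic()
shed = produce(50, put_timeout=1.0)
events.put(None)
worker.join()
elapsed = time.monotonic() - start
print(f"shed={shed} consumed={len(consumed)} elapsed={elapsed:.2f}s")
```

Because the producer can never get more than five events ahead, it is throttled to the consumer's pace instead of letting an unbounded backlog grow in memory.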
Summary
- Token Bucket enforces per-client rate limits with allowance for small bursts.
- Circuit Breaker (CLOSED → OPEN → HALF-OPEN) short-circuits failing calls before they cascade.
- Bulkhead compartmentalizes thread/connection pools so slow operations can't starve critical paths.
- DDoS defense is layered: CDN absorbs volume, WAF filters patterns, rate limiter blocks persistent abusers, blackholing drops the worst offenders.
Practice Quiz
Service A calls Service B. Service B's database is overloaded. Without a circuit breaker, what happens to Service A?
- A) Service A automatically retries until B recovers.
- B) Service A's threads block on B's slow responses and fill its own thread pool, so Service A becomes slow too: a cascading failure.
- C) Service A returns a cached response automatically.
Answer: B
A Circuit Breaker is in OPEN state. What happens when a new request arrives?
- A) The request is queued until the circuit closes.
- B) The request fails immediately without attempting the actual call, which protects the downstream service from being slammed while it recovers.
- C) The request is retried with exponential backoff.
Answer: B
Your payment endpoint and reporting endpoint share the same global thread pool (50 threads). Reporting queries run long. What is the correct fix?
- A) Increase the thread pool to 200 threads.
- B) Apply the Bulkhead pattern: give payment and reporting isolated pools so long-running reports can't starve payment threads.
- C) Add a Circuit Breaker on the reporting endpoint.
Answer: B

Written by
Abstract Algorithms
@abstractalgorithms
