
System Design Advanced: Security, Rate Limiting, and Reliability

How do you protect your API from hackers and traffic spikes? We cover Rate Limiting algorithms (T...

Abstract Algorithms
5 min read

TLDR: Three reliability tools every backend system needs: Rate Limiting curbs API spam and blunts DDoS traffic, Circuit Breakers stop cascading failures when downstream services degrade, and Bulkheads contain the blast radius of a failure. Knowing when and how to combine them separates junior from senior system design.


📖 The Electrical Panel Analogy

A house's electrical panel has three layers of protection:

  1. Fuse/breaker per circuit → no single appliance can knock out the house.
  2. Main breaker → kills everything if total load is too dangerous.
  3. Surge protector → absorbs voltage spikes before they reach appliances.

Distributed systems need the same layered defense: at the API gateway, between services, and at the individual thread pool level.


🔢 Rate Limiting: Controlling Inbound Traffic

Rate limiting is typically enforced at the API gateway or reverse proxy layer, before requests reach your application.

Token Bucket Algorithm

Each client gets a "bucket" of tokens. One token = one request. Tokens refill at a fixed rate.

Bucket capacity = 100 requests
Refill rate = 10 tokens/second
If tokens > 0: allow request, decrement token
If tokens == 0: return HTTP 429 Too Many Requests
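
The pseudocode above can be sketched in Python roughly as follows. This is a minimal, single-process illustration rather than a production limiter: the class name `TokenBucket`, the `allow()` method, and the injectable `clock` parameter are assumptions made for this example, and a real gateway would usually keep one bucket per client in a shared store.

```python
import time


class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens per second."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock            # injectable so tests can control time
        self.tokens = capacity        # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        """Spend one token and return True, or return False (caller responds 429)."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Lazy refill: add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(capacity=100, refill_rate=10)`, a client can send a burst of 100 requests at once, after which it is held to about 10 requests per second. Refilling lazily on each call avoids running a background timer per client.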

| Algorithm | Burst Handling | Use Case |
| --- | --- | --- |
| Token Bucket | Allows small bursts up to bucket size | API rate limits per user |
| Leaky Bucket | No bursts; constant output rate | Smoothing traffic, QoS |
| Fixed Window | Large bursts possible at window boundary | Simple, low-overhead admin limits |
| Sliding Window | Smooth rate, no boundary spikes | Production API gateways (most common) |
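
To make the difference between fixed and sliding windows concrete, here is a hedged sketch of the sliding window log variant, one of several ways to implement a sliding window. The names `SlidingWindowLimiter`, `limit`, `window`, and `client_id` are illustrative assumptions, and the per-client timestamp log lives in process memory, which would not work across multiple gateway instances.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client in any span of `window` seconds."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.hits = defaultdict(deque)   # client id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        log = self.hits[client_id]
        # Evict timestamps that have slid out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) < self.limit:
            log.append(now)
            return True
        return False
```

Because the window moves with each request, a client that uses its full quota at second 9 of a 10-second window is still blocked at second 11. A fixed window would have reset at second 10 and let a second full burst through. The trade-off is memory: the log stores one timestamp per accepted request.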

DDoS Defense: Layered Response

flowchart LR
    Internet["Internet Traffic"] --> CDN["CDN\n(absorb volumetric attacks)"]
    CDN --> WAF["WAF\n(block malicious patterns)"]
    WAF --> RL["Rate Limiter\n(per-IP / per-token limits)"]
    RL --> App["Application Servers"]
    RL -->|IP repeatedly violates| BH["Blackholing\n(drop to /dev/null)"]

Blackholing routes the attacker's traffic to a null interface: no response is sent and server overhead is minimal. ISPs and CDN providers use it against volumetric attacks.
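
Real blackholing happens at the network layer, but the "IP repeatedly violates" edge in the diagram can be approximated in application code. The sketch below is a hypothetical stand-in: `OffenderTracker`, `max_strikes`, and `block_seconds` are names and thresholds invented for illustration, not part of any specific gateway product.

```python
import time
from collections import defaultdict


class OffenderTracker:
    """Escalate from per-request rejection to a temporary block after repeated violations."""

    def __init__(self, max_strikes: int = 5, block_seconds: float = 600.0,
                 clock=time.monotonic):
        self.max_strikes = max_strikes
        self.block_seconds = block_seconds
        self.clock = clock
        self.strikes = defaultdict(int)   # ip -> rate-limit violations so far
        self.blocked_until = {}           # ip -> time at which the block expires

    def record_violation(self, ip: str) -> None:
        """Call this each time the rate limiter rejects a request from `ip`."""
        self.strikes[ip] += 1
        if self.strikes[ip] >= self.max_strikes:
            self.blocked_until[ip] = self.clock() + self.block_seconds
            self.strikes[ip] = 0

    def is_blocked(self, ip: str) -> bool:
        """Check this before any other work so blocked traffic is dropped cheaply."""
        until = self.blocked_until.get(ip)
        if until is None:
            return False
        if self.clock() >= until:
            del self.blocked_until[ip]    # block expired
            return False
        return True
```

A request handler would call `is_blocked()` first and drop the connection without a response when it returns True, and call `record_violation()` whenever the rate limiter returns 429.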


⚙️ Circuit Breaker: Stopping Cascading Failures

Without a circuit breaker:

  • Your API calls Service B. Service B is slow (DB overloaded).
  • All your threads block waiting for B.
  • Your thread pool fills up.
  • Your service is now also slow.
  • Services that call your service now slow down too.

This is a cascading failure: one slow database takes down a chain of services.

Circuit Breaker states:

stateDiagram-v2
    [*] --> CLOSED : System healthy
    CLOSED --> OPEN : Error rate > threshold (e.g., 50% in 10s)
    OPEN --> HALF_OPEN : After timeout (e.g., 30s)
    HALF_OPEN --> CLOSED : Probe request succeeds
    HALF_OPEN --> OPEN : Probe request fails

| State | Behavior | When |
| --- | --- | --- |
| CLOSED | All requests pass through | Normal operation |
| OPEN | All requests fail fast (no actual call) | After too many failures |
| HALF-OPEN | One probe request allowed | After recovery timeout |
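
The three states can be sketched as a small hand-rolled state machine. This is an illustrative simplification under stated assumptions: it trips on a count of consecutive failures rather than the error-rate threshold shown in the diagram, it is not thread-safe, and the names `CircuitBreaker`, `CircuitOpenError`, and `call()` are made up for this example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is OPEN."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0        # consecutive failures seen while CLOSED
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.recovery_timeout:
                raise CircuitOpenError("failing fast, no call attempted")
            self.state = "HALF_OPEN"   # timeout elapsed: let one probe through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed probe, or too many consecutive failures, (re)opens the circuit.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        self.failures = 0
        self.state = "CLOSED"
```

Wrapping a remote call as `breaker.call(fetch, url)` means that once the circuit is OPEN the caller gets `CircuitOpenError` in microseconds instead of waiting on a timeout, which is what keeps its own thread pool from filling up.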

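Implementation (Python, using the circuitbreaker package's @circuit decorator):

import requests
from circuitbreaker import circuit

PAYMENT_URL = "https://payments.example.com/charge"  # placeholder endpoint

@circuit(failure_threshold=5, recovery_timeout=30)  # open after 5 failures, probe after 30s
def call_payment_service(order_id: str):
    response = requests.post(PAYMENT_URL, json={"order_id": order_id}, timeout=2)
    response.raise_for_status()  # count HTTP 4xx/5xx responses as failures too
    return response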

When call_payment_service() fails 5 times in a row, the circuit opens and subsequent calls raise CircuitBreakerError immediately: no actual network call, no blocked threads. After the 30-second recovery timeout, the next call is let through as a probe.


🧠 Bulkhead Pattern: Isolating Failure Blast Radius

Named after ship hull compartments: if one compartment floods, the rest stay dry.

In software: give different traffic types separate thread pools and separate connection pools.

Critical Payments Thread Pool:    20 threads  (isolated)
Non-Critical Analytics Pool:      5 threads   (isolated)
Background Job Pool:              10 threads  (isolated)

If the analytics pool saturates, the payment pool is unaffected. Without bulkheads, all work shares one pool and one slow operation starves everything else.
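
A minimal sketch of this layout using Python's standard `concurrent.futures` module is shown below. The pool sizes come from the example above; the `POOLS` dictionary and the `submit()` helper are hypothetical names chosen for illustration.

```python
from concurrent.futures import Future, ThreadPoolExecutor

# One executor per traffic class, so each class can only exhaust its own threads.
POOLS = {
    "payments": ThreadPoolExecutor(max_workers=20, thread_name_prefix="payments"),
    "analytics": ThreadPoolExecutor(max_workers=5, thread_name_prefix="analytics"),
    "background": ThreadPoolExecutor(max_workers=10, thread_name_prefix="background"),
}


def submit(kind: str, func, *args, **kwargs) -> Future:
    """Run `func` on the pool reserved for `kind`; unknown kinds raise KeyError."""
    return POOLS[kind].submit(func, *args, **kwargs)
```

If 20 slow analytics jobs are submitted, 5 run and 15 wait in the analytics queue, while `submit("payments", ...)` still finds a free payments thread right away. Note that `ThreadPoolExecutor` queues excess work without limit, so a stricter bulkhead would also cap the queue and reject work beyond it.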


⚖️ When to Apply Each Pattern

| Scenario | Pattern |
| --- | --- |
| Public API with free and paid tiers | Rate Limiting (sliding window, per API key) |
| Microservice calling an unreliable external API | Circuit Breaker |
| High-value transaction isolation | Bulkhead (dedicated thread + connection pool) |
| Protecting origin from DDoS | CDN + WAF + Rate Limiter layered |
| Service-to-service timeout cascade | Circuit Breaker + timeout (aggressive: 500ms) |
| Queue consumer falling behind | Backpressure (consumer signals producer to slow down) |
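
For the backpressure row, one simple way to let a lagging consumer slow its producer is a bounded queue. The sketch below assumes an in-process `queue.Queue`; the helper name `try_publish` and its `timeout` default are illustrative choices, not an established API.

```python
import queue


def try_publish(q: queue.Queue, item, timeout: float = 0.1) -> bool:
    """Enqueue `item`, waiting up to `timeout` seconds for space.

    Returns False when the queue stays full, which tells the producer to slow
    down, shed load, or retry later instead of buffering without limit.
    """
    try:
        q.put(item, timeout=timeout)
        return True
    except queue.Full:
        return False
```

Creating the queue with `queue.Queue(maxsize=N)` is what makes this work: an unbounded queue never pushes back, so a slow consumer would show up only as growing memory use.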

📌 Summary

  • Token Bucket enforces per-client rate limits with allowance for small bursts.
  • Circuit Breaker (CLOSED → OPEN → HALF-OPEN) short-circuits failing calls before they cascade.
  • Bulkhead compartmentalizes thread/connection pools so slow operations can't starve critical paths.
  • DDoS defense is layered: CDN absorbs volume, WAF filters patterns, rate limiter blocks persistent abusers, blackholing drops the worst offenders.

📝 Practice Quiz

  1. Service A calls Service B. Service B's database is overloaded. Without a circuit breaker, what happens to Service A?

    • A) Service A automatically retries until B recovers.
    • B) Service A's threads block on B's slow responses, fill its own thread pool, and Service A becomes slow too: a cascading failure.
    • C) Service A returns a cached response automatically.
      Answer: B
  2. A Circuit Breaker is in OPEN state. What happens when a new request arrives?

    • A) The request is queued until the circuit closes.
    • B) The request fails immediately without attempting the actual call, protecting the downstream service from being slammed while it recovers.
    • C) The request is retried with exponential backoff.
      Answer: B
  3. Your payment endpoint and reporting endpoint share the same global thread pool (50 threads). Reporting queries run long. What is the correct fix?

    • A) Increase the thread pool to 200 threads.
    • B) Apply the Bulkhead pattern: give payment and reporting isolated pools so long-running reports can't starve payment threads.
    • C) Add a Circuit Breaker on the reporting endpoint.
      Answer: B
