System Design: Caching and Asynchronism
Make your system fast and resilient. We explore caching strategies (Cache-Aside, Write-Through), eviction policies, cache invalidation, and asynchronism via message queues.
TLDR: Caching stores hot data in fast RAM so you skip slow database round-trips. Asynchronism moves slow tasks (email, video processing) off the critical path via message queues. Together, they turn a blocking, slow system into a responsive, scalable one.
The Library Desk Analogy
Caching: The librarian copies the 5 most-requested books and keeps them on the front desk. You don't have to wait for a runner to fetch them from the stacks every time.
Asynchronism: When you request a rare book from off-site storage, the librarian gives you a ticket: "Come back in 2 hours." You don't stand at the desk waiting; your visit is done the moment you get the ticket.
Cache-Aside vs. Write-Through: Two Caching Patterns
Cache-Aside (Lazy Population): Most Common
```mermaid
flowchart LR
    App["Application"] -->|1. Check cache| Cache["Redis/Memcached"]
    Cache -->|hit: return data| App
    Cache -->|miss| DB["Database"]
    DB -->|2. Fetch + populate cache| Cache
```
Flow:
- Check cache.
- If miss: query DB, write result to cache, return to caller.
- Next request: cache hit.
Pros: Only actively requested data is cached. Simple to implement.
Cons: The first request after a miss always goes to the DB (cold start). Reads can return stale data after a DB update until the cache entry expires or is invalidated.
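The flow above can be sketched in a few lines of Python. This is a hedged illustration: plain dicts stand in for Redis and for the database, and `get_user` is a hypothetical name, not part of any real client library.

```python
# Cache-aside (lazy population): check the cache first, fall back to the DB
# on a miss, then populate the cache so the next request hits.
cache = {}                          # stands in for Redis/Memcached
db = {"user:1": {"name": "Ada"}}    # stands in for the database

def get_user(key):
    if key in cache:        # 1. cache hit: skip the DB entirely
        return cache[key]
    value = db[key]         # 2. cache miss: query the database
    cache[key] = value      # 3. populate the cache for next time
    return value
```

The first call for a key pays the DB round-trip; every subsequent call is served from the cache.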
Write-Through
Every DB write also writes to cache. Cache is always in sync with DB.
Pros: No stale reads.
Cons: Every write pays the cache write cost regardless of whether the data will ever be read.
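A minimal write-through sketch under the same assumptions (dicts standing in for the cache and the database; `update_user` is an illustrative name):

```python
# Write-through: every write goes to both the DB and the cache,
# so reads never observe stale data.
cache = {}
db = {}

def update_user(key, value):
    db[key] = value      # 1. persist to the database
    cache[key] = value   # 2. update the cache in the same operation

def read_user(key):
    # Cache is kept in sync, so a hit is always fresh.
    return cache.get(key, db.get(key))
```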
Eviction Policies: How a Full Cache Chooses What to Drop
When cache capacity is full and a new item needs to be stored, the eviction policy decides what to remove:
| Policy | Rule | Best For |
|---|---|---|
| LRU (Least Recently Used) | Evict the item not accessed for the longest time | General-purpose; recency correlates with future use |
| LFU (Least Frequently Used) | Evict the item accessed fewest times overall | Stable hot-set workloads; viral posts that stay popular longer |
| FIFO (First In, First Out) | Evict the oldest item regardless of access | Time-sensitive data where age = staleness |
| TTL-based | Items expire after a configured duration | Session data, API rate limit counters |
LRU vs. LFU trade-off: A viral post accessed 10,000 times yesterday but not at all today survives under LFU but is evicted by LRU. Conversely, an item accessed recently but only once a month survives under LRU even though it is rarely needed.
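An LRU cache is commonly built on an ordered map; here is a sketch using Python's standard `collections.OrderedDict` (the class name `LRUCache` is illustrative, not from any specific library):

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used key once capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # insertion order = recency order

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop least recently used
```

With capacity 2: after `put("a")`, `put("b")`, `get("a")`, `put("c")`, the key `"b"` is evicted because it is the least recently used.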
Cache Invalidation Strategies
Cache invalidation is famously one of the two hard problems in CS: "There are only two hard things in computer science: cache invalidation and naming things" (Phil Karlton).
| Strategy | When Cache Is Cleared | Risk |
|---|---|---|
| TTL expiry | After N seconds automatically | Stale window up to TTL seconds |
| Event-based invalidation | On DB update, explicitly DELETE the cache key | Consistency risk if DB update and cache delete are not atomic |
| Write-through | Cache always updated with DB | Higher write latency |
| Cache-aside with version key | Key includes a version: `user:42:v3` | Old versions ignored; ensures freshness |
For strong consistency requirements (e.g., inventory: never show "in stock" after last unit sold), use write-through + short TTL as a backstop.
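The version-key strategy from the table can be sketched as follows. This is an assumption-laden illustration: `versions` is a hypothetical per-user counter that in production might live in Redis (e.g., incremented atomically), and the dicts stand in for the cache.

```python
# Version-key invalidation: the cache key embeds a version number that is
# bumped on every write, so stale entries are simply never read again.
cache = {}
versions = {}  # user_id -> current version (hypothetical counter store)

def cache_key(user_id):
    return f"user:{user_id}:v{versions.get(user_id, 1)}"

def update_profile(user_id, profile):
    versions[user_id] = versions.get(user_id, 1) + 1  # old key is now dead
    cache[cache_key(user_id)] = profile

def get_profile(user_id):
    # Only the current version's key is ever read; stale versions
    # linger until evicted but are never returned.
    return cache.get(cache_key(user_id))
```

The trade-off: nothing is explicitly deleted, so old versions occupy memory until the eviction policy removes them.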
Asynchronism: Moving Slow Work Off the Critical Path
Operations that take more than ~100ms and don't need an immediate result should be async:
- Sending emails / SMSs.
- Resizing and storing uploaded images.
- Generating PDF reports.
- Updating search indexes.
- Sending webhooks.
Pattern: Message Queue (Producer → Queue → Consumer)
```mermaid
flowchart LR
    API["API Handler\n(returns 202 Accepted immediately)"] -->|publish event| Queue["Message Queue\n(SQS / RabbitMQ / Kafka)"]
    Queue --> Worker1["Worker 1\n(sends email)"]
    Queue --> Worker2["Worker 2\n(resizes image)"]
    Queue --> Worker3["Worker 3\n(updates search index)"]
```
The API responds 202 Accepted in milliseconds. Workers process in the background. If a worker crashes, the message remains in the queue and is retried.
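The producer → queue → consumer shape can be simulated in-process with Python's standard `queue` and `threading` modules. This is only a sketch of the pattern; a real system would publish to SQS, RabbitMQ, or Kafka, and the "slow work" here is faked with a list append.

```python
import queue
import threading

q = queue.Queue()
results = []

def worker():
    # Consumer: pull tasks off the queue and do the slow work in background.
    while True:
        task = q.get()
        if task is None:      # sentinel: shut the worker down
            break
        results.append(f"emailed {task}")  # stand-in for the slow work
        q.task_done()

t = threading.Thread(target=worker)
t.start()

# Producer (the API handler): publish and return immediately.
q.put("a@example.com")
q.put(None)   # in-process shutdown signal, not part of the real pattern
t.join()
```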
Idempotency is required: Since queues guarantee at-least-once delivery, workers must be idempotent, meaning that processing the same message twice produces the same result. Use a deduplication ID (e.g., a message UUID stored in the DB) to detect and skip duplicates.
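A deduplication-based idempotent consumer might look like this sketch; the `processed` set stands in for a durable dedup store such as a DB table, and `handle_message` is an illustrative name:

```python
import uuid

processed = set()   # dedup store; in production, a durable DB table
sent_emails = []    # stand-in for the actual email side effect

def handle_message(message):
    """Idempotent consumer: a redelivered message is skipped, not re-sent."""
    if message["id"] in processed:
        return                        # duplicate delivery: do nothing
    sent_emails.append(message["to"])  # the side effect (send the email)
    processed.add(message["id"])       # record the message as handled

msg = {"id": str(uuid.uuid4()), "to": "a@example.com"}
handle_message(msg)
handle_message(msg)   # simulated at-least-once redelivery
```

Note the remaining gap even in this sketch: a crash between the side effect and the dedup write can still duplicate the email, which is why dedup records are often written in the same transaction as the side effect when possible.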
Caching Anti-Patterns to Avoid
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Caching mutable user-specific data globally | User A sees User B's private data | Use user-scoped cache keys: `user:{id}:profile` |
| Infinite TTL | Stale data forever (schema changed, user updated profile) | Always set a TTL, even if long (24–48h) |
| Caching failures | Caching a DB error response; all users get the error until the TTL expires | Never cache error responses |
| Writing full objects to cache without compression | Large values consume cache memory and add network pressure | Serialize and compress; consider partial caching |
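Two of the fixes above, user-scoped keys and an explicit TTL, can be sketched together. This is an illustration only: the dict stands in for the cache, and `time.monotonic()` is used for expiry checks.

```python
import time

cache = {}  # key -> (value, expires_at)

def set_with_ttl(key, value, ttl_seconds):
    # Always attach an expiry, even a long one, so nothing is stale forever.
    cache[key] = (value, time.monotonic() + ttl_seconds)

def get(key):
    entry = cache.get(key)
    if entry is None:
        return None
    value, expires_at = entry
    if time.monotonic() >= expires_at:  # expired: treat as a miss
        del cache[key]
        return None
    return value

# User-scoped key: user 1's profile can never leak to another user.
set_with_ttl("user:1:profile", {"name": "Ada"}, ttl_seconds=60)
```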
Summary
- Cache-Aside is the default pattern: read from cache, fall back to DB on miss, then populate.
- LRU is the standard eviction policy; use LFU for stable, long-lived hot data.
- Cache invalidation is hard: TTL expiry + event-based invalidation is the common hybrid.
- Async queues (SQS, RabbitMQ, Kafka) move slow work off the request path: respond 202, process in the background.
- Idempotency is mandatory for queue consumers since at-least-once delivery is guaranteed.
Practice Quiz
Your caching strategy shows that the first request after a cache miss always hits the database. Which pattern is this?
- A) Write-Through
- B) Cache-Aside (lazy population): the cache is populated only when a miss occurs, so the first requester always hits the DB.
- C) Read-Through with a warm-up process
Answer: B
You have a social media platform. A celebrity's profile is accessed millions of times per day. Days later, they change their name. Which eviction policy would keep the celebrity profile in cache the longest?
- A) LRU: it was accessed recently enough.
- B) LFU: it has the highest access frequency and would outlast LRU if access drops temporarily.
- C) FIFO: it was added to the cache early on.
Answer: B
Your email-sending worker processes a message, sends the email, but crashes before acknowledging the message. The queue re-delivers it. What property must the worker have to handle this safely?
- A) Atomicity: the worker must wrap everything in a DB transaction.
- B) Idempotency: processing the same message twice must produce the same result (e.g., use a message UUID to skip already-sent emails).
- C) Durability: the worker must persist results to disk before acknowledging.
Answer: B

Written by Abstract Algorithms (@abstractalgorithms)