Webhooks Explained: Don't Call Us, We'll Call You
Polling is slow and wasteful. Webhooks are event-driven callbacks that deliver data the moment something happens.
TLDR: Webhooks let one system push event data to another the moment something happens. Instead of polling ("anything new?"), you expose an endpoint and the provider POSTs signed event payloads to you in near real-time. The key production requirements: signature verification, idempotency, async processing.
The Basics: HTTP Callbacks and Event-Driven Delivery
A webhook is an HTTP callback. When a specific event occurs, the event source (provider) sends an HTTP POST request to a URL you specify: your webhook endpoint. No polling, no long connections.
The three-step registration flow:
- You register a URL with the provider (e.g., https://your-app.com/webhooks/stripe).
- An event occurs on the provider's system (a payment succeeds, a commit is pushed).
- The provider sends a POST with a JSON payload describing the event to your URL.
Your app receives the event nearly instantly, with no repeated requests and no idle network traffic.
What makes webhooks different from REST APIs:
| Dimension | REST API (polling) | Webhook |
| Who initiates? | Your app | The provider |
| Latency | Up to poll interval | Near real-time |
| Network cost at idle | Constant (requests every N seconds) | Zero |
| Reliability | Deterministic | Depends on provider retry policy |
| Scalability | Linear with frequency | Driven by event rate |
The biggest practical implication: with webhooks, you never pay network and compute cost for silence. You only receive traffic when something actually happened.
Stop Polling: Let the Provider Ring Your Doorbell
Polling means your app repeatedly asks a provider "anything new?", burning bandwidth and adding latency whether or not anything changed.
Webhooks invert the call: you register a URL, and the provider calls your endpoint the instant an event occurs.
| Model | Who initiates? | Event latency | Network cost at idle |
| Polling | Your app | Up to poll interval | High (constant requests) |
| Webhook | Provider | Near real-time | Very low |
Analogy: Polling is calling the courier every 5 minutes. A webhook is the courier ringing your doorbell the moment the package arrives.
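To put rough numbers on the idle cost of polling, here is a small arithmetic sketch. The function names and the 5-minute interval are illustrative assumptions, and the average-delay figure assumes an event is equally likely to occur at any moment between two polls.

```javascript
// Illustrative arithmetic only; names and numbers are assumptions, not measurements.

// Requests a poller sends per day, whether or not anything changed.
function pollingRequestsPerDay(intervalSeconds) {
  const SECONDS_PER_DAY = 24 * 60 * 60;
  return Math.floor(SECONDS_PER_DAY / intervalSeconds);
}

// Expected delay before a poller notices an event, assuming the event
// is equally likely to occur at any moment between two polls.
function averagePollingDelaySeconds(intervalSeconds) {
  return intervalSeconds / 2;
}

const interval = 5 * 60; // poll every 5 minutes, as in the courier analogy
console.log(pollingRequestsPerDay(interval));      // 288
console.log(averagePollingDelaySeconds(interval)); // 150
```

With a webhook, both numbers depend only on how often events actually occur.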
What a Webhook Payload Looks Like
Most providers send a structured JSON body over HTTPS POST:
{
  "id": "evt_101",
  "type": "payment.succeeded",
  "created": 1772877602,
  "data": {
    "transaction_id": "txn_9001",
    "amount": 4999,
    "currency": "USD"
  }
}
Key fields in most real-world webhooks:
- id: unique event identifier; use for deduplication.
- type: event name; drives routing in your handler.
- created: Unix timestamp; enables replay-window validation.
- data: the event payload; schema varies by event type.
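The fields above map directly onto two small checks in a handler. The following is a minimal sketch, not any provider's SDK: the handler table, the `route` and `isWithinReplayWindow` names, and the 5-minute tolerance are all assumptions made for illustration.

```javascript
// Sketch only: handler names and the 5-minute tolerance are assumptions.
const TOLERANCE_SECONDS = 5 * 60;

// True when the event's `created` timestamp is within the tolerance of "now".
function isWithinReplayWindow(event, nowSeconds = Math.floor(Date.now() / 1000)) {
  return Math.abs(nowSeconds - event.created) <= TOLERANCE_SECONDS;
}

// Map each event type to a handler; unknown types are ignored.
const handlers = new Map([
  ['payment.succeeded', (data) => `fulfil ${data.transaction_id}`],
]);

function route(event) {
  const handler = handlers.get(event.type);
  return handler ? handler(event.data) : 'ignored';
}

const event = {
  id: 'evt_101',
  type: 'payment.succeeded',
  created: 1772877602,
  data: { transaction_id: 'txn_9001', amount: 4999, currency: 'USD' },
};

console.log(isWithinReplayWindow(event, event.created + 60));   // true
console.log(isWithinReplayWindow(event, event.created + 3600)); // false
console.log(route(event));                                      // fulfil txn_9001
```

Passing `nowSeconds` explicitly keeps the check easy to test; in a real handler the default (the current time) would normally be used.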
Webhook Delivery Flow
The complete lifecycle from event to processed business logic:
flowchart TD
A[Event occurs at Provider] --> B[Provider signs payload with HMAC]
B --> C[Provider POSTs to your endpoint over HTTPS]
C --> D{Signature valid?}
D -->|No| E[Return 401 reject]
D -->|Yes| F{Event ID already seen?}
F -->|Yes| G[Return 200 silently ignore]
F -->|No| H[Persist raw payload to DB]
H --> I[Enqueue job for async worker]
I --> J[Return 200 immediately]
J --> K[Worker executes business logic]
K --> L[Mark event as processed]
The critical design principle: your endpoint's only job is to acknowledge receipt (return 200) as fast as possible. All actual work happens asynchronously. This prevents provider retry storms.
Provider retry schedules (approximate; policies change, so check each provider's current documentation):
| Provider | Retry window | Retry interval strategy |
| Stripe | Up to 3 days | Exponential backoff |
| GitHub | No automatic retries | Failed deliveries are redelivered manually or through the API |
| Twilio | Up to 11 hours | Exponential |
| Shopify | Up to 48 hours | Exponential backoff |
Providers consider a delivery successful only when they receive a 2xx response within their timeout window (usually 5 to 30 seconds).
The Production-Safe Webhook Handler
A naive endpoint that just processes inline is dangerous: it creates duplicate actions when providers retry. The correct pattern has 5 ordered steps:
flowchart TD
A[Provider POST] --> B{Valid HMAC signature?}
B -->|No| C[Return 401]
B -->|Yes| D{Duplicate event_id?}
D -->|Yes| E[Return 200 ignore]
D -->|No| F[Persist event to durable store]
F --> G[Enqueue for async worker]
G --> H[Return 200 immediately]
H --> I[Worker processes business logic]
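// Node.js / Express: production-safe handler skeleton
// isValidHmac, isDuplicate, persistEvent, and enqueueEvent are application-defined helpers.
const express = require('express');
const app = express();

// express.raw keeps req.body as a Buffer so the HMAC covers the exact bytes received
app.post('/webhooks/provider', express.raw({ type: 'application/json' }), async (req, res) => {
  try {
    const sig = req.header('X-Signature');
    if (!sig || !isValidHmac(req.body, sig, process.env.WEBHOOK_SECRET)) {
      return res.status(401).send('invalid signature');
    }
    const event = JSON.parse(req.body.toString('utf8'));
    if (await isDuplicate(event.id)) {
      return res.status(200).send('duplicate ignored');
    }
    await persistEvent(event); // write to DB before ack
    await enqueueEvent(event); // hand off to async worker
    return res.status(200).send('accepted');
  } catch (err) {
    // A non-2xx response tells the provider to retry later
    return res.status(500).send('temporary failure');
  }
});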
Why return 200 before processing? Most providers retry if they don't receive a timely 2xx. If your business logic runs inline and takes too long, the same event fires twice.
Deep Dive: Why At-Least-Once Delivery Demands Idempotency
Webhook providers guarantee delivery by retrying on failure, not by ensuring exactly-once. A network timeout after your handler processes an event but before returning 200 causes the same event to arrive again. Idempotency means processing an event twice produces the same result as once. The key: store event.id on first receipt and reject any event whose ID already exists in your store before executing any business logic.
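The store-then-reject rule can be sketched in a few lines. This is an in-memory illustration only: in production the set of seen IDs would live in a durable store with a unique constraint on the event ID, and `markIfNew`, `handleEvent`, and `chargesApplied` are hypothetical names.

```javascript
// Minimal in-memory sketch. A real system would back this with a durable
// store and a unique constraint on the event ID; all names are assumptions.
const seenEventIds = new Set();
let chargesApplied = 0;

// Returns true only the first time an ID is recorded.
function markIfNew(eventId) {
  if (seenEventIds.has(eventId)) return false;
  seenEventIds.add(eventId);
  return true;
}

function handleEvent(event) {
  if (!markIfNew(event.id)) return 'duplicate ignored';
  chargesApplied += 1; // stand-in for the real business logic
  return 'processed';
}

console.log(handleEvent({ id: 'evt_101' })); // processed
console.log(handleEvent({ id: 'evt_101' })); // duplicate ignored
console.log(chargesApplied);                 // 1
```

The second delivery of evt_101 is acknowledged but has no side effect, which is the property at-least-once delivery requires.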
Real-World Applications: Where Webhooks Power Real Systems
| Domain | Provider | Event examples |
| Payments | Stripe, PayPal | payment.succeeded, refund.created, dispute.opened |
| CI/CD | GitHub, GitLab | push, pull_request, deployment_status |
| Customer messaging | Twilio, Slack | message.received, channel.created |
| SaaS integrations | HubSpot, Salesforce | contact.created, deal.updated |
| Infrastructure | PagerDuty, Datadog | alert.triggered, incident.resolved |
Trade-offs & Failure Modes: Failure Modes You Must Defend Against
Most webhook providers use at-least-once delivery: duplicates are normal, ordering is not guaranteed.
| Failure mode | Symptom | Root cause | Fix |
| Duplicate processing | Double charge or duplicate action | Provider retry after network timeout | Idempotency key on event.id |
| Signature failure spike | Many 401 responses | Secret mismatch or clock drift | Secret rotation with overlap window + NTP |
| Queue backlog | Delayed domain updates | Worker under-capacity | Autoscale workers; backpressure control |
| Silent data loss | Missing domain updates | Returned 200 before persisting | Persist first, then ack |
| Replay storm | Millions of old events flood handler | Misconfigured replay | Timestamp window validation (reject events > 5 min old) |
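The "secret rotation with overlap window" fix in the table can be sketched as accepting a signature made with either the new or the previous secret while both are valid. This is an assumption-laden illustration: it presumes a hex-encoded HMAC-SHA256 signature, and the function names and secret values are hypothetical.

```javascript
const crypto = require('crypto');

// Hex HMAC-SHA256 of the raw body under one secret.
function sign(rawBody, secret) {
  return crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
}

// Accept a signature made with any currently valid secret (new first, then old).
function isValidDuringRotation(rawBody, signatureHex, secrets) {
  const received = Buffer.from(String(signatureHex || ''), 'hex');
  return secrets.some((secret) => {
    const expected = Buffer.from(sign(rawBody, secret), 'hex');
    // timingSafeEqual throws on unequal lengths, so compare lengths first
    return expected.length === received.length &&
      crypto.timingSafeEqual(expected, received);
  });
}

const body = Buffer.from('{"id":"evt_101"}');
const secrets = ['new-secret', 'old-secret']; // hypothetical values
console.log(isValidDuringRotation(body, sign(body, 'old-secret'), secrets));   // true
console.log(isValidDuringRotation(body, sign(body, 'stale-secret'), secrets)); // false
```

Once the provider has fully switched to the new secret, the old one is removed from the list and the overlap window closes.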
Webhook Delivery Sequence
sequenceDiagram
participant PS as Provider System
participant WH as Webhook Endpoint
participant Q as Job Queue
participant W as Worker
PS->>PS: event occurs (payment.succeeded)
PS->>WH: HTTP POST signed payload
WH->>WH: validate HMAC signature
WH->>WH: check idempotency (event.id seen?)
WH->>Q: enqueue job
WH-->>PS: 200 OK (fast ack)
Q->>W: dispatch job
W->>W: execute business logic
The sequence above traces the complete lifecycle of a single webhook event from provider to worker. Notice the three critical checkpoints in the endpoint handler (signature validation, idempotency check, and job enqueue), all executed before returning the fast 200 OK acknowledgement. The key takeaway is that business logic never runs inside the HTTP handler itself; it is always delegated to an async worker, keeping the endpoint fast and the provider's retry counter at zero.
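The split between acknowledging and processing can be sketched with an in-memory array standing in for a real job queue. Everything here (`acceptEvent`, `drainQueue`, the arrays) is a hypothetical illustration, not a queue library's API.

```javascript
// In-memory stand-in for a real job queue; all names are assumptions.
const queue = [];
const processed = [];

// Endpoint side: record the job and acknowledge without doing the work.
function acceptEvent(event) {
  queue.push(event);
  return 200;
}

// Worker side: drain the queue later, outside the request path.
function drainQueue() {
  while (queue.length > 0) {
    const event = queue.shift();
    processed.push(event.id); // stand-in for business logic
  }
}

const status = acceptEvent({ id: 'evt_101', type: 'payment.succeeded' });
console.log(status, processed.length); // 200 0
drainQueue();
console.log(processed);                // [ 'evt_101' ]
```

The first log line shows the event acknowledged but not yet processed, which is the state the provider sees when it receives the fast 200.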
Webhook Retry on Failure
sequenceDiagram
participant P as Provider
participant E as Endpoint
P->>E: HTTP POST (attempt 1)
E-->>P: 500 Internal Server Error
Note over P: wait 30s (exponential backoff)
P->>E: HTTP POST (attempt 2)
E-->>P: 500 Internal Server Error
Note over P: wait 60s
P->>E: HTTP POST (attempt 3)
E-->>P: 200 OK
Note over P: delivery confirmed
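The waits in the retry sequence above (30s, then 60s) follow a doubling pattern. A minimal sketch of that schedule, assuming a 30-second base taken from the diagram and a one-hour cap chosen purely for illustration; real providers use their own values and often add random jitter:

```javascript
// Delay before retry number `attempt` (1-based): doubles from a base delay
// and is capped at a maximum. The 30 s base mirrors the diagram; the cap is an assumption.
function backoffDelaySeconds(attempt, baseSeconds = 30, maxSeconds = 3600) {
  return Math.min(baseSeconds * 2 ** (attempt - 1), maxSeconds);
}

console.log([1, 2, 3, 4].map((n) => backoffDelaySeconds(n))); // [ 30, 60, 120, 240 ]
```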
Estimating worker capacity:
$$\text{workers needed} \approx \lambda \cdot T$$
where $\lambda$ = incoming events/sec, $T$ = average processing time (sec). Add a 2× safety margin for bursts.
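As a worked example of the estimate, here is a small sketch that applies the formula and the safety margin. The function name and the sample rates are assumptions chosen for illustration.

```javascript
// workers ≈ λ · T, scaled by a safety factor and rounded up to a whole worker.
function workersNeeded(eventsPerSecond, avgProcessingSeconds, safetyFactor = 2) {
  return Math.ceil(eventsPerSecond * avgProcessingSeconds * safetyFactor);
}

// 40 events/sec at 0.25 s each keeps 10 workers busy; doubling for bursts gives 20.
console.log(workersNeeded(40, 0.25, 1)); // 10
console.log(workersNeeded(40, 0.25));    // 20
```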
Practical: Setting Up and Testing Webhooks
Local development with a tunnel:
Production webhook endpoints must be publicly reachable over HTTPS. During development, use a tunnel to expose your local server:
# Using ngrok: creates a public HTTPS URL for localhost:3000
ngrok http 3000
# Forwarding: https://abc123.ngrok.io -> http://localhost:3000
Register the ngrok URL with your provider's webhook settings. Events now flow to your local development server.
Testing your handler:
# Resend an existing event by ID (copy the ID from the Stripe dashboard)
stripe events resend evt_1234567890
# Trigger a test event via CLI
stripe trigger payment_intent.succeeded
Verifying your HMAC implementation is correct:
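const crypto = require('crypto');

function isValidHmac(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string') return false; // missing header
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody) // raw Buffer, not parsed JSON
    .digest();
  const received = Buffer.from(signatureHeader.replace('sha256=', ''), 'hex');
  // timingSafeEqual throws on unequal lengths, so reject those first
  if (received.length !== expected.length) return false;
  // Constant-time comparison prevents timing attacks
  return crypto.timingSafeEqual(expected, received);
}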
Three common configuration mistakes:
- Parsing JSON before computing HMAC. The signature is computed over the raw bytes. If you call JSON.parse() first and re-serialize, the output may differ from the original bytes and the signature will not match.
- Logging the raw secret. Treat WEBHOOK_SECRET like a password. Never log it, never commit it, rotate it if exposed.
- Using an overly long replay window. A 5-minute timestamp window blocks most replay attacks. A 24-hour window does not.
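The first mistake is easy to reproduce. The sketch below signs a body the way a provider might, then shows that re-serializing the parsed JSON changes the bytes and breaks the signature. The secret and payload are made-up test values, and the plain `===` comparison is for demonstration only; a real handler should use a constant-time comparison.

```javascript
const crypto = require('crypto');

function sign(bytes, secret) {
  return crypto.createHmac('sha256', secret).update(bytes).digest('hex');
}

const secret = 'test-secret'; // hypothetical value for the demo
// Raw bytes as a provider might send them, with spaces after the separators.
const rawBody = Buffer.from('{"id": "evt_101", "amount": 4999}', 'utf8');
const providerSignature = sign(rawBody, secret);

// Correct: verify against the bytes exactly as received.
console.log(sign(rawBody, secret) === providerSignature); // true

// Mistake: parse, re-serialize, then sign. JSON.stringify drops the spaces,
// so the bytes (and therefore the HMAC) no longer match.
const reserialized = Buffer.from(JSON.stringify(JSON.parse(rawBody.toString('utf8'))));
console.log(sign(reserialized, secret) === providerSignature); // false
```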
Decision Guide: When Webhooks Are (and Aren't) the Answer
| Situation | Recommendation |
| Need instant event-driven updates with low idle load | Webhooks: the right default |
| Cannot expose a public HTTPS endpoint | Start with short-interval polling; migrate when infrastructure allows |
| High compliance / audit requirements | Persist raw payload + signature metadata before any processing |
| Multiple providers with different payload schemas | Build a normalized internal event model; translate at ingress |
| Provider doesn't support webhooks | Poll. Or check if they support long-polling / SSE instead |
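The "normalized internal event model" row can be sketched as one translation function per provider, applied at ingress. Both payload layouts and all names below are hypothetical; real providers each define their own field names.

```javascript
// Translate provider-specific payloads into one internal shape at ingress.
// Both payload layouts below are hypothetical, for illustration only.
const normalizers = new Map([
  ['providerA', (p) => ({ id: p.id, type: p.type, occurredAt: p.created, data: p.data })],
  ['providerB', (p) => ({ id: p.event_id, type: p.event_name, occurredAt: p.timestamp, data: p.payload })],
]);

function normalize(provider, payload) {
  const fn = normalizers.get(provider);
  if (!fn) throw new Error(`no normalizer registered for ${provider}`);
  return { provider, ...fn(payload) };
}

const internalEvent = normalize('providerB', {
  event_id: 'b-7',
  event_name: 'contact.created',
  timestamp: 1772877602,
  payload: { email: 'a@example.com' },
});
console.log(internalEvent.type, internalEvent.id); // contact.created b-7
```

Downstream workers then depend only on the internal shape, so adding a provider means adding one normalizer rather than touching business logic.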
What to Learn Next
- System Design Protocols: REST, RPC, and TCP/UDP
- System Design Core Concepts
- API Gateway vs. Load Balancer vs. Reverse Proxy
Spring Boot + Svix: A Production-Safe Webhook Receiver
Spring Boot provides the @RestController and @RequestBody annotations needed to build a webhook endpoint in minutes, while Svix is an open-source webhook delivery platform that handles HMAC signing, retry scheduling, idempotency tracking, and an event portal, addressing every production concern from this post without building that infrastructure from scratch.
Together they solve the three production requirements in the TLDR: Spring Boot handles the receiver endpoint with signature verification, Svix handles the delivery-side guarantees (retries, exponential backoff, event portal for debugging), and Spring's @Async ensures the handler returns 200 before any business logic runs.
// pom.xml dependencies: spring-boot-starter-web, spring-boot-starter-data-jpa
import org.springframework.web.bind.annotation.*;
import org.springframework.http.*;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.beans.factory.annotation.Value;
import jakarta.persistence.*;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest; // needed for the constant-time comparison
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.util.*;
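// Domain: idempotency store
@Entity
public class WebhookEvent {
    @Id String eventId;
    String type;
    @Lob String rawPayload; // payloads can exceed the default 255-character column
    boolean processed = false;
}

// Spring Data repository keyed by event ID (referenced by the controller and worker)
interface WebhookEventRepository
        extends org.springframework.data.jpa.repository.JpaRepository<WebhookEvent, String> {}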
// Signature verification (HMAC-SHA256)
@Service
public class HmacVerifier {
    @Value("${webhook.secret}") // loaded from application.properties
    private String webhookSecret;

    public boolean verify(byte[] rawBody, String signatureHeader) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(webhookSecret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String expected = HexFormat.of().formatHex(mac.doFinal(rawBody));
            String received = signatureHeader.replace("sha256=", "");
            // Constant-time comparison prevents timing attacks
            return MessageDigest.isEqual(expected.getBytes(), received.getBytes());
        } catch (Exception e) { return false; }
    }
}
// Webhook receiver controller
@RestController
@RequestMapping("/webhooks")
public class WebhookController {
    private final HmacVerifier verifier;
    private final WebhookEventRepository eventRepo;
    private final WebhookWorker worker;

    public WebhookController(HmacVerifier v, WebhookEventRepository r, WebhookWorker w) {
        this.verifier = v; this.eventRepo = r; this.worker = w;
    }

    // Providers send JSON; binding to byte[] keeps the exact bytes for HMAC verification
    @PostMapping(value = "/events", consumes = MediaType.APPLICATION_JSON_VALUE)
    public ResponseEntity<String> receive(
            @RequestBody byte[] rawBody,
            @RequestHeader("X-Signature") String sig,
            @RequestHeader("X-Event-Id") String eventId,
            @RequestHeader("X-Event-Type") String eventType) {
        // Step 1: Verify HMAC signature
        if (!verifier.verify(rawBody, sig)) {
            return ResponseEntity.status(401).body("invalid signature");
        }
        // Step 2: Idempotency check
        if (eventRepo.existsById(eventId)) {
            return ResponseEntity.ok("duplicate ignored");
        }
        // Step 3: Persist raw payload BEFORE returning 200
        WebhookEvent event = new WebhookEvent();
        event.eventId = eventId; event.type = eventType;
        event.rawPayload = new String(rawBody, StandardCharsets.UTF_8);
        eventRepo.save(event);
        // Step 4: Hand off to async worker and return 200 immediately
        worker.process(eventId);
        return ResponseEntity.ok("accepted");
    }
}
// Async worker: business logic runs outside the HTTP thread
@Service
public class WebhookWorker {
    private final WebhookEventRepository eventRepo;

    public WebhookWorker(WebhookEventRepository r) { this.eventRepo = r; }

    // Requires @EnableAsync on a @Configuration class; without it this runs inline
    @Async // runs on Spring's task executor, off the request thread
    public void process(String eventId) {
        WebhookEvent event = eventRepo.findById(eventId).orElseThrow();
        // Execute business logic here (update order, send notification, etc.)
        System.out.println("Processing event: " + event.type + " / " + event.eventId);
        event.processed = true;
        eventRepo.save(event);
    }
}
The @Async annotation runs worker.process() on Spring's task executor thread pool, so the HTTP thread returns 200 OK without waiting for business logic or downstream service calls. This only takes effect when async support is enabled with @EnableAsync on a configuration class; without it, process() runs inline on the request thread. Combined with eventRepo.existsById(eventId) for deduplication, this matches the pattern from the production-safe handler diagram.
For a full deep-dive on Spring Boot webhook receivers and Svix for managed delivery, a dedicated follow-up post is planned.
Lessons from Webhook Systems in Production
Lesson 1: Idempotency is not optional; it is the foundation.
Every webhook system that goes to production eventually receives duplicate events. A provider network timeout, a retry on infrastructure restart, or an explicit replay will cause the same event.id to arrive twice. If your handler is not idempotent, you will double-charge customers, send duplicate notifications, or corrupt state. Build idempotency on day one.
Lesson 2: The return-200-fast pattern is more important than it looks. Teams that process inline get burned within weeks. A database slowdown causes a processing delay, the provider times out, retries fire, and you process the event twice despite having no explicit bug. The pattern (persist, enqueue, return 200) protects against the entire class of retry-induced duplicates.
Lesson 3: Build an event replay pipeline before you need it. At some point your worker will have a bug, your queue will fill, or your downstream service will go down. You need to be able to re-process events from your persistent store. Design that replay pipeline into the system from the start, not as a fire-drill.
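A replay pipeline can start as something very small: select the stored events that never finished processing and hand them back to the queue. The sketch below uses an in-memory array in place of the persistent store, and the field and function names are assumptions.

```javascript
// In-memory stand-in for the persistent event store; field names are assumptions.
const eventStore = [
  { id: 'evt_1', processed: true },
  { id: 'evt_2', processed: false },
  { id: 'evt_3', processed: false },
];

// Re-enqueue every stored event that never finished processing.
function replayUnprocessed(store, enqueue) {
  const pending = store.filter((e) => !e.processed);
  pending.forEach((e) => enqueue(e));
  return pending.length;
}

const requeued = [];
console.log(replayUnprocessed(eventStore, (e) => requeued.push(e.id))); // 2
console.log(requeued); // [ 'evt_2', 'evt_3' ]
```

Because the worker is idempotent, replaying an event that was in fact already handled is harmless.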
Lesson 4: Monitor signature failure rate as a security signal.
A sudden spike in signature_fail_rate means either your secret rotated without overlap, your endpoint is receiving spoofed requests, or there's a serialization mismatch. It is always worth investigating; it rarely resolves on its own.
TLDR: Summary & Key Takeaways
- Webhooks invert the polling model: the provider pushes events to your endpoint the moment they occur.
- Every production webhook handler needs: HMAC signature verification, idempotency check, async processing.
- Return 200 as fast as possible; inline processing causes duplicate deliveries on retries.
- At-least-once delivery means duplicates are normal; make your handler idempotent by design.
- Monitor signature_fail_rate, dedup_hit_rate, and queue_depth as the three core health signals.
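One way to derive the three health signals is from plain counters updated by the handler. This is a rough sketch: the metric names come from the list above, while the counter layout, the outcome labels, and the functions are assumptions; a real system would export these through its metrics library.

```javascript
// Simple counters; metric names follow the list above, everything else is assumed.
const counters = { received: 0, signatureFailures: 0, dedupHits: 0 };
const pendingJobs = [];

// Called once per incoming request with the handler's outcome.
function record(outcome) {
  counters.received += 1;
  if (outcome === 'invalid_signature') counters.signatureFailures += 1;
  if (outcome === 'duplicate') counters.dedupHits += 1;
  if (outcome === 'accepted') pendingJobs.push({});
}

function healthSnapshot() {
  const total = counters.received || 1; // avoid division by zero
  return {
    signature_fail_rate: counters.signatureFailures / total,
    dedup_hit_rate: counters.dedupHits / total,
    queue_depth: pendingJobs.length,
  };
}

['accepted', 'accepted', 'duplicate', 'invalid_signature'].forEach(record);
console.log(healthSnapshot());
// { signature_fail_rate: 0.25, dedup_hit_rate: 0.25, queue_depth: 2 }
```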
Written by
Abstract Algorithms
@abstractalgorithms