
Elasticsearch vs Time-Series DB: Key Differences Explained



TLDR: Elasticsearch is built for search — full-text log queries, fuzzy matching, and relevance ranking via an inverted index. InfluxDB and Prometheus are built for metrics — numeric time series with aggressive compression. Picking the wrong one can waste 10× the storage or make queries orders of magnitude slower.


📖 Logs vs Metrics: Two Different Storage Problems

A log is a sentence: 2024-01-15 ERROR: failed to connect to database host=db1.

A metric is a number at a timestamp: cpu.usage{host=web1} = 87.3 @ 1705312800.

These look similar (both are time-ordered data) but demand fundamentally different storage strategies:

| Property | Log data | Metric data |
| --- | --- | --- |
| Structure | Semi-structured text | Strictly typed numbers |
| Query pattern | Full-text search, grep, aggregation | Range queries, rate calculations, aggregation |
| Cardinality | Unbounded keys | Bounded label/tag sets |
| Update frequency | Write-once streams | Regular intervals (e.g. every 15s) |
| Retention | Days to months (expensive) | Months to years (cheap with downsampling) |
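To make the contrast concrete, here is a minimal sketch of the two shapes (the `MetricPoint` type and field names are illustrative, not any library's API): a log is an opaque string you search after the fact, while a metric is a typed tuple you can do arithmetic on directly.

```python
from dataclasses import dataclass

# A log entry: semi-structured text — you grep/search it after the fact.
log_line = "2024-01-15 ERROR: failed to connect to database host=db1"

# A metric point: strictly typed — ready for range math and aggregation.
@dataclass
class MetricPoint:
    name: str        # e.g. "cpu.usage"
    labels: dict     # bounded tag set, e.g. {"host": "web1"}
    timestamp: int   # Unix seconds
    value: float

point = MetricPoint("cpu.usage", {"host": "web1"}, 1705312800, 87.3)

# Querying the log means text matching:
found = "database" in log_line
# Querying the metric is plain arithmetic:
doubled = point.value * 2
```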

🔍 Elasticsearch: The Inverted Index

Elasticsearch is built on Apache Lucene. Its core data structure is the inverted index: a map from each term (word) to the list of documents that contain it.

"failed" → [doc_3, doc_7, doc_12]
"database" → [doc_3, doc_9]
"connection" → [doc_7, doc_12, doc_20]

This lets Elasticsearch answer "find all logs containing 'database' AND 'connection'" in milliseconds, even across billions of log lines.
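A toy version of this structure (a sketch of the idea, not Lucene's actual implementation) is a dict from term to a set of document IDs; an AND query then becomes a set intersection:

```python
from collections import defaultdict

docs = {
    3: "failed to connect to database",
    7: "connection failed after retry",
    9: "database backup complete",
    12: "failed connection reset",
    20: "connection established",
}

# Build the inverted index: term -> set of doc IDs containing it.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search_and(*terms):
    """AND query: intersect the posting sets of every term."""
    postings = [index[t] for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

print(search_and("failed", "database"))  # -> [3]
```

The intersection touches only the (short) posting lists, not the documents themselves — which is why the query stays fast even across billions of lines.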

Strengths:

  • Full-text search with stemming, fuzzy matching, synonyms
  • Relevance ranking (BM25)
  • Aggregation pipelines (histograms, top-N, date histograms)
  • Schema flexibility (dynamic mappings)

Weaknesses:

  • High storage overhead — inverted index per field duplicates data
  • Poor at range math on numeric series (no delta encoding)
  • High cardinality is expensive: each unique label value adds index memory

⚙️ Time-Series DBs: Delta Encoding and Columnar Compression

TSDBs (InfluxDB, Prometheus, TimescaleDB, VictoriaMetrics) are optimized around the observation that metric values arrive at regular intervals and change slowly between samples.

Delta encoding example:

Raw:     100, 101, 102, 103
Encoded: 100, +1, +1, +1

Storing deltas instead of absolute values shrinks the data dramatically: a stream of 64-bit values whose consecutive deltas repeat can compress to roughly one bit per point (Gorilla's delta-of-delta timestamp encoding does exactly this). Combined with XOR-based float compression and block compression (e.g. Snappy), modern TSDBs achieve 1–2 bytes per data point versus Elasticsearch's 50–100 bytes per log document.
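A minimal encoder/decoder for the example above (a sketch; real TSDBs do this at the bit level, not with Python lists):

```python
def delta_encode(values):
    # Keep the first value absolute, then store successive differences.
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out

def delta_decode(deltas):
    # A running sum reconstructs the original series exactly.
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out

raw = [100, 101, 102, 103]
encoded = delta_encode(raw)   # [100, 1, 1, 1]
assert delta_decode(encoded) == raw
```

The small repeated deltas (`+1, +1, +1`) are what downstream compression then squeezes into a handful of bits.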

```mermaid
flowchart LR
    Sensor[Sensor 87.3\n87.4\n87.5] --> Delta[Delta Encoding\n87.3 +0.1 +0.1]
    Delta --> Compress[Gorilla XOR\nCompression]
    Compress --> TSDB[(TSDB Block\n1-2 bytes/point)]
```
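Gorilla's float compression relies on consecutive samples having nearly identical IEEE-754 bit patterns, so their XOR is mostly zeros. A quick check of that property (a sketch of the core idea, not the full encoder):

```python
import struct

def float_bits(x: float) -> int:
    """Reinterpret a double as its 64-bit IEEE-754 pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

prev, cur = 87.3, 87.4
xored = float_bits(prev) ^ float_bits(cur)

# The sign and exponent bits match, so the XOR has a long run of
# leading zeros — Gorilla stores only the short run of meaningful bits.
leading_zeros = 64 - xored.bit_length()
print(leading_zeros)
```

For slowly changing series like CPU usage, most points need only a few bits after the first sample.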

Strengths:

  • Efficient storage (10–50× smaller than Elastic for pure metrics)
  • Fast range queries and time aggregations (SUM, AVG, RATE)
  • Built-in downsampling and retention policies
  • Cardinality-efficient label model (Prometheus label sets)

Weaknesses:

  • Poor at full-text search (no inverted index)
  • Limited schema flexibility (labels must be pre-planned for cardinality control)

🌍 Which One to Use and When

| Situation | Use |
| --- | --- |
| "Find all error logs containing 'timeout'" | Elasticsearch |
| "What was the p99 latency over the last 6 hours?" | Prometheus / InfluxDB |
| "Show me all logs where user_id=12345 performed a payment" | Elasticsearch |
| "Alert when CPU > 90% for 5 minutes" | Prometheus |
| "Audit trail: who changed what and when" | Elasticsearch |
| "How many requests per second to /api/v1/order over 30 days?" | TimescaleDB / InfluxDB |

In practice: Production observability stacks often use both. The ELK stack (Elasticsearch + Logstash + Kibana) handles logs; Prometheus + Grafana handles metrics.


⚖️ Cardinality: The TSDB Killer

The biggest operational risk in TSDBs is high-cardinality labels.

Prometheus memory usage scales with the number of active time series, which is roughly the product of the cardinalities of each label. A common trap: using user_id or session_id as a Prometheus label. One million users = one million separate time series = OOM crash.
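Because the series count is multiplicative across labels, one bad label can dominate everything else. A quick back-of-the-envelope calculation (the cardinalities here are hypothetical):

```python
# Distinct values per label for a hypothetical fleet:
services, hosts, endpoints = 5, 20, 50

safe_series = services * hosts * endpoints
print(safe_series)       # 5,000 series — fine

# Now add user_id as a label with one million distinct values:
users = 1_000_000
exploded = safe_series * users
print(exploded)          # 5,000,000,000 series — OOM territory
```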

Rule: TSDBs track populations (per-service, per-host, per-endpoint). Elasticsearch searches individuals (this log, this request, this user).


📌 Key Takeaways

  • Elasticsearch is for text search; TSDBs are for numeric time series.
  • Elasticsearch uses an inverted index — fast for full-text, expensive for pure numbers.
  • TSDBs use delta encoding + compression — 10–50× smaller for regular numeric streams.
  • Use both in production: ELK for logs, Prometheus/Grafana for metrics.
  • Watch out for high-cardinality labels in Prometheus — they cause OOM crashes.

🧩 Test Your Understanding

  1. Why does delta encoding work so well for CPU metric data?
  2. A team wants to search server logs for the phrase "failed payment." Elasticsearch or InfluxDB?
  3. Why is using user_id as a Prometheus label dangerous?
  4. What is the Gorilla encoding algorithm optimizing for?

Written by Abstract Algorithms (@abstractalgorithms)