Why Jaeger's Bet on ClickHouse Is a Quiet Reset for Tracing Infrastructure

Jaeger's ClickHouse backend reported 8.6× span compression in a single-node benchmark, which could sharply cut storage costs and stretch distributed tracing retention from days to months.

Zyfolks Team

Most tracing backends quietly bleed your storage budget. A microservice fleet emitting billions of spans a week ends up with a Cassandra or Elasticsearch bill that has nothing to do with the number of incidents it ever helped you debug. So when the Jaeger team shipped a ClickHouse backend in v2.18.0 and reported an 8.6× compression ratio on 10 million spans, that wasn’t a footnote — it was a signal that the cost structure of observability is about to shift under everyone’s feet.

Jaeger, the CNCF-graduated distributed tracing platform, has had Cassandra and Elasticsearch as its primary storage backends for years. Per the project maintainer’s writeup on The New Stack, ClickHouse support has been one of the most consistently requested features in the community, and it landed as an alpha backend in v2.18.0. The point isn’t that another storage option exists. It’s that the architecture of a columnar OLAP database happens to match the shape of trace data almost perfectly — and the benchmark numbers back that up.

Why Columnar Storage Eats Trace Data for Breakfast

Trace data is brutally repetitive. The same service names (auth-service, payment-gateway), the same operation names, the same status codes, and the same tag keys all appear hundreds of thousands of times a day. In a row-oriented store, that redundancy is dead weight on disk. ClickHouse stores each column’s values together, so repeated values sit side by side and compress extremely well. According to the Jaeger team’s single-node benchmark of 10 million spans across 1 million traces, the spans table hit an 8.6× compression ratio, collapsing roughly 6 GiB of span data down to about 722 MiB on disk.
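To make the intuition concrete, here is a toy sketch that uses run-length encoding as a stand-in for a real codec. ClickHouse’s actual codecs are far more capable than this, and the service names, operations, and row counts below are made up for illustration. The point is only that a sorted column collapses into a handful of runs, while whole rows never repeat.

```python
from itertools import groupby

# Hypothetical values; real spans carry many more fields.
SERVICES = ["auth-service", "checkout-api", "payment-gateway"]
OPERATIONS = ["GET /login", "POST /charge"]


def make_spans(n):
    """Build n synthetic (service, operation, duration_us) rows,
    sorted by service and operation like a sorted table would be."""
    spans = [
        (SERVICES[i % 3], OPERATIONS[(i // 3) % 2], 1_000 + i)
        for i in range(n)
    ]
    return sorted(spans, key=lambda s: (s[0], s[1]))


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


spans = make_spans(9_000)

# Row view: every row has a unique duration, so nothing collapses.
row_runs = run_length_encode(spans)

# Column view: each field is encoded on its own.
services, operations, durations = zip(*spans)
service_runs = run_length_encode(services)
operation_runs = run_length_encode(operations)

print(len(row_runs))        # 9000 runs, one per row
print(len(service_runs))    # 3 runs, one per service
print(len(operation_runs))  # 6 runs, two operations per service
```

The high-entropy duration column still costs real bytes in either layout. The savings come from the repetitive columns, which is why results vary with how repetitive your own spans are.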

That matters because retention is where tracing programs go to die. Teams routinely set seven-day retention not because seven days is the right answer, but because thirty days would bankrupt them. If you’re running a platform team that’s been rationing trace retention to control Elasticsearch costs, an 8.6× reduction on the dominant table could be the difference between keeping a week and keeping a couple of months on the same disk footprint, though the exact gain depends on how well your current backend already compresses. Imagine an SRE chasing a regression that only surfaces during month-end batch jobs: the storage backend either has that data or it doesn’t, and ClickHouse’s compression profile makes “it does” affordable. Our bet is that within a year ClickHouse will be the default recommendation for new Jaeger deployments, and Elasticsearch will be the legacy choice few teams pick for greenfield clusters.

The Primary Key Decision That Defines the Whole Backend

The writeup’s real contribution isn’t the compression — it’s the schema trade-off the maintainers walked through publicly. In ClickHouse, the primary key isn’t a uniqueness constraint; it defines the on-disk sort order and powers a sparse index with one entry per 8,192-row granule. Pick it wrong and every query pays for it forever.
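A sparse index is easier to reason about with a small model. The sketch below is not ClickHouse code. It assumes a single sorted key column and hypothetical row counts, and shows how keeping one mark per 8,192-row granule narrows a lookup to a few granules.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 8192  # rows per granule, the figure quoted above


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule (one mark per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_for_key(marks, key):
    """Return the granule numbers that may contain `key`.

    Granule g holds keys between marks[g] and marks[g + 1], so it can be
    skipped if the next mark is below the key or its own mark is above it.
    """
    first = max(bisect_left(marks, key) - 1, 0)
    last = max(bisect_right(marks, key) - 1, 0)
    return range(first, last + 1)


# Hypothetical table: 100,000 rows already sorted by service name.
services = (
    ["auth-service"] * 40_000
    + ["checkout-api"] * 30_000
    + ["payment-gateway"] * 30_000
)
marks = build_sparse_index(services)
hits = granules_for_key(marks, "checkout-api")
print(len(marks), list(hits))  # 13 marks; granules 4 through 8
```

Only 5 of the 13 granules need to be read for that service. A filter on a column that is not part of the sort order gets no such pruning, which is why the choice of key is so consequential.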

The Jaeger team had two candidates: sort by trace_id (great for fetching a single trace by ID, terrible for search) or sort by (service_name, name, start_time) (great for search, mediocre for trace retrieval). An earlier benchmark with the trace_id ordering clocked trace retrieval at about 27 ms but search at about 880 ms. Re-sorting by (service_name, name, start_time) pushed retrieval to roughly 100 ms while bringing multi-filter search down to about 140 ms. They chose the search-optimized layout and used a bloom_filter skip index on trace_id plus a trace_id_timestamps materialized view to claw back most of the retrieval cost.
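The skip-index idea can be sketched in a few lines. This is a simplified model, not ClickHouse’s bloom_filter implementation: the filter size, hash scheme, granule size, and sample data are all assumptions chosen for readability. Rows are sorted by service, so a trace’s spans are scattered, and a per-granule Bloom filter on trace_id tells the reader which granules it can skip.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=4096, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_skip_index(rows, granule_size):
    """Build one Bloom filter over trace_id for each granule of rows."""
    filters = []
    for start in range(0, len(rows), granule_size):
        bloom = BloomFilter()
        for _service, trace_id in rows[start:start + granule_size]:
            bloom.add(trace_id)
        filters.append(bloom)
    return filters


def candidate_granules(filters, trace_id):
    """Granules whose filter cannot rule out the trace_id."""
    return [g for g, bloom in enumerate(filters) if bloom.might_contain(trace_id)]


# Hypothetical data: 250 traces, each with one span in four services.
SERVICES = ["auth", "cart", "pay", "ship"]
rows = sorted((SERVICES[i % 4], f"trace-{i // 4:04d}") for i in range(1_000))
filters = build_skip_index(rows, granule_size=100)
candidates = candidate_granules(filters, "trace-0123")
print(f"scan {len(candidates)} of {len(filters)} granules")
```

The filter can only say “definitely not here” or “maybe here”, so a lookup still reads every candidate granule. That is consistent with retrieval getting slower than the trace_id ordering while staying far cheaper than a full scan.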

Beyond Jaeger, this is a worked example of OLAP schema design for any append-heavy, query-asymmetric workload. If you’re building an internal analytics product or a multi-tenant SaaS platform that needs both point lookups and aggregations on the same dataset, the Jaeger Architecture Decision Record (ADR) is one of the cleaner public artifacts on how to negotiate that trade-off without hand-waving. More open-source projects should publish ADRs this concrete. The benchmark deltas (about 27 ms retrieval and 880 ms search with trace_id ordering, versus roughly 100 ms and 140 ms with the search-optimized ordering) turn what’s usually a religious debate into a measurable engineering decision.

Typed Attributes and the Five-Level Problem

Jaeger v2 moved to the OpenTelemetry data model, which means attributes are no longer always strings. They can be Bool, Int64, Float64, String, or complex types like Bytes, Slice, and Map — and they can live at five different levels: resource, scope, span, event, or link. That sounds like a schema designer’s nightmare, and it is.

The Jaeger team’s solution uses ClickHouse’s Nested column type, one per primitive type, repeated at each of the five levels. To avoid scanning every type-level combination on every query, they maintain a dedicated attribute_metadata table populated by materialized views off the spans table. When you search for http.status_code=200, the reader looks up which columns actually contain that key and only scans those. The maintainer is candid about the limitation: attribute-only searches can’t fully use the primary index, so users should always combine attribute filters with service, operation, or time to limit the scan.
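The metadata lookup can be modeled as a small in-memory map. This is a sketch of the idea only: the column naming scheme, the AttributeMetadata class, and the sample attributes are hypothetical, and the real table is maintained by materialized views rather than application code.

```python
from collections import defaultdict

LEVELS = ("resource", "scope", "span", "event", "link")
TYPES = ("bool", "int", "double", "str")


def type_of(value):
    """Map a Python value to one of the primitive attribute types."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "str"
    raise TypeError(f"unsupported attribute type: {type(value).__name__}")


def column_name(level, type_name):
    """Hypothetical naming scheme for the per-level, per-type columns."""
    return f"{level}_{type_name}_attributes"


class AttributeMetadata:
    """Remembers which (level, type) columns have ever held a given key."""

    def __init__(self):
        self._columns = defaultdict(set)

    def record(self, key, level, value):
        self._columns[key].add((level, type_of(value)))

    def columns_for(self, key):
        pairs = self._columns.get(key, ())
        return sorted(column_name(level, t) for level, t in pairs)


metadata = AttributeMetadata()
metadata.record("http.status_code", "span", 200)
metadata.record("http.status_code", "event", "200")
metadata.record("service.version", "resource", "1.4.2")

all_columns = len(LEVELS) * len(TYPES)
to_scan = metadata.columns_for("http.status_code")
print(f"scan {len(to_scan)} of {all_columns} columns: {to_scan}")
```

In this toy case the reader touches 2 of 20 possible columns. The lookup narrows which columns are read, not which rows, which is why a service or time predicate is still needed to limit the scan.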

For teams running AI agents or LLM-backed services that emit high-cardinality attributes (model name, token counts, tool call IDs), this design directly affects how you instrument. If you dump everything into span attributes and expect free-form search to be fast, you’ll be disappointed. If you treat the service name and the operation (span) name as the first-class filter dimensions and use attributes for refinement, the schema rewards you. Expect OpenTelemetry SDK authors to start publishing instrumentation guidance tuned for columnar backends, because the cost of getting it wrong is now measurable rather than theoretical.

Built-In SPM Means One Fewer Pipeline to Run

The underrated feature in v2.18 is native Service Performance Monitoring (SPM) on top of ClickHouse. Because columnar aggregations are cheap, Jaeger can now compute service-level latency, call rates, and error rates directly from stored spans, with no separate metrics pipeline required.

This quietly removes a Prometheus or span-metrics-connector deployment from your stack. If you’re a platform team currently running OpenTelemetry Collector with a span-to-metrics processor feeding a separate Prometheus cluster just to power your service dashboards, Jaeger v2.18 collapses that into a single backend. Fewer moving parts, fewer aggregation discrepancies, fewer 3 AM pages about why the dashboard says one latency and the traces say another. “Observability convergence” gets thrown around as a marketing term; this is what it actually looks like — one storage engine serving two query patterns competently, instead of two specialized stores that need to be kept in sync.
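For a sense of what “metrics from stored spans” means, here is a minimal sketch of the aggregation itself. It is not Jaeger’s query. The Span fields, the nearest-rank percentile, and the sample data are assumptions, and a real backend would run the equivalent aggregation inside the database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    service: str
    duration_ms: int
    is_error: bool


def percentile(sorted_values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    rank = max((pct * len(sorted_values) + 99) // 100, 1)  # ceiling division
    return sorted_values[rank - 1]


def service_metrics(spans, window_seconds):
    """Compute call rate, error rate, and p95 latency per service."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span.service].append(span)

    metrics = {}
    for service, group in by_service.items():
        durations = sorted(s.duration_ms for s in group)
        errors = sum(1 for s in group if s.is_error)
        metrics[service] = {
            "calls_per_sec": len(group) / window_seconds,
            "error_rate": errors / len(group),
            "p95_ms": percentile(durations, 95),
        }
    return metrics


# Hypothetical 10-second window: 20 spans, durations 1..20 ms, one error.
spans = [Span("auth-service", d, is_error=(d == 20)) for d in range(1, 21)]
result = service_metrics(spans, window_seconds=10)
print(result["auth-service"])
```

Because the dashboard numbers and the traces come from the same rows, the two cannot drift apart the way a separately aggregated metrics store can.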

What the 50k Spans/Sec Number Actually Tells You

The benchmark sustained more than 50k spans/sec on a single-node deployment ingesting 10 million spans. That’s a respectable number, but the caveats matter more: this is a single node running an alpha-quality backend, and at that rate 10 million spans is only a few minutes of ingest. The methodology lives in the project’s benchmarking report on GitHub, and the maintainer is upfront that these results should be read in context.

For a mid-sized engineering org — say, fifty services emitting a few thousand spans per second at peak — a single ClickHouse node could plausibly handle the workload with headroom for retention. For the hyperscale crowd already running multi-node Cassandra clusters, the real question is how the schema and primary-key choices hold up under sharded, replicated ClickHouse with much larger granule counts. That benchmark hasn’t been published yet, and it’s the one that will decide whether ClickHouse becomes the default or stays a popular alternative. Expect it to land within a few releases.
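A back-of-the-envelope estimate helps put “headroom for retention” in numbers. The sketch below rests on loud assumptions: it reuses the benchmark’s on-disk figure (about 722 MiB for 10 million spans) as if your spans compressed the same way, it counts only the spans table, and it treats 3,000 spans/sec as a sustained average rather than a peak.

```python
SECONDS_PER_DAY = 86_400
MIB = 1024 ** 2
GIB = 1024 ** 3

# Benchmark figures quoted in the article; your workload will differ.
BENCH_DISK_BYTES = 722 * MIB
BENCH_SPANS = 10_000_000
BYTES_PER_SPAN = BENCH_DISK_BYTES / BENCH_SPANS  # about 75.7 bytes on disk


def disk_gib(spans_per_sec, retention_days, bytes_per_span=BYTES_PER_SPAN):
    """Estimate compressed spans-table size in GiB for a retention window."""
    total_spans = spans_per_sec * SECONDS_PER_DAY * retention_days
    return total_spans * bytes_per_span / GIB


# Assumed sustained rate for a mid-sized org; not a measured figure.
for days in (7, 30, 90):
    print(f"{days:>3} days at 3,000 spans/sec: {disk_gib(3_000, days):,.0f} GiB")
```

Under those assumptions the spans table grows by about 18 GiB per day, so a quarter of retention lands around 1.6 TiB. Replication, indexes, materialized views, and less repetitive data would all push the real number higher.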

FAQ

Q: What is Jaeger and why does the storage backend matter?
A: Jaeger is a CNCF-graduated distributed tracing platform that follows requests across microservice boundaries to expose latency and root-cause issues. The storage backend determines how much trace data you can afford to keep, how fast searches return, and how complex your operational footprint is, which in turn determines whether engineers actually use traces during incidents.

Q: Should existing Jaeger users migrate from Elasticsearch or Cassandra to ClickHouse today?
A: ClickHouse support is alpha in v2.18.0, so production migration is premature for risk-averse teams. New deployments and staging environments are reasonable places to evaluate it now, especially if storage cost or search latency is a current pain point. A promotion to stable within the next few minor releases is plausible but not guaranteed.

Q: Does the 8.6× compression number apply to every workload?
A: No. The 8.6× figure comes from a specific benchmark of 10 million spans across 1 million traces on a single node, with the schema and data distribution described in the Jaeger benchmarking report. Workloads with extremely high-cardinality attributes or heavy use of complex types may compress less aggressively, while highly repetitive service-to-service traffic may compress more.

Key Takeaways

  • Teams stuck rationing trace retention on Elasticsearch should pilot Jaeger v2.18 with ClickHouse in staging now, so they have real numbers before the next budget cycle forces a decision.
  • Schema design — specifically the primary-key sort order — is the single highest-leverage choice in any OLAP-backed observability stack; copy Jaeger’s ADR pattern when designing your own.
  • Instrumentation guidance needs to evolve: pair attribute filters with service, operation, and time predicates or watch query costs balloon on columnar backends.
  • The convergence of traces and service metrics into a single backend will reduce the number of specialized pipelines platform teams need to operate, and the span-metrics-connector pattern is the most likely casualty.
  • Watch for a multi-node ClickHouse benchmark from the Jaeger team — that result, more than the alpha announcement, will determine whether ClickHouse becomes the default recommendation for new tracing deployments.
