Add TagMap read-through support (level-split phase 1 mechanism) by dougqh · Pull Request #11789 · DataDog/dd-trace-java

dougqh · 2026-06-29T19:48:09Z

Mechanism PR — read-through for `OptimizedTagMap` (single parent)

Adds optional read-through to OptimizedTagMap: a child map references a frozen parent (withParent) and reads through to it on a local miss, while local entries shadow the parent. This is the enabler for level-split phase 1 — a span will stop copying the shared trace-level tags (mergedTracerTags) down into every span; reads route through instead.

This PR is pure mechanism — inert until a consumer attaches a parent. No map has a parent yet (parent == null for every existing map), so there is no behavior change off the read-through path. The consumer wiring lands in a stacked follow-up.

Design

Single parent by design (anti-false-generalization): the consumer needs one parent (mergedTracerTags). Written so generalizing to multiple flattened parents is additive (the bulk walk is already bucket-aligned, the degenerate single-parent case of the multi-parent merge).
Removal via a lazy removedFromParent side-set, not inline tombstones. Inline-in-buckets kept breaking on the bare-Entry-vs-BucketGroup duality (re-type, per-group bitfield + single-Entry gap, two bitfields — all awkward). The side-set is shape-agnostic, keeps Entry/BucketGroup completely untouched, and the lazy null field doubles as the gate. Tombstones are rare (only when a parent-exposed key is removed).
Bulk reads via a bucket-aligned merge: exploits universal hashing (fixed 16-bucket table, no resize) so a parent entry's shadow check is scoped to the same-index local bucket, reusing the entry's cached hash — no re-hash, no global seen-set, no Set/BloomFilter. forEach (×3) stays alloc-free; IteratorBase does a two-phase local-then-parent walk so iterator/entrySet/keySet/values/stream all emit the deduped union.

What's covered

Read path: getEntry fall-through (the whole get*/containsKey family inherits it). isDefinitelyEmpty() + estimateSize() added as cheap conservative/upper-bound variants (mirroring Ledger); isEmpty()/size() stay exact (Map contract) and resolve the union.
Removal: tombstone a parent-exposed key; re-setting clears it; remove() returns the prior visible value (Map contract holds via read-through).
Bulk: forEach/iterators/collection views all emit the deduped, first-occurrence-wins union, skipping shadowed/tombstoned parent entries.
copy() preserves read-through (shares the frozen parent + copies tombstones) — was dropping both.
Behavior-identical-to-flat-map tests are the safety contract for the consumer flip.

TagMapReadThroughTest covers slices 1–4; the full TagMap* suite stays green (inert when parent == null).

Deferred / follow-ups

Inlining baseline for get/set across the parent != null branch — the perf-gate before marking this ready.
Consumer PR (stacked): the !needsIntercept-gated mergedTracerTags → withParent wiring. Includes the removeTag(VERSION) cleanup — config version moves out of the read-through bundle so the existing InternalTagsAdder conditional-add works unchanged and no per-span tombstone is minted (the apply-then-remove dance is vestigial; read-through keeps config-in-parent / manual-in-local). Plus an audit of any other per-span parent-key removals and of read-through maps as a merge/clone source (only copy() handled here).
Multi-parent + the cross-parent dedup clause — additive, with the dream.

tag: ai generated · pure mechanism, no behavior change until a consumer attaches a parent.

🤖 Generated with Claude Code

Adds optional read-through to OptimizedTagMap: a child map references a frozen parent (withParent) and reads through to it on a local miss, while local entries shadow the parent. The enabler for level-split phase 1 (a span no longer copies the shared trace-level tags down per span). Single-parent by design in phase 1 (anti-false-generalization); written so generalizing to multiple flattened parents is additive. Inert when parent == null (every existing map), so no behavior change off the read-through path. - Read path: getEntry falls through to the parent on a local miss (get*/containsKey inherit it). isDefinitelyEmpty()/estimateSize() added as cheap conservative/upper- bound variants; isEmpty()/size() stay exact (Map contract) and resolve the union. - Removal: a lazily-allocated removedFromParent side-set (also the gate) tombstones a parent-exposed key so it no longer reads through; re-setting clears it. Entry and BucketGroup stay untouched (the side-set is shape-agnostic vs the bare/group duality). - Bulk reads (forEach x3, iterators, collection views): bucket-aligned self-vs-parent merge, first-occurrence-wins, exploiting universal hashing so the shadow check is scoped to the same-index local bucket (no re-hash, no global seen-set). Alloc-free. - checkIntegrity asserts the local emptiness invariant (size==0), not union isEmpty(). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

copy() went through putAllIntoEmptyMap, which clones only the local buckets and the local size — so a copy of a read-through map dropped the inherited parent tags and the tombstones. Fix copy() to share the (frozen, immutable) parent and copy the tombstone set, so the copy is observationally identical to the original (same union) and independently mutable. Adds behavior-identical-to-flat-map tests: copy equivalence, independent mutability, tombstone preservation, equality with an equivalent flat map, and immutableCopy of a read-through map. This is the safety contract for flipping mergedTracerTags to a parent in the consumer change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

dd-octo-sts · 2026-06-29T20:08:33Z

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite	Status
Startup	🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results

Scenario	Candidate	master	Δ (95% CI of mean)
startup:insecure-bank:iast:Agent	14.01 s	14.01 s	[-0.8%; +0.8%] (no difference)
startup:insecure-bank:tracing:Agent	12.90 s	13.08 s	[-2.3%; -0.3%] (maybe better)
startup:petclinic:appsec:Agent	16.94 s	16.73 s	[+0.3%; +2.2%] (maybe worse)
startup:petclinic:iast:Agent	16.86 s	16.98 s	[-1.5%; +0.1%] (no difference)
startup:petclinic:profiling:Agent	16.76 s	16.88 s	[-1.4%; -0.0%] (maybe better)
startup:petclinic:sca:Agent	16.86 s	16.69 s	[-0.0%; +2.1%] (no difference)
startup:petclinic:tracing:Agent	16.08 s	16.12 s	[-1.2%; +0.7%] (no difference)

Commit: 197e952f · CI Pipeline · Benchmarking Platform UI

Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

dougqh · 2026-06-29T20:26:15Z

Inlining gate — off-path (`parent == null`) reads: parity confirmed

Ran -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining (no-fork, -f0) on UnsynchronizedMapBenchmark.get_tagMap / iterate_tagMap_forEach, master vs this branch, to verify the getEntry refactor + forEach parent loop don't regress hot-read inlining for the common parent == null case (every existing map).

Reads — parity, both fully inline hot:

	`getEntry`	decision
master	monolith, 90 B	`inline (hot)`
branch	`getEntry` 51 B → `getLocalEntry` 24 B → `findInBucket` 45 B	all `inline (hot)`

Same fully-inlined read — the branch inlines a 120 B chain of tiny methods where master inlines a 90 B monolith (+30 B for the parent != null check + framing, all far under FreqInlineSize 325). Cold-site "callee is too large" is identical on both (> the 35 B cold cutoff). So the split is inlining-neutral.

forEach: the read-through parent loop was extracted to per-variant forEachParent(...) (called only when parent != null), so off-path forEach is back to ~90 B (it had grown to 242 B inline) and compiles as its own loop unit exactly as before; the parent-loop code is out of line and dead when parent == null.

Writes: descoped — setTag → TagInterceptor → getAndSet is already a non-inlinable big method due to the interceptor cascade, so a write-side inlining measurement is confounded. The one gated removedFromParent != null field-check is noise against that; meaningful write-path inlining waits on interceptor retirement.

Not yet measured: the with-parent hot path (read-through active) — there's no benchmark with a parent attached yet. That belongs on the consumer PR's span-level -prof gc benchmark, which also demonstrates the per-span allocation win.

dougqh · 2026-06-30T00:41:09Z

Measured read-through win (quiet box, firmed)

TagMapReadThroughBenchmark (on the stacked consumer branch, where the wiring lands) models span-build tag assembly — copyDown (today: putAll the frozen trace-level bundle + span tags) vs readThrough (attach the bundle as a read-through parent, span tags local) — swept by trace-bundle size. 5 forks, @Threads(8), -prof gc, 25 measurements/point.

`traceTagCount`	copyDown alloc	readThrough alloc	saved	copyDown thrpt	readThrough thrpt	speedup
3	360 B/op	312 B/op	−13%	170.8M ops/s	213.7M	1.25×
7 (realistic)	456 B/op	312 B/op	−32%	133.0M ops/s	215.0M	1.62×
15	600 B/op	312 B/op	−48%	101.1M ops/s	207.3M	2.05×

The headline is the shape, not a single number:

read-through is invariant to bundle size — 312 B/op and ~210M ops/s flat across 3/7/15 trace tags. A span no longer pays anything for the trace-level bundle; its cost is O(its own tags).
copyDown grows with the bundle (more BucketGroup clones + collisions to copy), so the win scales with mergedTracerTags size: −13%/1.25× at 3 tags → −48%/2.05× at 15.

Reliability: alloc deterministic (±0.001 over 25 samples); throughput tight (±2–5M).

Honest attribution / scope: the alloc delta is bucket structure (BucketGroup clones + fewer local collisions), not Entry objects — putAll-into-empty already shares the frozen entries. Bucket structures are ~3% of tracer alloc, so absolute macro impact at roomy heap is modest; the win matters most under GC pressure and on the app/request thread (the throughput column). The structural point is the invariance — read-through decouples per-span cost from trace-bundle size, which is the level-split thesis (trace tags paid once, not per span).

dougqh and others added 2 commits June 29, 2026 15:33

dougqh added comp: core Tracer core tag: ai generated Largely based on code generated by an AI or LLM tag: no release notes Changes to exclude from release notes type: enhancement Enhancements and improvements labels Jun 29, 2026

dougqh mentioned this pull request Jun 30, 2026

Store known span tags densely in TagMap by tag-id #11814

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add TagMap read-through support (level-split phase 1 mechanism)#11789

Add TagMap read-through support (level-split phase 1 mechanism)#11789
dougqh wants to merge 2 commits into
masterfrom
dougqh/tagmap-read-through

dougqh commented Jun 29, 2026

Uh oh!

dd-octo-sts Bot commented Jun 29, 2026

Uh oh!

dougqh commented Jun 29, 2026

Uh oh!

dougqh commented Jun 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

dougqh commented Jun 29, 2026

Mechanism PR — read-through for OptimizedTagMap (single parent)

Design

What's covered

Deferred / follow-ups

Uh oh!

dd-octo-sts Bot commented Jun 29, 2026

🟢 Java Benchmark SLOs — All performance SLOs passed

Uh oh!

dougqh commented Jun 29, 2026

Inlining gate — off-path (parent == null) reads: parity confirmed

Uh oh!

dougqh commented Jun 30, 2026

Measured read-through win (quiet box, firmed)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Mechanism PR — read-through for `OptimizedTagMap` (single parent)

Inlining gate — off-path (`parent == null`) reads: parity confirmed