Skip to content

Add TagMap read-through support (level-split phase 1 mechanism)#11789

Draft
dougqh wants to merge 2 commits into
masterfrom
dougqh/tagmap-read-through
Draft

Add TagMap read-through support (level-split phase 1 mechanism)#11789
dougqh wants to merge 2 commits into
masterfrom
dougqh/tagmap-read-through

Conversation

@dougqh

@dougqh dougqh commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Mechanism PR — read-through for OptimizedTagMap (single parent)

Adds optional read-through to OptimizedTagMap: a child map references a frozen parent (withParent) and reads through to it on a local miss, while local entries shadow the parent. This is the enabler for level-split phase 1 — a span will stop copying the shared trace-level tags (mergedTracerTags) down into every span; reads route through instead.

This PR is pure mechanism — inert until a consumer attaches a parent. No map has a parent yet (parent == null for every existing map), so there is no behavior change off the read-through path. The consumer wiring lands in a stacked follow-up.

Design

  • Single parent by design (anti-false-generalization): the consumer needs one parent (mergedTracerTags). Written so generalizing to multiple flattened parents is additive (the bulk walk is already bucket-aligned, the degenerate single-parent case of the multi-parent merge).
  • Removal via a lazy removedFromParent side-set, not inline tombstones. Inline-in-buckets kept breaking on the bare-Entry-vs-BucketGroup duality (re-type, per-group bitfield + single-Entry gap, two bitfields — all awkward). The side-set is shape-agnostic, keeps Entry/BucketGroup completely untouched, and the lazy null field doubles as the gate. Tombstones are rare (only when a parent-exposed key is removed).
  • Bulk reads via a bucket-aligned merge: exploits universal hashing (fixed 16-bucket table, no resize) so a parent entry's shadow check is scoped to the same-index local bucket, reusing the entry's cached hash — no re-hash, no global seen-set, no Set/BloomFilter. forEach (×3) stays alloc-free; IteratorBase does a two-phase local-then-parent walk so iterator/entrySet/keySet/values/stream all emit the deduped union.

What's covered

  • Read path: getEntry fall-through (the whole get*/containsKey family inherits it). isDefinitelyEmpty() + estimateSize() added as cheap conservative/upper-bound variants (mirroring Ledger); isEmpty()/size() stay exact (Map contract) and resolve the union.
  • Removal: tombstone a parent-exposed key; re-setting clears it; remove() returns the prior visible value (Map contract holds via read-through).
  • Bulk: forEach/iterators/collection views all emit the deduped, first-occurrence-wins union, skipping shadowed/tombstoned parent entries.
  • copy() preserves read-through (shares the frozen parent + copies tombstones) — was dropping both.
  • Behavior-identical-to-flat-map tests are the safety contract for the consumer flip.

TagMapReadThroughTest covers slices 1–4; the full TagMap* suite stays green (inert when parent == null).

Deferred / follow-ups

  • Inlining baseline for get/set across the parent != null branch — the perf-gate before marking this ready.
  • Consumer PR (stacked): the !needsIntercept-gated mergedTracerTags → withParent wiring. Includes the removeTag(VERSION) cleanup — config version moves out of the read-through bundle so the existing InternalTagsAdder conditional-add works unchanged and no per-span tombstone is minted (the apply-then-remove dance is vestigial; read-through keeps config-in-parent / manual-in-local). Plus an audit of any other per-span parent-key removals and of read-through maps as a merge/clone source (only copy() handled here).
  • Multi-parent + the cross-parent dedup clause — additive, with the dream.

tag: ai generated · pure mechanism, no behavior change until a consumer attaches a parent.

🤖 Generated with Claude Code

dougqh and others added 2 commits June 29, 2026 15:33
Adds optional read-through to OptimizedTagMap: a child map references a frozen
parent (withParent) and reads through to it on a local miss, while local entries
shadow the parent. The enabler for level-split phase 1 (a span no longer copies
the shared trace-level tags down per span).

Single-parent by design in phase 1 (anti-false-generalization); written so
generalizing to multiple flattened parents is additive. Inert when parent == null
(every existing map), so no behavior change off the read-through path.

- Read path: getEntry falls through to the parent on a local miss (get*/containsKey
  inherit it). isDefinitelyEmpty()/estimateSize() added as cheap conservative/upper-
  bound variants; isEmpty()/size() stay exact (Map contract) and resolve the union.
- Removal: a lazily-allocated removedFromParent side-set (also the gate) tombstones
  a parent-exposed key so it no longer reads through; re-setting clears it. Entry and
  BucketGroup stay untouched (the side-set is shape-agnostic vs the bare/group duality).
- Bulk reads (forEach x3, iterators, collection views): bucket-aligned self-vs-parent
  merge, first-occurrence-wins, exploiting universal hashing so the shadow check is
  scoped to the same-index local bucket (no re-hash, no global seen-set). Alloc-free.
- checkIntegrity asserts the local emptiness invariant (size==0), not union isEmpty().

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
copy() went through putAllIntoEmptyMap, which clones only the local buckets and
the local size — so a copy of a read-through map dropped the inherited parent
tags and the tombstones. Fix copy() to share the (frozen, immutable) parent and
copy the tombstone set, so the copy is observationally identical to the original
(same union) and independently mutable.

Adds behavior-identical-to-flat-map tests: copy equivalence, independent
mutability, tombstone preservation, equality with an equivalent flat map, and
immutableCopy of a read-through map. This is the safety contract for flipping
mergedTracerTags to a parent in the consumer change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@dougqh dougqh added comp: core Tracer core tag: ai generated Largely based on code generated by an AI or LLM tag: no release notes Changes to exclude from release notes type: enhancement Enhancements and improvements labels Jun 29, 2026
@dd-octo-sts

dd-octo-sts Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

🟢 Java Benchmark SLOs — All performance SLOs passed

Suite Status
Startup 🟢 pass

SLO thresholds are defined here based on automatically generated metrics. A warning is raised when results are within 5% of the threshold.

PR vs. master results
Scenario Candidate master Δ (95% CI of mean)
startup:insecure-bank:iast:Agent 14.01 s 14.01 s [-0.8%; +0.8%] (no difference)
startup:insecure-bank:tracing:Agent 12.90 s 13.08 s [-2.3%; -0.3%] (maybe better)
startup:petclinic:appsec:Agent 16.94 s 16.73 s [+0.3%; +2.2%] (maybe worse)
startup:petclinic:iast:Agent 16.86 s 16.98 s [-1.5%; +0.1%] (no difference)
startup:petclinic:profiling:Agent 16.76 s 16.88 s [-1.4%; -0.0%] (maybe better)
startup:petclinic:sca:Agent 16.86 s 16.69 s [-0.0%; +2.1%] (no difference)
startup:petclinic:tracing:Agent 16.08 s 16.12 s [-1.2%; +0.7%] (no difference)

Commit: 197e952f · CI Pipeline · Benchmarking Platform UI


Load and DaCapo benchmarks can be triggered manually in the GitLab pipeline. Results will appear in the Benchmarking Platform UI after completion.

@dougqh

dougqh commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

Inlining gate — off-path (parent == null) reads: parity confirmed

Ran -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining (no-fork, -f0) on UnsynchronizedMapBenchmark.get_tagMap / iterate_tagMap_forEach, master vs this branch, to verify the getEntry refactor + forEach parent loop don't regress hot-read inlining for the common parent == null case (every existing map).

Reads — parity, both fully inline hot:

getEntry decision
master monolith, 90 B inline (hot)
branch getEntry 51 B → getLocalEntry 24 B → findInBucket 45 B all inline (hot)

Same fully-inlined read — the branch inlines a 120 B chain of tiny methods where master inlines a 90 B monolith (+30 B for the parent != null check + framing, all far under FreqInlineSize 325). Cold-site "callee is too large" is identical on both (> the 35 B cold cutoff). So the split is inlining-neutral.

forEach: the read-through parent loop was extracted to per-variant forEachParent(...) (called only when parent != null), so off-path forEach is back to ~90 B (it had grown to 242 B inline) and compiles as its own loop unit exactly as before; the parent-loop code is out of line and dead when parent == null.

Writes: descoped — setTag → TagInterceptor → getAndSet is already a non-inlinable big method due to the interceptor cascade, so a write-side inlining measurement is confounded. The one gated removedFromParent != null field-check is noise against that; meaningful write-path inlining waits on interceptor retirement.

Not yet measured: the with-parent hot path (read-through active) — there's no benchmark with a parent attached yet. That belongs on the consumer PR's span-level -prof gc benchmark, which also demonstrates the per-span allocation win.

@dougqh

dougqh commented Jun 30, 2026

Copy link
Copy Markdown
Contributor Author

Measured read-through win (quiet box, firmed)

TagMapReadThroughBenchmark (on the stacked consumer branch, where the wiring lands) models span-build tag assembly — copyDown (today: putAll the frozen trace-level bundle + span tags) vs readThrough (attach the bundle as a read-through parent, span tags local) — swept by trace-bundle size. 5 forks, @Threads(8), -prof gc, 25 measurements/point.

traceTagCount copyDown alloc readThrough alloc saved copyDown thrpt readThrough thrpt speedup
3 360 B/op 312 B/op −13% 170.8M ops/s 213.7M 1.25×
7 (realistic) 456 B/op 312 B/op −32% 133.0M ops/s 215.0M 1.62×
15 600 B/op 312 B/op −48% 101.1M ops/s 207.3M 2.05×

The headline is the shape, not a single number:

  • read-through is invariant to bundle size — 312 B/op and ~210M ops/s flat across 3/7/15 trace tags. A span no longer pays anything for the trace-level bundle; its cost is O(its own tags).
  • copyDown grows with the bundle (more BucketGroup clones + collisions to copy), so the win scales with mergedTracerTags size: −13%/1.25× at 3 tags → −48%/2.05× at 15.

Reliability: alloc deterministic (±0.001 over 25 samples); throughput tight (±2–5M).

Honest attribution / scope: the alloc delta is bucket structure (BucketGroup clones + fewer local collisions), not Entry objects — putAll-into-empty already shares the frozen entries. Bucket structures are ~3% of tracer alloc, so absolute macro impact at roomy heap is modest; the win matters most under GC pressure and on the app/request thread (the throughput column). The structural point is the invariance — read-through decouples per-span cost from trace-bundle size, which is the level-split thesis (trace tags paid once, not per span).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp: core Tracer core tag: ai generated Largely based on code generated by an AI or LLM tag: no release notes Changes to exclude from release notes type: enhancement Enhancements and improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant