Tag consumer spans with the pathway hash on every data streams checkpoint #11808

Open
ericfirth wants to merge 4 commits into master from eric.firth/dsm-consumer-pathway-hash

@ericfirth

Contributor

What

Set the pathway.hash span tag inside DefaultDataStreamsMonitoring.setCheckpoint(span, context) so it is applied on every checkpoint that has a span — consume side included — not only on produce/inject.

Why

Today pathway.hash is written in exactly one place: DataStreamsPropagator.inject(...), which only runs on the produce/inject path. Consumers that checkpoint without injecting downstream context — e.g. RabbitMQ's basicGet/basicConsume, which call getDataStreamsMonitoring().setCheckpoint(span, ...) — compute and emit the pathway hash but never tag their span with it. So consumer spans are missing pathway.hash and can't be correlated back to the pathway.

Some consumers (e.g. SQS) happen to get the tag only because their integration also calls dsmPropagator.inject(...) during receive — an integration-specific side effect, not a guarantee.

The JS (processor.js, recordCheckpoint) and Python (processor.py, set_checkpoint) tracers already set the tag in their central checkpoint method, so it lands on both produce and consume. This change brings Java to parity by doing the same in the core setCheckpoint.

Change

After pathwayContext.setCheckpoint(...), tag the span with pathway.hash when the hash is non-zero. The produce path is unchanged (the propagator still tags on inject; the value is identical, so this is idempotent for producers that also flow through setCheckpoint).
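
The tagging rule described above can be sketched in isolation. This is a minimal illustration, not the tracer's actual code: `FakeSpan`, `FakePathwayContext`, and `tagAfterCheckpoint` are hypothetical stand-ins, and only the `pathway.hash` tag name, the non-zero check, and `Long.toUnsignedString` come from the change itself.

```java
import java.util.HashMap;
import java.util.Map;

public class PathwayHashTagSketch {
  static final String PATHWAY_HASH = "pathway.hash";

  /** Hypothetical stand-in for a span: just a map of tags. */
  static final class FakeSpan {
    final Map<String, Object> tags = new HashMap<>();

    void setTag(String key, String value) {
      tags.put(key, value);
    }
  }

  /** Hypothetical stand-in for the pathway context; it only exposes the hash. */
  static final class FakePathwayContext {
    private final long hash;

    FakePathwayContext(long hash) {
      this.hash = hash;
    }

    long getHash() {
      return hash;
    }
  }

  /** Tags the span with the unsigned hash, skipping a zero hash or a missing context. */
  static void tagAfterCheckpoint(FakeSpan span, FakePathwayContext pathwayContext) {
    if (span == null || pathwayContext == null) {
      return;
    }
    long pathwayHash = pathwayContext.getHash(); // read once, reuse below
    if (pathwayHash != 0) {
      span.setTag(PATHWAY_HASH, Long.toUnsignedString(pathwayHash));
    }
  }

  public static void main(String[] args) {
    FakeSpan consumer = new FakeSpan();
    tagAfterCheckpoint(consumer, new FakePathwayContext(-1L));
    // Tagging again with the same hash leaves the value unchanged.
    tagAfterCheckpoint(consumer, new FakePathwayContext(-1L));
    System.out.println(consumer.tags); // {pathway.hash=18446744073709551615}

    FakeSpan untagged = new FakeSpan();
    tagAfterCheckpoint(untagged, new FakePathwayContext(0L));
    System.out.println(untagged.tags); // {}
  }
}
```

Because the tag is written from the same hash value each time, a span that is tagged both here and on inject ends up with a single, identical `pathway.hash` value.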

Tests

Added three unit tests in DefaultDataStreamsMonitoringTest:

  • tags the span with the unsigned pathway hash when the hash is non-zero,
  • does not tag when the hash is zero,
  • does not tag when there is no pathway context.

./gradlew :dd-trace-core:test --tests "datadog.trace.core.datastreams.DefaultDataStreamsMonitoringTest" → 36 tests, 0 failures.

Context

Found while working on RabbitMQ DSM (the consumer span had no pathway.hash while the producer did). Independent of, and complementary to, the RabbitMQ default-exchange topic fix (#11805).

…oint

The pathway.hash span tag was only set on the produce/inject path
(DataStreamsPropagator.inject). Consumers that checkpoint without injecting
(e.g. RabbitMQ) had no pathway.hash, unlike the JS and Python tracers which tag
the span on every checkpoint. Set it centrally in
DefaultDataStreamsMonitoring.setCheckpoint so all consume-side integrations get
it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ericfirth ericfirth added the type: bug (Bug report and fix) and comp: data streams (Data Streams Monitoring) labels Jun 30, 2026
Per review on #11808; the rationale lives in the PR description.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ericfirth ericfirth requested a review from Copilot June 30, 2026 15:53
@ericfirth ericfirth marked this pull request as ready for review June 30, 2026 15:53
@ericfirth ericfirth requested a review from a team as a code owner June 30, 2026 15:53
@dd-octo-sts dd-octo-sts Bot added the tag: ai generated (Largely based on code generated by an AI or LLM) label Jun 30, 2026

Copilot AI left a comment


Pull request overview

This PR updates the Java Data Streams Monitoring (DSM) core checkpointing path so spans are tagged with pathway.hash on every DSM checkpoint that has an associated span (including consumer-side checkpoints), aligning behavior with other tracers and improving pathway/span correlation.

Changes:

  • Tag spans with pathway.hash inside DefaultDataStreamsMonitoring.setCheckpoint(span, context) when a non-zero pathway hash is available.
  • Add unit tests to validate tagging behavior (non-zero hash, zero hash, and missing pathway context).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| dd-trace-core/src/main/java/datadog/trace/core/datastreams/DefaultDataStreamsMonitoring.java | Tags spans with pathway.hash after PathwayContext.setCheckpoint(...) when hash is non-zero. |
| dd-trace-core/src/test/java/datadog/trace/core/datastreams/DefaultDataStreamsMonitoringTest.java | Adds unit tests covering new span-tagging behavior for setCheckpoint. |

ericfirth and others added 2 commits June 30, 2026 12:08
- cache pathwayContext.getHash() in a local instead of calling it twice
- use a negative hash in the test so it actually exercises Long.toUnsignedString

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
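
To illustrate why a negative hash matters in that test: `Long.toString` and `Long.toUnsignedString` produce the same text for non-negative values, so only a negative input can tell them apart. The sketch below uses arbitrary example values, not the ones in the actual test, and `UnsignedHashDemo` is a hypothetical class name.

```java
public class UnsignedHashDemo {
  /** Signed decimal rendering, as Long.toString would give. */
  static String signed(long hash) {
    return Long.toString(hash);
  }

  /** Unsigned decimal rendering, treating the 64 bits as a value in [0, 2^64). */
  static String unsigned(long hash) {
    return Long.toUnsignedString(hash);
  }

  public static void main(String[] args) {
    long positive = 42L;
    long negative = -42L;

    // For a non-negative hash the two renderings are indistinguishable.
    System.out.println(signed(positive).equals(unsigned(positive))); // true

    // For a negative hash they differ, so a test can catch the wrong conversion.
    System.out.println(signed(negative)); // -42
    System.out.println(unsigned(negative)); // 18446744073709551574
  }
}
```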
…bled

Now that DefaultDataStreamsMonitoring.setCheckpoint tags the span on every
checkpoint, the inferred-proxy (API Gateway) HTTP server span gets a
pathway.hash like any other DSM-enabled server span. The shared HttpServerTest
already asserts this conditionally; SpringBootBasedTest's hand-rolled
inferred-proxy assertions were missing it. Mirror the same guard.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ericfirth ericfirth requested a review from a team as a code owner June 30, 2026 17:28
@ericfirth ericfirth requested review from ygree and removed request for a team June 30, 2026 17:28
pathwayContext.setCheckpoint(context, this::add);
long pathwayHash = pathwayContext.getHash();
if (pathwayHash != 0) {
span.setTag(PATHWAY_HASH, Long.toUnsignedString(pathwayHash));

Contributor


Based on the PR description does this mean there are integrations where we are doing this manually on a consume already? In which case, should we remove them so it only happens once here?

