Skip to content

feat: native-spans wasm support (pipeline, capabilities, trace exporter)#151

Open
bengl wants to merge 21 commits into
mainfrom
bengl/native-spans-attempt-3-ish
Open

feat: native-spans wasm support (pipeline, capabilities, trace exporter)#151
bengl wants to merge 21 commits into
mainfrom
bengl/native-spans-attempt-3-ish

Conversation

@bengl

@bengl bengl commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds wasm-only native-spans support to libdatadog-nodejs: the Rust/wasm
bindings that let dd-trace-js run its span pipeline and trace export through
libdatadog instead of JS. Pairs with the companion dd-trace-js PR (DataDog/dd-trace-js#9139),
(bengl/native-spans-attempt-3), which consumes these bindings.

What's here (10 commits)

  • build: cargo workspace + wasm build tooling (scripts/build-wasm.js).
  • capabilities crate: the wasm capability bundle — HTTP transport (incl.
    unix-socket / Windows named-pipe), sleep, etc., routing I/O back into JS.
  • pipeline crate (WasmSpanState): change-buffer span pipeline (span_id
    protocol), string table, stats collector, and the span-data bindings:
    • meta_struct (setMetaStruct/getMetaStruct)
    • top-level span_events (addSpanEvent, typed attributes)
    • v0.5 output selection (setUseV05) — single exporter; in v0.5 mode
      meta_struct/span_events are dropped by the protocol, by design.
  • Toolchain + libdatadog dependency bumps to make the crates build for wasm.
  • Removed the standalone trace_exporter crate (JsTraceExporter, the
    earlier pre-encoded-bytes path): the pipeline crate's WasmSpanState
    builds its own internal TraceExporter and supersedes it; nothing consumes
    the standalone binding.

Testing

  • node --test suites: pipeline (incl. meta_struct/span_events round-trips,
    v0.5 vs v0.4 endpoint routing against a mock agent, bounds/overflow
    rejection), http_transport (incl. a real AF_UNIX socket), trace_exporter,
    process_discovery, crashtracker.
  • cargo check / clippy clean on the wasm target.
  • Each feature went through a multi-perspective review-until-green loop before
    landing.

Notes / caveats

  • wasm-only: the unsound native NAPI pipeline path was removed; all crates
    target wasm32-unknown-unknown.
  • Real end-to-end verification against a live agent is CI-gated (prebuilds
    are gitignored / not published here).
  • libdatadog is consumed as a git dependency; no libdatadog source changes.

Draft for review; not yet rebased for release.

Comment thread test/crashtracker/package-lock.json
Comment thread crates/process_discovery/src/lib.rs Outdated
@bengl bengl force-pushed the bengl/native-spans-attempt-3-ish branch from 0bf80b1 to 7b2d50b Compare June 29, 2026 18:16
@bengl bengl marked this pull request as ready for review June 29, 2026 18:23
@bengl bengl requested review from a team as code owners June 29, 2026 18:23

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7b2d50b74a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/pipeline/src/lib.rs Outdated
Comment thread package.json Outdated
Comment thread crates/capabilities/src/http.rs Outdated
bengl and others added 15 commits June 30, 2026 13:27
Establish the Rust workspace, pin the toolchain, and add the npm scripts
and shell tooling that build the wasm modules and the native (napi)
addons and run their test suites.

Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com>
Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com>
Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Implement the portable libdatadog capability traits for the wasm runtime:
an HTTP client backed by Node's http.request, a setTimeout-based sleep,
and a response-header observer hook. This bundle is the generic parameter
the data-pipeline and trace-exporter crates are instantiated with.

Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com>
Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com>
Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Wasm binding over libdatadog's span-id-addressed change buffer: span
creation and mutation via a binary op protocol, a deduplicated string
table, segment (trace-level) attributes, batched export through
prepareChunk/sendPreparedChunk, and optional client-side stats.

Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com>
Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com>
Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Expose libdatadog's TraceExporter to JS via wasm-bindgen, pinned to the
wasm capability bundle. The exporter is built lazily on first send, since
the blocking build path is unavailable on wasm. Includes an integration
test that drives it against an in-process mock agent.

Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com>
Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com>
Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Update library_config, process_discovery, and datadog-js-zstd for the
workspace's pinned Rust toolchain and libdatadog dependency versions.

Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com>
Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com>
Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
libdatadog's Span already carries meta_struct (VecMap<Text, Bytes>) and
the exporter serializes it, but no JS binding existed, so structured
per-span data (AppSec, Code Origin, Dynamic Instrumentation) could not be
sent on the native path.

There is no change-buffer opcode for meta_struct, so setMetaStruct writes
the value directly onto the span via span_mut() after draining the change
queue. meta_struct depends on no other queued op, so bypassing the queue
ordering is safe \u2014 subsequent ops are applied on the next flush and never
touch meta_struct. getMetaStruct mirrors the existing per-span getters for
round-trip coverage.
The wasm HTTP transport only spoke http/https to a host:port, so a
`unix://` or `windows:` agent URL could not be reached: request() treated
the hex-encoded socket path (ddcommon's parse_uri stores it in the URI
authority) as a TCP host.

Detect the unix/windows scheme in request(), hex-decode the socket path
from the authority, and pass it to the JS transport, which now uses
Node's { socketPath } (covering Windows named pipes too). TCP requests
are unchanged; socket requests send a localhost Host header with no port.
Adds a unix-socket case to the transport tests.
Add `addSpanEvent` to append OpenTelemetry-style span events onto the
top-level v0.4 `span_events` field that libdatadog already serializes.
Like meta_struct there is no change-buffer opcode, so the event is
appended directly to the span after draining the queue (span_events do
not depend on any other queued op, so bypassing queue ordering is safe).

Attributes arrive as a flat little-endian buffer with per-value type
tags (String=0, Boolean=1, Integer=2, Double=3, Array=4) matching
libdatadog's AttributeArrayValue discriminants; every read is bounded
against the buffer so a malformed/truncated buffer errors instead of
panicking. A `getSpanEventsJson` helper serializes events via the same
serde impl used for the msgpack wire format, exercised by new
round-trip tests covering each scalar type, arrays, and bounds.
Add setUseV05() to WasmSpanState so the single trace exporter can emit the
v0.5 wire format (/v0.5/traces) instead of the default v0.4. The flag is read
once, at the lazy exporter build on first send, then fixed; callers must set
it before the first flush.

v0.5 uses a fixed 12-field schema with no slots for meta_struct, span_events,
or span_links, so libdatadog's v0.5 serializer silently drops them. This
mirrors dd-trace-js master's v0.5 encoder and is intentional \u2014 there is no
guard and no dual exporter. libdatadog does not downgrade V05 (unlike V1), so
the caller (dd-trace-js) is responsible for only enabling this after the agent
advertises /v0.5/traces via /info.
The pipeline crate's WasmSpanState builds its own internal TraceExporter and
owns serialization + send (sendPreparedChunk), superseding the standalone
trace_exporter binding (JsTraceExporter, the earlier pre-encoded-v0.4-bytes
path). Nothing consumes it: dd-trace-js (native-spans and master) loads only
the pipeline crate, and no other tracked code references it. Drop the crate,
its wasm test, and the build/test wiring.
The branch's build-wasm dropped the leading `yarn -s install-wasm-pack`
that main still runs, so a clean checkout without wasm-pack on PATH fails
with `wasm-pack: not found` before building any module. Restore it.
flushStats() took the StatsCollector out of its RefCell for the whole
async send, so a prepareChunk() during that await saw None and skipped
add_spans — permanently dropping those successfully-sent spans from
client-side stats. Split StatsCollector::flush into a synchronous
prepare_request (drain + encode under a brief borrow) and an async
send_request (no borrow). flushStats now builds the request, releases the
collector, then awaits the send, so concurrent add_spans is counted.
…ndle struct

Bump the libdatadog branch=main git deps from the older pinned commit
(4b79b7ed) to the current main HEAD (a38b6304). Current main requires the
capability generic to implement LogWriterCapability in addition to
HttpClientCapability and SleepCapability.

Replace the `WasmCapabilities = DefaultHttpClient` type alias with a proper
bundle struct that mirrors libdatadog's native `NativeCapabilities { http,
sleep }`:

- rename `DefaultHttpClient` to `WasmHttpClient` (HTTP only) and move the
  sleep impl into its own `WasmSleepCapability` (new `sleep` module),
  matching libdatadog's `NativeHttpClient` / `NativeSleepCapability` split.
- `WasmCapabilities` now delegates HTTP and sleep to those fields and
  implements `LogWriterCapability` as a no-op, since the wasm binding only
  runs in trace-export mode.

Also pass the new `override_max_entries_per_bucket` argument (None = default)
to `SpanConcentrator::new`, which gained a parameter on main.
libdatadog's TraceExporter can export traces over OTLP HTTP (JSON or
protobuf) instead of to the Datadog agent, mapping its internal traces to
OTLP directly. Expose that on WasmSpanState so dd-trace-js can honour
OTEL_TRACES_EXPORTER=otlp without resurrecting a JS-side OTLP exporter.

- setOtlpEndpoint(url): route export to an OTLP HTTP endpoint (e.g. an OTel
  Collector) instead of the agent.
- setOtlpProtocol('http/json'|'http/protobuf'): select the wire format;
  rejects unsupported values (grpc) at the parse boundary.
- setOtlpHeaders([k, v, ...]): extra headers (e.g. collector auth).

All three apply at lazy build time and only when an OTLP endpoint is set.
Address review-until-green feedback on the OTLP binding:
- Pin the default wire protocol: the endpoint-only OTLP test now asserts the
  request content-type is JSON (the http/json default) instead of accepting
  either json or protobuf.
- Cover multi-header export and the odd-length trailing-drop in one test (two
  header pairs plus a stray unpaired element; assert both pairs arrive).
- Document that setOtlpHeaders ignores a trailing unpaired element, replaces
  prior headers, and that setOtlpEndpoint takes precedence over setUseV05.
@bengl bengl force-pushed the bengl/native-spans-attempt-3-ish branch from 574428b to b1edad5 Compare June 30, 2026 17:28
@datadog-datadog-prod-us1

This comment has been minimized.

bengl added 3 commits June 30, 2026 14:21
The pipeline crate is wasm-only, but it was missing from the build-test-wasm
matrix, so its wasm module was never built in CI (nor shipped in releases),
and the native action-prebuildify test matrix crashed on
`require('..').maybeLoad('pipeline')` returning undefined.

- Add `pipeline` to the build-test-wasm matrix in build.yml and release.yml,
  and make the composite action's test step crate-aware: pipeline runs its
  top-level node:test suites with --test-force-exit, other crates keep using
  test-wasm.js. This builds + tests the pipeline wasm in CI and includes it in
  release prebuilds.
- test/pipeline.js now skips its suite when the wasm binding is unavailable
  (the native test matrix) instead of throwing on the destructure.
- eslint: add test-file overrides (the node:test runner, the snake_case
  http_transport.js name that must match the Rust wasm_bindgen module path,
  null inputs in the wasm test harness, and describe-scoped test helpers) and
  mechanical cleanups so `eslint .` passes across the new test files + shim.
- Cargo.lock: canonicalize the dual-source libdd-trace-protobuf entry left
  inconsistent by the earlier rebase merge (no dependency changes).
…an exit

The wasm exporter races a sleep(timeout) against each HTTP request as a timeout
guard (and reuses it for retry backoff). On a fast success the timer is
abandoned but, being reffed, kept the Node process alive for up to the request
timeout (5 minutes) after the last flush. unref the timer: an in-flight request
refs the event loop on its own, so an abandoned/standalone timeout timer must
not block process exit.

This also fixes the Node 18 CI test jobs: scripts/test.sh now probes for
--test-force-exit (Node >= 20.14/22) and runs without it on Node 18, which
rejects the flag as unknown. With the unref above, node:test exits cleanly on
Node 18 regardless.
Node 18 lacks --test-force-exit (the wasm exporter keeps the event loop alive
after a flush) and the wasm HTTP client leaves a mock-agent socket open that
node:test cannot drain there, so test/pipeline.js cannot exit cleanly on Node
18. Skip it on a Node without --test-force-exit. The pipeline wasm is fully
covered by the build-test-wasm job and the Node 20/22/24/26 runs here, so this
loses no real coverage (it is platform/Node-independent wasm).
@bengl bengl force-pushed the bengl/native-spans-attempt-3-ish branch from bb6694d to bf9873f Compare June 30, 2026 19:51
Comment thread crates/capabilities/Cargo.toml Outdated
Comment thread crates/capabilities/src/http.rs Outdated
Comment thread crates/pipeline/src/utils.rs Outdated
Comment thread crates/pipeline/src/lib.rs
@github-actions

github-actions Bot commented Jun 30, 2026

Copy link
Copy Markdown

Overall package size

Self size: 29.93 MB
Deduped: 29.93 MB
No deduping: 29.93 MB

Dependency sizes | name | version | self size | total size | |------|---------|-----------|------------|

🤖 This report was automatically generated by heaviest-objects-in-the-universe

bengl added 2 commits June 30, 2026 16:15
Address review feedback on the wasm binding:
- get_num now bounds-checks internally and returns Option, so a truncated or
  malformed buffer yields a clean error at the call site instead of a panic
  (the check no longer relies on every call site). All five callers propagate it.
- Latch a failed lazy build_async: building is one-shot and a build failure is
  fatal (bad config), so once it fails every send returns a distinguishable
  NativeExporterBuildError (a tagged JS Error) instead of a misleading
  'exporter builder already consumed' string, letting the host stop retrying.
- Avoid panicking unwrap()s: the response builder's headers_mut() (which could
  panic on an out-of-range agent status) is handled gracefully; the
  provably-safe unwraps (Vec::as_mut_ptr, Reflect::set on a fresh object)
  become documented expects.
…main

v37.0.0 includes everything the wasm pipeline needs \u2014 the libdd-capabilities
crate (with LogWriterCapability), the change-buffer feature, the
capability-bundle TraceExporter, and the OTLP builder methods \u2014 so pin the
released tag instead of tracking a moving main branch. This makes the build
reproducible and is a prerequisite for publishing the npm package.

SpanConcentrator::new gained a 5th argument (override_max_entries_per_bucket)
on a later main commit that is not in v37.0.0; drop it and use the 4-arg form
(we were passing its default, so behavior is unchanged).
ekump
ekump previously approved these changes Jul 1, 2026

@ekump ekump left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust code is LGTM

@szegedi szegedi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew. This is a biggie. Generally, solid work as far as I can tell. I was a bit in two minds about getting a more substantial amount of code (compared to very thin wrappers) appearing in libdatadog-nodejs, but ultimately convinced myself that the intersection of everything libdatadog and Node.js indeed should live here.

Of the things I flagged, I definitely need the "strip = none, debug = true" to be addressed in release mode, as well as the @napi-rs/cli dependency, the rest are nits.

[dependencies]
anyhow = "1"
datadog-library-config = { git = "https://github.com/DataDog/libdatadog.git", tag = "v18.1.0" }
libdd-library-config = { git = "https://github.com/DataDog/libdatadog.git", rev = "353134770b312b7ccd2df6afabc253090b948e5f" }

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this rename necessary?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Libdatadog did this everywhere.

Comment thread Cargo.toml Outdated
Comment on lines +15 to +17
strip = true

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
# strip = "none"
debug = true

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're removing stripping and enabling debug in a release profile. I presume this is a testing leftover?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leftover, yep. Restored strip = true / dropped debug in fb34e63d.

Comment thread package.json Outdated
"access": "public"
},
"dependencies": {
"@napi-rs/cli": "^3.6.0"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a built-time CLI tool, isn't it? Shouldn't this go into "devDependencies" instead? (Is it needed at all? What uses it?)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in fb34e63d \u2014 unused (build is cargo + wasm-pack), and main has no runtime deps.

Comment thread crates/capabilities/src/http.rs Outdated
}
}

/// Parse response headers from a JS object `{ "header-name": "value", ... }`.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given how the function takes an array of strings as a parameter, I presume this comment is wrong.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, reworded in fb34e63d: it's Node's flat [name, value, ...] array, not an object.

Comment thread crates/pipeline/src/lib.rs Outdated
let mut span_ids = Vec::with_capacity(count as usize);
while count > 0 {
let span_id: u64 = get_num(chunk, &mut index)
.ok_or_else(|| JsValue::from_str("sendPreparedChunk: span id index out of bounds"))?;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: function is named prepare_chunk, so maybe update the error message.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in fb34e63d \u2192 prepareChunk.

Comment thread crates/pipeline/src/stats.rs Outdated
/// `prepare_request` + `send_request`; callers that flush concurrently with
/// trace export should use those two directly so the collector isn't held
/// across the await (see `flushStats`).
pub async fn flush(&mut self, force: bool) -> Result<bool, String> {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks unused to me – it's fine to keep it if you need it later.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in fb34e63d (dead since the prepare_request/send_request split).

Comment thread crates/pipeline/src/stats.rs Outdated
}

/// Update the agent URL (e.g. after reconfiguration).
pub fn set_agent_url(&mut self, url: String) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one looks unused too, again, no big deal if you intend to use it.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in fb34e63d.

use libdd_capabilities::http::{HttpClientCapability, HttpError};
use libdd_capabilities::maybe_send::MaybeSend;

static WASM_MEMORY: LazyLock<JsValue> = LazyLock::new(wasm_bindgen::memory);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fun that this compiles (and since CI is green it must.) static in Rust implies Sync, and LazyLock: Sync would require JSValue to also be Sync which it… isn't? Anyhow, no action necessary as it compiles, but if I had more time to spend on this I'd love to understand why it works :-)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wasm-bindgen impls Send/Sync for JsValue on wasm32 (single-threaded), and this crate is wasm32-only \u2014 so it's sound. Added a comment noting that in fb34e63d.

szegedi
szegedi previously approved these changes Jul 1, 2026
…ethods, docs

Address szegedi's review on the wasm binding:
- Restore the release profile to strip=true and drop debug=true (a debugging
  leftover; matters for the published package size, and matches main). With
  stripping restored, pass --all-features to wasm-opt for the pipeline crate
  ([package.metadata.wasm-pack]): its post-MVP features (bulk-memory for the
  change-buffer copies, sign-extension, etc.) otherwise trip wasm-pack's
  bundled wasm-opt validation (the smaller wasm crates don't hit this).
- Remove the unused @napi-rs/cli runtime dependency \u2014 nothing invokes it and
  main builds the native prebuilds without it.
- Remove the unused StatsCollector::flush and set_agent_url methods (dead since
  flush was split into prepare_request/send_request); clears the dead_code warning.
- Fix the prepareChunk error message (was mislabeled sendPreparedChunk) and the
  parse_response_headers doc (it takes a flat [name, value, ...] array, not an
  object).
- Document why the WASM_MEMORY LazyLock<JsValue> is Sync (wasm32-only).
@bengl bengl force-pushed the bengl/native-spans-attempt-3-ish branch from fb34e63 to 3406c6e Compare July 1, 2026 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants