feat: native-spans wasm support (pipeline, capabilities, trace exporter)#151
feat: native-spans wasm support (pipeline, capabilities, trace exporter)#151bengl wants to merge 21 commits into
Conversation
0bf80b1 to
7b2d50b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7b2d50b74a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Establish the Rust workspace, pin the toolchain, and add the npm scripts and shell tooling that build the wasm modules and the native (napi) addons and run their test suites. Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com> Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com> Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Implement the portable libdatadog capability traits for the wasm runtime: an HTTP client backed by Node's http.request, a setTimeout-based sleep, and a response-header observer hook. This bundle is the generic parameter the data-pipeline and trace-exporter crates are instantiated with. Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com> Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com> Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Wasm binding over libdatadog's span-id-addressed change buffer: span creation and mutation via a binary op protocol, a deduplicated string table, segment (trace-level) attributes, batched export through prepareChunk/sendPreparedChunk, and optional client-side stats. Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com> Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com> Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Expose libdatadog's TraceExporter to JS via wasm-bindgen, pinned to the wasm capability bundle. The exporter is built lazily on first send, since the blocking build path is unavailable on wasm. Includes an integration test that drives it against an in-process mock agent. Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com> Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com> Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
Update library_config, process_discovery, and datadog-js-zstd for the workspace's pinned Rust toolchain and libdatadog dependency versions. Co-authored-by: Jules Wiriath <jules.wiriath@datadoghq.com> Co-authored-by: paullegranddc <paul.legranddescloizeaux@datadoghq.com> Co-authored-by: Gyuheon Oh <102937919+gyuheon0h@users.noreply.github.com>
libdatadog's Span already carries meta_struct (VecMap<Text, Bytes>) and the exporter serializes it, but no JS binding existed, so structured per-span data (AppSec, Code Origin, Dynamic Instrumentation) could not be sent on the native path. There is no change-buffer opcode for meta_struct, so setMetaStruct writes the value directly onto the span via span_mut() after draining the change queue. meta_struct depends on no other queued op, so bypassing the queue ordering is safe \u2014 subsequent ops are applied on the next flush and never touch meta_struct. getMetaStruct mirrors the existing per-span getters for round-trip coverage.
The wasm HTTP transport only spoke http/https to a host:port, so a
`unix://` or `windows:` agent URL could not be reached: request() treated
the hex-encoded socket path (ddcommon's parse_uri stores it in the URI
authority) as a TCP host.
Detect the unix/windows scheme in request(), hex-decode the socket path
from the authority, and pass it to the JS transport, which now uses
Node's { socketPath } (covering Windows named pipes too). TCP requests
are unchanged; socket requests send a localhost Host header with no port.
Adds a unix-socket case to the transport tests.
Add `addSpanEvent` to append OpenTelemetry-style span events onto the top-level v0.4 `span_events` field that libdatadog already serializes. Like meta_struct there is no change-buffer opcode, so the event is appended directly to the span after draining the queue (span_events do not depend on any other queued op, so bypassing queue ordering is safe). Attributes arrive as a flat little-endian buffer with per-value type tags (String=0, Boolean=1, Integer=2, Double=3, Array=4) matching libdatadog's AttributeArrayValue discriminants; every read is bounded against the buffer so a malformed/truncated buffer errors instead of panicking. A `getSpanEventsJson` helper serializes events via the same serde impl used for the msgpack wire format, exercised by new round-trip tests covering each scalar type, arrays, and bounds.
Add setUseV05() to WasmSpanState so the single trace exporter can emit the v0.5 wire format (/v0.5/traces) instead of the default v0.4. The flag is read once, at the lazy exporter build on first send, then fixed; callers must set it before the first flush. v0.5 uses a fixed 12-field schema with no slots for meta_struct, span_events, or span_links, so libdatadog's v0.5 serializer silently drops them. This mirrors dd-trace-js master's v0.5 encoder and is intentional \u2014 there is no guard and no dual exporter. libdatadog does not downgrade V05 (unlike V1), so the caller (dd-trace-js) is responsible for only enabling this after the agent advertises /v0.5/traces via /info.
The pipeline crate's WasmSpanState builds its own internal TraceExporter and owns serialization + send (sendPreparedChunk), superseding the standalone trace_exporter binding (JsTraceExporter, the earlier pre-encoded-v0.4-bytes path). Nothing consumes it: dd-trace-js (native-spans and master) loads only the pipeline crate, and no other tracked code references it. Drop the crate, its wasm test, and the build/test wiring.
The branch's build-wasm dropped the leading `yarn -s install-wasm-pack` that main still runs, so a clean checkout without wasm-pack on PATH fails with `wasm-pack: not found` before building any module. Restore it.
flushStats() took the StatsCollector out of its RefCell for the whole async send, so a prepareChunk() during that await saw None and skipped add_spans — permanently dropping those successfully-sent spans from client-side stats. Split StatsCollector::flush into a synchronous prepare_request (drain + encode under a brief borrow) and an async send_request (no borrow). flushStats now builds the request, releases the collector, then awaits the send, so concurrent add_spans is counted.
…ndle struct
Bump the libdatadog branch=main git deps from the older pinned commit
(4b79b7ed) to the current main HEAD (a38b6304). Current main requires the
capability generic to implement LogWriterCapability in addition to
HttpClientCapability and SleepCapability.
Replace the `WasmCapabilities = DefaultHttpClient` type alias with a proper
bundle struct that mirrors libdatadog's native `NativeCapabilities { http,
sleep }`:
- rename `DefaultHttpClient` to `WasmHttpClient` (HTTP only) and move the
sleep impl into its own `WasmSleepCapability` (new `sleep` module),
matching libdatadog's `NativeHttpClient` / `NativeSleepCapability` split.
- `WasmCapabilities` now delegates HTTP and sleep to those fields and
implements `LogWriterCapability` as a no-op, since the wasm binding only
runs in trace-export mode.
Also pass the new `override_max_entries_per_bucket` argument (None = default)
to `SpanConcentrator::new`, which gained a parameter on main.
libdatadog's TraceExporter can export traces over OTLP HTTP (JSON or
protobuf) instead of to the Datadog agent, mapping its internal traces to
OTLP directly. Expose that on WasmSpanState so dd-trace-js can honour
OTEL_TRACES_EXPORTER=otlp without resurrecting a JS-side OTLP exporter.
- setOtlpEndpoint(url): route export to an OTLP HTTP endpoint (e.g. an OTel
Collector) instead of the agent.
- setOtlpProtocol('http/json'|'http/protobuf'): select the wire format;
rejects unsupported values (grpc) at the parse boundary.
- setOtlpHeaders([k, v, ...]): extra headers (e.g. collector auth).
All three apply at lazy build time and only when an OTLP endpoint is set.
Address review-until-green feedback on the OTLP binding: - Pin the default wire protocol: the endpoint-only OTLP test now asserts the request content-type is JSON (the http/json default) instead of accepting either json or protobuf. - Cover multi-header export and the odd-length trailing-drop in one test (two header pairs plus a stray unpaired element; assert both pairs arrive). - Document that setOtlpHeaders ignores a trailing unpaired element, replaces prior headers, and that setOtlpEndpoint takes precedence over setUseV05.
574428b to
b1edad5
Compare
This comment has been minimized.
This comment has been minimized.
The pipeline crate is wasm-only, but it was missing from the build-test-wasm
matrix, so its wasm module was never built in CI (nor shipped in releases),
and the native action-prebuildify test matrix crashed on
`require('..').maybeLoad('pipeline')` returning undefined.
- Add `pipeline` to the build-test-wasm matrix in build.yml and release.yml,
and make the composite action's test step crate-aware: pipeline runs its
top-level node:test suites with --test-force-exit, other crates keep using
test-wasm.js. This builds + tests the pipeline wasm in CI and includes it in
release prebuilds.
- test/pipeline.js now skips its suite when the wasm binding is unavailable
(the native test matrix) instead of throwing on the destructure.
- eslint: add test-file overrides (the node:test runner, the snake_case
http_transport.js name that must match the Rust wasm_bindgen module path,
null inputs in the wasm test harness, and describe-scoped test helpers) and
mechanical cleanups so `eslint .` passes across the new test files + shim.
- Cargo.lock: canonicalize the dual-source libdd-trace-protobuf entry left
inconsistent by the earlier rebase merge (no dependency changes).
…an exit The wasm exporter races a sleep(timeout) against each HTTP request as a timeout guard (and reuses it for retry backoff). On a fast success the timer is abandoned but, being reffed, kept the Node process alive for up to the request timeout (5 minutes) after the last flush. unref the timer: an in-flight request refs the event loop on its own, so an abandoned/standalone timeout timer must not block process exit. This also fixes the Node 18 CI test jobs: scripts/test.sh now probes for --test-force-exit (Node >= 20.14/22) and runs without it on Node 18, which rejects the flag as unknown. With the unref above, node:test exits cleanly on Node 18 regardless.
Node 18 lacks --test-force-exit (the wasm exporter keeps the event loop alive after a flush) and the wasm HTTP client leaves a mock-agent socket open that node:test cannot drain there, so test/pipeline.js cannot exit cleanly on Node 18. Skip it on a Node without --test-force-exit. The pipeline wasm is fully covered by the build-test-wasm job and the Node 20/22/24/26 runs here, so this loses no real coverage (it is platform/Node-independent wasm).
bb6694d to
bf9873f
Compare
Overall package sizeSelf size: 29.93 MB Dependency sizes| name | version | self size | total size | |------|---------|-----------|------------|🤖 This report was automatically generated by heaviest-objects-in-the-universe |
Address review feedback on the wasm binding: - get_num now bounds-checks internally and returns Option, so a truncated or malformed buffer yields a clean error at the call site instead of a panic (the check no longer relies on every call site). All five callers propagate it. - Latch a failed lazy build_async: building is one-shot and a build failure is fatal (bad config), so once it fails every send returns a distinguishable NativeExporterBuildError (a tagged JS Error) instead of a misleading 'exporter builder already consumed' string, letting the host stop retrying. - Avoid panicking unwrap()s: the response builder's headers_mut() (which could panic on an out-of-range agent status) is handled gracefully; the provably-safe unwraps (Vec::as_mut_ptr, Reflect::set on a fresh object) become documented expects.
…main v37.0.0 includes everything the wasm pipeline needs \u2014 the libdd-capabilities crate (with LogWriterCapability), the change-buffer feature, the capability-bundle TraceExporter, and the OTLP builder methods \u2014 so pin the released tag instead of tracking a moving main branch. This makes the build reproducible and is a prerequisite for publishing the npm package. SpanConcentrator::new gained a 5th argument (override_max_entries_per_bucket) on a later main commit that is not in v37.0.0; drop it and use the 4-arg form (we were passing its default, so behavior is unchanged).
szegedi
left a comment
There was a problem hiding this comment.
Whew. This is a biggie. Generally, solid work as far as I can tell. I was a bit in two minds about getting a more substantial amount of code (compared to very thin wrappers) appearing in libdatadog-nodejs, but ultimately convinced myself that the intersection of everything libdatadog and Node.js indeed should live here.
Of the things I flagged, I definitely need the "strip = none, debug = true" to be addressed in release mode, as well as the @napi-rs/cli dependency, the rest are nits.
| [dependencies] | ||
| anyhow = "1" | ||
| datadog-library-config = { git = "https://github.com/DataDog/libdatadog.git", tag = "v18.1.0" } | ||
| libdd-library-config = { git = "https://github.com/DataDog/libdatadog.git", rev = "353134770b312b7ccd2df6afabc253090b948e5f" } |
There was a problem hiding this comment.
Yes. Libdatadog did this everywhere.
| strip = true | ||
|
|
||
| # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | ||
| # strip = "none" | ||
| debug = true |
There was a problem hiding this comment.
You're removing stripping and enabling debug in a release profile. I presume this is a testing leftover?
There was a problem hiding this comment.
Leftover, yep. Restored strip = true / dropped debug in fb34e63d.
| "access": "public" | ||
| }, | ||
| "dependencies": { | ||
| "@napi-rs/cli": "^3.6.0" |
There was a problem hiding this comment.
This is a built-time CLI tool, isn't it? Shouldn't this go into "devDependencies" instead? (Is it needed at all? What uses it?)
There was a problem hiding this comment.
Removed in fb34e63d \u2014 unused (build is cargo + wasm-pack), and main has no runtime deps.
| } | ||
| } | ||
|
|
||
| /// Parse response headers from a JS object `{ "header-name": "value", ... }`. |
There was a problem hiding this comment.
Given how the function takes an array of strings as a parameter, I presume this comment is wrong.
There was a problem hiding this comment.
Right, reworded in fb34e63d: it's Node's flat [name, value, ...] array, not an object.
| let mut span_ids = Vec::with_capacity(count as usize); | ||
| while count > 0 { | ||
| let span_id: u64 = get_num(chunk, &mut index) | ||
| .ok_or_else(|| JsValue::from_str("sendPreparedChunk: span id index out of bounds"))?; |
There was a problem hiding this comment.
nit: function is named prepare_chunk, so maybe update the error message.
There was a problem hiding this comment.
Fixed in fb34e63d \u2192 prepareChunk.
| /// `prepare_request` + `send_request`; callers that flush concurrently with | ||
| /// trace export should use those two directly so the collector isn't held | ||
| /// across the await (see `flushStats`). | ||
| pub async fn flush(&mut self, force: bool) -> Result<bool, String> { |
There was a problem hiding this comment.
looks unused to me – it's fine to keep it if you need it later.
There was a problem hiding this comment.
Removed in fb34e63d (dead since the prepare_request/send_request split).
| } | ||
|
|
||
| /// Update the agent URL (e.g. after reconfiguration). | ||
| pub fn set_agent_url(&mut self, url: String) { |
There was a problem hiding this comment.
This one looks unused too, again, no big deal if you intend to use it.
| use libdd_capabilities::http::{HttpClientCapability, HttpError}; | ||
| use libdd_capabilities::maybe_send::MaybeSend; | ||
|
|
||
| static WASM_MEMORY: LazyLock<JsValue> = LazyLock::new(wasm_bindgen::memory); |
There was a problem hiding this comment.
It's fun that this compiles (and since CI is green it must.) static in Rust implies Sync, and LazyLock: Sync would require JSValue to also be Sync which it… isn't? Anyhow, no action necessary as it compiles, but if I had more time to spend on this I'd love to understand why it works :-)
There was a problem hiding this comment.
wasm-bindgen impls Send/Sync for JsValue on wasm32 (single-threaded), and this crate is wasm32-only \u2014 so it's sound. Added a comment noting that in fb34e63d.
…ethods, docs Address szegedi's review on the wasm binding: - Restore the release profile to strip=true and drop debug=true (a debugging leftover; matters for the published package size, and matches main). With stripping restored, pass --all-features to wasm-opt for the pipeline crate ([package.metadata.wasm-pack]): its post-MVP features (bulk-memory for the change-buffer copies, sign-extension, etc.) otherwise trip wasm-pack's bundled wasm-opt validation (the smaller wasm crates don't hit this). - Remove the unused @napi-rs/cli runtime dependency \u2014 nothing invokes it and main builds the native prebuilds without it. - Remove the unused StatsCollector::flush and set_agent_url methods (dead since flush was split into prepare_request/send_request); clears the dead_code warning. - Fix the prepareChunk error message (was mislabeled sendPreparedChunk) and the parse_response_headers doc (it takes a flat [name, value, ...] array, not an object). - Document why the WASM_MEMORY LazyLock<JsValue> is Sync (wasm32-only).
fb34e63 to
3406c6e
Compare
Summary
Adds wasm-only native-spans support to
libdatadog-nodejs: the Rust/wasmbindings that let dd-trace-js run its span pipeline and trace export through
libdatadog instead of JS. Pairs with the companion dd-trace-js PR (DataDog/dd-trace-js#9139),
(
bengl/native-spans-attempt-3), which consumes these bindings.What's here (10 commits)
scripts/build-wasm.js).unix-socket / Windows named-pipe), sleep, etc., routing I/O back into JS.
WasmSpanState): change-buffer span pipeline (span_idprotocol), string table, stats collector, and the span-data bindings:
meta_struct(setMetaStruct/getMetaStruct)span_events(addSpanEvent, typed attributes)setUseV05) — single exporter; in v0.5 modemeta_struct/span_eventsare dropped by the protocol, by design.trace_exportercrate (JsTraceExporter, theearlier pre-encoded-bytes path): the
pipelinecrate'sWasmSpanStatebuilds its own internal
TraceExporterand supersedes it; nothing consumesthe standalone binding.
Testing
node --testsuites:pipeline(incl. meta_struct/span_events round-trips,v0.5 vs v0.4 endpoint routing against a mock agent, bounds/overflow
rejection),
http_transport(incl. a real AF_UNIX socket),trace_exporter,process_discovery,crashtracker.cargo check/clippyclean on the wasm target.landing.
Notes / caveats
target
wasm32-unknown-unknown.are gitignored / not published here).
Draft for review; not yet rebased for release.