Update libmoq to 0.3.6; rework MoQ source teardown to fix use-after-free#44
Conversation
Bring the file into compliance with the repo's clang-format@19 config so the format check passes on subsequent changes. Pure formatting, no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
moq_source_destroy freed ctx after a fixed os_sleep_ms(100) that was meant to let in-flight async MoQ callbacks drain (they check shutting_down and exit early). That is a timing guess, not synchronization: the decode callback runs FFmpeg avcodec_send_packet/receive_frame, which can exceed 100ms on a large keyframe or slow/contended hardware, so a callback could touch ctx after it was freed. The code documented this limitation itself. Replace the sleep with real quiescence: - Add an in-flight callback counter (active_callbacks) and a condition variable (callbacks_done) to moq_source. - A callback_guard RAII type increments the counter on callback entry (unless shutdown has begun) and broadcasts when it returns to zero. Each top-level callback (on_session_status, on_catalog, on_video_frame) holds one for its whole stack frame, so ctx cannot be freed while a callback that may touch it is still running. - destroy sets shutting_down, disconnects, then waits on callbacks_done until the count is zero - all under the mutex so the handoff is atomic. Also drop the os_sleep_ms(50) in the reconnect path. libmoq origins and sessions are fully independent (each origin is a distinct random instance, each session its own task): moq_origin_close removes the origin synchronously and moq_session_close only signals the old session's task to wind down on the libmoq runtime thread. The new origin/session share no client-side state with the closed one, so the delay guarded nothing - it was pure latency. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Caution Review failedPull request was closed or merged during review Walkthrough
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/moq-source.cpp (1)
418-427:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winReject catalog results if shutdown/disconnect happened during setup.
The Line 368 shutdown check is stale by the time the callback reaches Line 418. If destroy or disconnect happened while video config/decoder/track setup ran outside the mutex, Line 419 can still publish
trackandcatalogbased only on generation. Also require!shutting_downand a still-valid consume handle before storing these handles.Proposed final-state check
pthread_mutex_lock(&ctx->mutex); - if (ctx->generation == current_gen) { + if (!ctx->shutting_down.load() && ctx->consume >= 0 && ctx->generation == current_gen) { ctx->video_track = track; ctx->catalog_handle = catalog; } else { // Generation changed while we were setting up, clean up the track pthread_mutex_unlock(&ctx->mutex);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/moq-source.cpp` around lines 418 - 427, The generation check on line 419 (ctx->generation == current_gen) is insufficient to prevent storing invalid handles if shutdown or disconnect occurred during setup. Add additional checks to the condition that stores ctx->video_track and ctx->catalog_handle to verify that shutting_down is false and the consume handle is still valid, not just relying on generation equality. This ensures stale results from a shutdown that occurred after the initial check at line 368 will be rejected before the assignments.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/moq-source.cpp`:
- Around line 112-118: The callback_guard constructor has a race condition where
a callback can start and block on pthread_mutex_lock before incrementing
active_callbacks, becoming invisible to destroy() which can then free ctx while
the callback is still waiting on the mutex. Restructure the callback_guard
constructor to register the callback in a separate pre-lock lifetime counter
before attempting to acquire the mutex, then acquire the mutex and check the
shutting_down flag to determine if the callback should proceed or be rejected.
This ensures that destroy() at Line 222 will see all active callbacks even if
they are blocked on the mutex, preventing use-after-free scenarios.
- Around line 213-224: After the while loop that waits for active_callbacks to
reach zero completes, new resources (decoders/tracks) may have been created by
in-flight callbacks that checked the initial shutdown state but continued
executing. To prevent use-after-free, call moq_source_disconnect_locked(ctx)
again after the pthread_cond_wait loop completes but before unlocking the mutex
with pthread_mutex_unlock. This ensures any newly installed resources during the
callback drain period are properly cleaned up before ctx is freed.
---
Outside diff comments:
In `@src/moq-source.cpp`:
- Around line 418-427: The generation check on line 419 (ctx->generation ==
current_gen) is insufficient to prevent storing invalid handles if shutdown or
disconnect occurred during setup. Add additional checks to the condition that
stores ctx->video_track and ctx->catalog_handle to verify that shutting_down is
false and the consume handle is still valid, not just relying on generation
equality. This ensures stale results from a shutdown that occurred after the
initial check at line 368 will be rejected before the assignments.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
Bumps MOQ_VERSION 0.2.14 -> 0.3.6 (latest released; 0.3.7 is tagged but has no published release artifacts yet). The fetched API is signature-compatible for everything the plugin uses, and the moq_video_config / moq_frame structs are unchanged - the only breaking change is semantic: the session on_status codes were redefined in 0.3.0. Old (0.2.x): 0 = connected, non-zero = failure. New (0.3.x): > 0 = (re)connected carrying the connection epoch (1 = first connect, 2 = first reconnect, ...); 0 = closed cleanly (terminal); < 0 = fatal / reconnect permanently gave up (terminal). Update both session status callbacks accordingly: - moq-source.cpp on_session_status: treat > 0 as connected (was treating the new epoch-1 connect as a failure, which would have broken connecting entirely against 0.3.x). Start consuming only on the first connect; on later epochs libmoq re-subscribes our existing consumer automatically since the origin outlives the connection, so recreating it would leak handles. Handle 0 (clean close) and < 0 (error) as the terminal cases. - moq-output.cpp: same > 0 / <= 0 split for its informational connect log. A nice side effect: 0.3.x sessions auto-reconnect internally with backoff, so transient transport drops now recover without the plugin's manual reconnect (which remains for settings changes). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/moq-output.cpp (1)
85-89: 💤 Low valueConsider resetting
connect_starton reconnect for accurate timing.When libmoq auto-reconnects (epoch > 1), the elapsed time is computed from the original
connect_startset at line 76, not from when the reconnect attempt began. The log message "MoQ session connected (%d ms, epoch %d)" for a reconnect will therefore show total uptime rather than reconnect duration, which may be confusing.♻️ Optional refactor to reset timer on each connect attempt
if (code > 0) { auto elapsed = std::chrono::steady_clock::now() - self->connect_start; self->connect_time_ms = static_cast<int>( std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()); LOG_INFO("MoQ session connected (%d ms, epoch %d): %s", self->connect_time_ms, code, self->server_url.c_str()); } else { + self->connect_start = std::chrono::steady_clock::now(); self->connect_time_ms = 0; LOG_INFO("MoQ session closed (%d): %s", code, self->server_url.c_str()); }This way each reconnect logs the time it took to reconnect, not the total uptime.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/moq-output.cpp` around lines 85 - 89, The connect_start timer is only set once at the initial connection but is reused for all reconnect attempts, causing the elapsed time calculation to measure total uptime rather than the duration of each individual reconnect. To fix this, reset the connect_start variable whenever a reconnect attempt begins (when epoch > 1), so that the elapsed time calculated in the log message reflects only the duration of the current connection attempt rather than the total time since the original connection was initiated.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/moq-output.cpp`:
- Around line 84-92: The connect_time_ms field is not being reset when the
session closes (code <= 0), which causes the UI to incorrectly display the
connection as active based on checking if connect_time_ms is greater than zero.
In the else block that handles the session closed condition (where code <= 0),
add a statement to reset self->connect_time_ms to zero before or after the
LOG_INFO call. This ensures the UI will correctly reflect that the connection is
no longer active.
- Around line 81-92: Add thread-safety synchronization to the
session_connect_callback function to protect access to connect_time_ms. Use a
mutex similar to the pattern in moq-source.cpp (pthread_mutex_lock pattern).
Lock the mutex in session_connect_callback before writing to connect_time_ms and
connect_start, and unlock it after the write. Also add the same mutex lock
protection in the GetConnectTime() method when reading connect_time_ms to
prevent data races between the callback (which may run on a worker thread) and
external callers (which may run on the UI thread).
---
Nitpick comments:
In `@src/moq-output.cpp`:
- Around line 85-89: The connect_start timer is only set once at the initial
connection but is reused for all reconnect attempts, causing the elapsed time
calculation to measure total uptime rather than the duration of each individual
reconnect. To fix this, reset the connect_start variable whenever a reconnect
attempt begins (when epoch > 1), so that the elapsed time calculated in the log
message reflects only the duration of the current connection attempt rather than
the total time since the original connection was initiated.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 74562abf-1720-4ce4-a8ce-5d3a8b9eb265
📒 Files selected for processing (3)
CMakePresets.jsonsrc/moq-output.cppsrc/moq-source.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
- src/moq-source.cpp
| if (code > 0) { | ||
| auto elapsed = std::chrono::steady_clock::now() - self->connect_start; | ||
| self->connect_time_ms = static_cast<int>(std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()); | ||
| LOG_INFO("MoQ session established (%d ms): %s", self->connect_time_ms, | ||
| self->connect_time_ms = static_cast<int>( | ||
| std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()); | ||
| LOG_INFO("MoQ session connected (%d ms, epoch %d): %s", self->connect_time_ms, code, | ||
| self->server_url.c_str()); | ||
| } else { | ||
| LOG_INFO("MoQ session closed (%d): %s", error_code, self->server_url.c_str()); | ||
| LOG_INFO("MoQ session closed (%d): %s", code, self->server_url.c_str()); | ||
| } |
There was a problem hiding this comment.
Critical: Reset connect_time_ms to zero when the session closes.
When code <= 0 (session closed or failed), the callback does not reset connect_time_ms to zero. The UI determines connection state by checking obs_output_get_connect_time_ms(output) > 0 (see moq-dock.cpp:326), so leaving connect_time_ms at its previous positive value will cause the UI to incorrectly display "● Connected" even after the session has closed.
🐛 Proposed fix to reset connection time on close
if (code > 0) {
auto elapsed = std::chrono::steady_clock::now() - self->connect_start;
self->connect_time_ms = static_cast<int>(
std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count());
LOG_INFO("MoQ session connected (%d ms, epoch %d): %s", self->connect_time_ms, code,
self->server_url.c_str());
} else {
+ self->connect_time_ms = 0;
LOG_INFO("MoQ session closed (%d): %s", code, self->server_url.c_str());
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (code > 0) { | |
| auto elapsed = std::chrono::steady_clock::now() - self->connect_start; | |
| self->connect_time_ms = static_cast<int>(std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()); | |
| LOG_INFO("MoQ session established (%d ms): %s", self->connect_time_ms, | |
| self->connect_time_ms = static_cast<int>( | |
| std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()); | |
| LOG_INFO("MoQ session connected (%d ms, epoch %d): %s", self->connect_time_ms, code, | |
| self->server_url.c_str()); | |
| } else { | |
| LOG_INFO("MoQ session closed (%d): %s", error_code, self->server_url.c_str()); | |
| LOG_INFO("MoQ session closed (%d): %s", code, self->server_url.c_str()); | |
| } | |
| if (code > 0) { | |
| auto elapsed = std::chrono::steady_clock::now() - self->connect_start; | |
| self->connect_time_ms = static_cast<int>( | |
| std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()); | |
| LOG_INFO("MoQ session connected (%d ms, epoch %d): %s", self->connect_time_ms, code, | |
| self->server_url.c_str()); | |
| } else { | |
| self->connect_time_ms = 0; | |
| LOG_INFO("MoQ session closed (%d): %s", code, self->server_url.c_str()); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/moq-output.cpp` around lines 84 - 92, The connect_time_ms field is not
being reset when the session closes (code <= 0), which causes the UI to
incorrectly display the connection as active based on checking if
connect_time_ms is greater than zero. In the else block that handles the session
closed condition (where code <= 0), add a statement to reset
self->connect_time_ms to zero before or after the LOG_INFO call. This ensures
the UI will correctly reflect that the connection is no longer active.
The previous teardown counted in-flight callback *executions* and waited for that count to hit zero. That closes the window where a callback is mid-execution when destroy runs, but not the window where libmoq delivers a callback *after* the count reached zero: each subscription fires one terminal callback (status <= 0) asynchronously after it is closed, which could land after destroy had already freed ctx. libmoq >= 0.3.0 documents a terminal-callback lifetime contract: user_data must stay valid until each subscription's terminal callback. Rework teardown to honor it directly via subscription reference counting: - ctx holds one reference for the OBS-owned source plus one per outstanding subscription handed `ctx` as user_data (session, catalog, video track). The ref is pre-incremented before the libmoq registration call (undone if it fails) and released by that subscription's terminal callback via the subscription_ref RAII guard. Because libmoq runs all callbacks serially on one runtime thread, the ref is held continuously from registration through the terminal callback, so ctx is always valid inside any callback. - destroy sets shutting_down, closes every subscription (which makes each terminal fire promptly - libmoq's biased close path wins over pending updates), drops the owner ref, and waits for the count to reach zero. A generous bounded wait backstops a subscription that never terminates so a broken teardown degrades to a logged warning instead of hanging OBS. Along the way this fixes a latent handle-management bug: the catalog *snapshot* id delivered to on_catalog was being stored and closed with moq_consume_catalog_close (which expects a *subscription* id from a different slab - an id-collision hazard). Now the real subscription handle from moq_consume_catalog is stored and closed, snapshots are freed with moq_consume_catalog_free on every path, and a catalog update closes the previous video track instead of leaking it. Also: a fatal session error now tears down all subscriptions (not just session+origin) so their terminals fire promptly, and reconnect bails if shutting down. The refcount balance and deadlock-freedom were checked by two independent adversarial reviews. Still wants runtime validation: connect/disconnect, scene-switch teardown with a decode in flight, settings-change reconnect, and repeated catalog updates. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Homebrew now refuses to run executables from an untrusted third-party tap,
so the format check's `brew install obsproject/tools/clang-format@19` was
skipped ("Skipping obsproject/tools because it is not trusted") and the step
failed with a broken pipe before any file was checked. Explicitly tap and
`brew trust obsproject/tools` first.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Bring the header into compliance with the repo's clang-format@19 config so the format check passes once the file is touched. Pure formatting, no behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code review surfaced that the source-side teardown rework left the sibling MoQOutput with the same class of bug the 0.3.x bump makes live: - Use-after-free: ~MoQOutput -> Stop() calls moq_session_close, which only signals; libmoq delivers the session's terminal status callback (code <= 0) asynchronously on its runtime thread afterward and dereferences `self` (server_url, connect_time_ms, connect_start). OBS deletes the object as soon as ~MoQOutput returns, so the late callback touched freed memory. 0.3.x's auto-reconnect also fires the callback repeatedly mid-stream, widening the window. Fix by mirroring the source's pattern: count outstanding session subscriptions (pre-incremented before moq_session_connect, released by the terminal callback) and have the destructor wait for the count to reach zero, with the same bounded backstop so a missing terminal warns instead of hanging. - Data race: connect_time_ms is written by the callback (runtime thread) and read by GetConnectTime() (OBS thread); make it std::atomic<int>. Also (source): restore the blank-video-on-catalog-error behavior that the terminal-callback refactor dropped, so an invalid/failed broadcast still clears the preview (except during our own teardown). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Updates the source plugin to libmoq 0.3.6 and reworks its teardown to eliminate a use-after-free, building on libmoq's terminal-callback lifetime contract.
1. libmoq
0.2.14→0.3.60.3.7 is tagged but unreleased (no artifacts), so 0.3.6 is the latest the plugin's
FetchContentcan pull. The API is signature-compatible for everything the plugin uses; the only breaking change is semantic — the sessionon_statuscodes were redefined in 0.3.0:> 00< 0Both session-status callbacks (
on_session_statusin the source, the connect log inMoQOutput) are updated. Without this the plugin would treat the new epoch-1 connect as a failure and never connect. Bonus: 0.3.x auto-reconnects internally, so transient transport drops now recover for free.2. Teardown use-after-free → terminal-callback refcount
The original
moq_source_destroyfreedctxafter a fixedos_sleep_ms(100)hoping async callbacks had drained (see moq-dev/moq#1791). The decode callback runs FFmpeg and can exceed 100ms, soctxcould be touched after free.The fix honors libmoq's documented contract directly: each subscription handed
ctxasuser_data(session, catalog, video track) delivers exactly one terminal callback (status<= 0) after whichuser_datais never touched again.ctxholds a reference count: one for the OBS-owned source, plus one per outstanding subscription. Each subscription ref is pre-incremented before its libmoq registration call (undone if registration fails) and released by its terminal callback via asubscription_refRAII guard. Because libmoq runs all callbacks serially on one runtime thread, the ref is held continuously from registration through the terminal callback — soctxis valid inside every callback, with no entry-time guards needed.destroysetsshutting_down, closes every subscription (libmoq's biased close path makes each terminal fire promptly), drops the owner ref, and waits for the count to reach zero. A generous bounded wait backstops a never-terminating subscription so a broken teardown degrades to a logged warning instead of hanging OBS — in normal operation terminals arrive within milliseconds.This is strictly more correct than the intermediate in-flight-execution counter (an earlier commit in this PR), which couldn't close the window where libmoq delivers a terminal callback after the in-flight count reached zero.
Latent bugs fixed along the way
on_catalogwas being stored and closed withmoq_consume_catalog_close, which expects a subscription id from a different slab — an id-collision hazard. Now the real subscription handle (frommoq_consume_catalog) is stored/closed, and snapshots are freed withmoq_consume_catalog_freeon every path (also fixing a snapshot leak).3. Reconnect sleep
Removed the
os_sleep_ms(50)before reconnecting — libmoq origins/sessions are fully independent, so it guarded nothing (verified against libmoq internals).Commits
moq-source.cpp— it was committed non-compliant with the repo'sclang-format@19, which gates changed files; split out so the rest reviews cleanly.Testing
RelWithDebInfo, code-signed).clang-format@19(obsproject/toolsbuild, matching CI).🤖 Generated with Claude Code