Blocking DMV collector: layered wait thresholds + RESOURCE_SEMAPHORE; guard slicer/drill-down vs missing table #1235

Merged: erikdarlingdata merged 1 commit into dev from feature/blocking-dmv-thresholds on Jun 25, 2026

@erikdarlingdata

What

Follow-up to the always-on DMV blocking fallback (#1227/#1230/#1233/#1234). Two things:

1. Layered wait-type thresholds + RESOURCE_SEMAPHORE (collector, both apps)

  • Added RESOURCE_SEMAPHORE% to the edge scope (covers both RESOURCE_SEMAPHORE and RESOURCE_SEMAPHORE_QUERY_COMPILE). Confirmed live that these waits do carry blocking_session_id (the memory-grant holders) -- they form real blocker->blocked edges, including the many-to-many case where one memory waiter is reported blocked by every current grant holder.

  • Layered minimum-wait floor to cut grid/chain noise by contention class:

    | class | floor |
    |---|---|
    | LCK_* | 2s |
    | PAGELATCH_* | 0.5s |
    | PAGEIOLATCH_* | 1s |
    | RESOURCE_SEMAPHORE(_QUERY_COMPILE) | 5s |

    THREADPOOL deliberately excluded -- no blocking_session_id, and unobservable during real worker starvation.
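
The layered floor above can be sketched as a prefix lookup followed by a per-class comparison. This is a minimal Python model of the rule, not the collector's actual T-SQL or C#: the names `WAIT_FLOORS_MS`, `min_wait_floor_ms`, and `keep_edge` are hypothetical, and treating the floor as inclusive (`>=`) is an assumption the PR text does not settle.

```python
from typing import Optional

# Floors from the table above, in milliseconds, keyed by wait-type prefix.
# "RESOURCE_SEMAPHORE" mirrors the RESOURCE_SEMAPHORE% pattern, so it also
# matches RESOURCE_SEMAPHORE_QUERY_COMPILE.
WAIT_FLOORS_MS = (
    ("LCK_", 2000),
    ("PAGELATCH_", 500),
    ("PAGEIOLATCH_", 1000),
    ("RESOURCE_SEMAPHORE", 5000),
)


def min_wait_floor_ms(wait_type: str) -> Optional[int]:
    """Return the floor for a wait type, or None if it is out of scope."""
    for prefix, floor in WAIT_FLOORS_MS:
        if wait_type.startswith(prefix):
            return floor
    return None  # e.g. THREADPOOL: deliberately not in the edge scope


def keep_edge(wait_type: str, wait_ms: int,
              blocking_session_id: Optional[int]) -> bool:
    """Keep a blocker->blocked edge only if it has a blocker and has waited
    at least as long as its class floor (inclusive comparison is assumed)."""
    if not blocking_session_id:  # None or 0 means no blocker was reported
        return False
    floor = min_wait_floor_ms(wait_type)
    return floor is not None and wait_ms >= floor
```

Under these assumptions, a `PAGELATCH_EX` wait of 499 ms is dropped while one of 500 ms is kept, and a `THREADPOOL` wait is dropped regardless of duration because it matches no prefix and reports no blocker.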

2. Guard the slicer + flat drill-down against a missing DMV table (regression fix)

The slicer (DatabaseService.QueryPerformance.cs) and flat drill-down (SqlServerDrillDownCollector.Blocking.cs) inline collect.dmv_blocking_snapshots in a single combined CTE/UNION, which fails to compile (Msg 208) on a not-yet-upgraded server -- so the slicer blanked and the drill-down errored. A runtime try/catch can't rescue a single combined batch (unlike the already-guarded pair-row + grid paths, which fetch DMV separately). New BlockingPairRowQuery.DmvSnapshotsTableExistsAsync probe lets both surfaces drop the DMV branch entirely when the table is absent, degrading to BPR-only.
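
The probe-then-compose pattern described above can be illustrated with a small runnable sketch. It uses Python's `sqlite3` purely as a stand-in for SQL Server: SQLite also rejects the whole statement when any referenced table is missing, which is loosely analogous to the Msg 208 compile failure. The table name `bpr_rows`, the column `blocked_spid`, and every function name here are hypothetical; the real probe is `BlockingPairRowQuery.DmvSnapshotsTableExistsAsync`, whose implementation is not shown in this PR text.

```python
import sqlite3

# Hypothetical branch queries; the real slicer SQL is not shown in the PR.
BPR_BRANCH = "SELECT blocked_spid, 'bpr' AS source FROM bpr_rows"
DMV_BRANCH = "SELECT blocked_spid, 'dmv' AS source FROM dmv_blocking_snapshots"


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Probe the catalog before composing the query (stand-in for the probe)."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def build_slicer_sql(include_dmv: bool) -> str:
    """Compose the combined query, dropping the DMV branch when asked."""
    branches = [BPR_BRANCH] + ([DMV_BRANCH] if include_dmv else [])
    return " UNION ALL ".join(branches)


def load_slicer_rows(conn: sqlite3.Connection) -> list:
    """Guarded path: degrade to BPR-only when the DMV table is absent."""
    include_dmv = table_exists(conn, "dmv_blocking_snapshots")
    return conn.execute(build_slicer_sql(include_dmv)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE bpr_rows (blocked_spid INTEGER)")
    conn.execute("INSERT INTO bpr_rows VALUES (61)")

    # Unguarded: the combined statement fails as a whole, so even the
    # BPR rows are lost. A try/except here cannot recover them.
    try:
        conn.execute(build_slicer_sql(include_dmv=True)).fetchall()
        unguarded_failed = False
    except sqlite3.OperationalError:
        unguarded_failed = True

    before_upgrade = load_slicer_rows(conn)  # BPR-only

    conn.execute("CREATE TABLE dmv_blocking_snapshots (blocked_spid INTEGER)")
    conn.execute("INSERT INTO dmv_blocking_snapshots VALUES (72)")
    after_upgrade = load_slicer_rows(conn)   # both branches

    return unguarded_failed, before_upgrade, after_upgrade
```

The design point is that the existence check happens before the SQL text is built, so the server never sees a reference to the missing table. Catching the error afterwards would still leave the surface with no rows at all.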

Schema

collect.dmv_blocking_snapshots.lock_mode widened nvarchar(20) -> nvarchar(64) (install/02 + install/06) so the RESOURCE_SEMAPHORE_QUERY_COMPILE tag (32 chars) fits whole. Lite's DuckDB VARCHAR is unbounded (no change).

Testing

  • Both apps build clean (0 errors).
  • Dashboard 652 / Lite 578 / Installer 80 -- all pass.
  • install/56 PARSEONLY-clean.

Not in this PR (held)

  • Live apply to SQL2025 (ALTER column + redeploy proc) + Lite rebuild/relaunch -- deferred so as not to perturb the HammerDB validation / evict the running Lite.
  • Proper diagnosis of the pre-existing CollectBlockingChainFactsAsync load-error (capture the real exception, not a blind timeout bump).

🤖 Generated with Claude Code

… guard slicer/drill-down vs missing table

Collector (install/56 + Lite RemoteCollectorService.DmvBlockingSnapshot.cs):
- Add RESOURCE_SEMAPHORE% (covers RESOURCE_SEMAPHORE and RESOURCE_SEMAPHORE_QUERY_COMPILE)
  to the edge scope. Verified live that these waits DO carry blocking_session_id (the grant
  holders), so they form real blocker->blocked edges and pass the existing filter.
- Layered minimum-wait floor to cut grid/chain noise by contention class:
  LCK 2s, PAGELATCH 0.5s, PAGEIOLATCH 1s, RESOURCE_SEMAPHORE(+_QUERY_COMPILE) 5s.

Schema (install/02 + install/06):
- Widen collect.dmv_blocking_snapshots.lock_mode nvarchar(20) -> nvarchar(64) so the
  RESOURCE_SEMAPHORE_QUERY_COMPILE tag (32 chars) fits whole. Lite DuckDB VARCHAR unbounded.

Guard (regression fix):
- New BlockingPairRowQuery.DmvSnapshotsTableExistsAsync probe. The slicer
  (DatabaseService.QueryPerformance.cs) and flat drill-down (SqlServerDrillDownCollector.Blocking.cs)
  inline the DMV table in a single combined CTE/UNION, which fails to COMPILE (Msg 208) on a
  not-yet-upgraded server -- a runtime catch can't rescue one batch. They now drop the DMV branch
  when the table is absent, degrading to BPR-only instead of blanking the slicer / erroring the drill-down.

Both apps build clean; Dashboard 652 / Lite 578 / Installer 80 tests pass; install/56 PARSEONLY-clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
erikdarlingdata merged commit b13b660 into dev on Jun 25, 2026