Release v3.1.0 by erikdarlingdata · Pull Request #1250 · erikdarlingdata/PerformanceMonitor

erikdarlingdata · 2026-06-28T19:53:25Z

Release v3.1.0. Full notes in CHANGELOG.md.

Highlights

- Shared block-chain + deadlock-graph viewers across both apps (Block-chain viewer (PR #1): shared apex->victim chain visualizer in both apps #1207, Deadlock graph viewer: cycle view of the waits-for graph (both apps) #1216)
- Always-on DMV blocking-snapshot fallback so blocking stays visible without the blocked-process report (Add always-on DMV blocking-snapshot fallback (both apps) #1227)
- FinOps storage -> object -> index drill with in-grid heatmaps (Redesign index/object-stats FinOps feature: drill-downs from Database Sizes + locks, plus a contention heatmap #1138)
- Recommendation/advice engine rebuilt to compose from your server's own collected facts (Advice engine: compose all 56 blocks from facts; strip tool-names/hedging from human prose #1244)
- "Collection Stopped" alert when collectors are disabled (Add "Collection Stopped" alert: warn when collector Agent jobs are disabled #1246); Collection Health now counts SKIPPED as healthy (Collection Health: count SKIPPED as a healthy run (no false STALE for dedup collectors) #1248)
- Incident grouping + dedup fingerprints; per-event and per-server alert delivery ([FEATURE] Add a stable, machine-readable dedup fingerprint (and involved-object list) to alert payloads - esp. deadlocks & blocking #1140, [FEATURE] Add a "per-event" notification option (vs. the batched per-cycle summary) for deadlocks & blocking #1141, [FEATURE] Per-server override for alert notification mode (per-event vs summary) #1236)
- Empty Overview Blocking/Deadlocking lane renders as a live grid (Overview lanes: render empty Blocking/Deadlocking lane as a live 0-1 grid #1245)

Testing: installer fresh/upgrade/multi-hop/uninstall (embedded + CLI), data-survival, Azure SQL DB + AWS RDS cloud, embedded-resource upgrade discovery (#772 guard), nightly -- all green.

WARNING: Release cut -- do not merge until approved. Head is dev (required by check-pr-branch.yml). On merge: tag v3.1.0 + publish GitHub Release (triggers SignPath signing).

🤖 Generated with Claude Code

…ing sweep

The v3.0.0 collector ran its entire multi-database sweep as ONE SqlCommand under the global 30s CommandTimeoutSeconds, cursoring every online database into a #temp and returning a single final SELECT. Because nothing streamed back until the end, the 30s was a cumulative, all-or-nothing budget across every database: on larger estates the sweep exceeded 30s, failed with SQL #-2 (Execution Timeout Expired), and discarded results from EVERY database, not just the slow one. Enabled by default and "never-run = due immediately," it failed on first connect after upgrade and kept retrying the timeout.

Lite now collects one command per database, mirroring CollectQueryStoreAsync:

- On-prem enumerates online/accessible databases into a list on one connection, then runs each via [db].sys.sp_executesql with its own command, a dedicated 300s timeout, and per-database try/catch. Azure SQL DB connects to each database individually. A slow or inaccessible database now fails only itself; the rest still persist.
- Within each database the three DMVs (dm_db_partition_stats, dm_db_index_usage_stats, dm_db_index_operational_stats) are staged into #temp tables with single scans and then joined, giving the optimizer real cardinality instead of the bad plans the old monolithic multi-DMV join produced on large databases (the sp_IndexCleanup technique).
- Dedicated 300s timeout (matching the FinOps sp_IndexCleanup path) replaces the 30s meant for lightweight DMV reads.

The Dashboard's equivalent SQL collector (install/55) was not subject to the bug (it runs under SQL Agent and persists per database), but is brought to parity with the same DMV-staging technique for plan quality on large databases.

Validated against SQL Server 2022: install proc collected 585 rows across 12 databases; the Lite [db].sys.sp_executesql + temp-staging wrapper returns rows in the correct database context. Lite build + 447 tests pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
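
The per-database isolation described above can be sketched as follows. This is a minimal Python illustration, not the project's C# collector: `collect_per_database`, `collect_one`, and `fake_collect` are hypothetical names, and only the 300s figure comes from the commit message.

```python
from typing import Callable, Dict, List, Tuple


def collect_per_database(
    databases: List[str],
    collect_one: Callable[[str, int], List[dict]],
    timeout_seconds: int = 300,  # dedicated timeout from the commit message
) -> Tuple[Dict[str, List[dict]], Dict[str, str]]:
    """Run one collection command per database; a failure stays local."""
    results: Dict[str, List[dict]] = {}
    failures: Dict[str, str] = {}
    for db in databases:
        try:
            # Each database gets its own command and its own timeout budget.
            results[db] = collect_one(db, timeout_seconds)
        except Exception as exc:
            # A slow or inaccessible database fails only itself.
            failures[db] = str(exc)
    return results, failures


def fake_collect(db: str, timeout_seconds: int) -> List[dict]:
    """Hypothetical stand-in for the real per-database query."""
    if db == "slow_db":
        raise TimeoutError("Execution Timeout Expired")
    return [{"database": db, "index_rows": 1}]


results, failures = collect_per_database(["sales", "slow_db", "hr"], fake_collect)
```

With the old all-or-nothing sweep, the timeout on `slow_db` would have discarded the rows for `sales` and `hr` as well; here they are kept and only `slow_db` is recorded as failed.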

…ct-stats Fix #1135: Lite index_object_stats collector times out as all-or-nothing sweep

The low-disk "Volume Free Space" alert was absent from the severity map and fell through to INFO for every breach, under-prioritizing a condition that can take a database into recovery/suspect and mis-routing severity-based webhooks. It now renders WARNING for a normal breach and CRITICAL when the worst breached volume is critically low (<=3% free or <=2GB free), via a shared LowDiskAlertGate.IsCriticallyLow rule and an AlertContext.SeverityOverride that rides through the email badge, Teams card, and Slack sidebar. The metric name is unchanged, so mute rules, cooldowns, and Alert-History matching are untouched. Fixed identically in Lite and Dashboard. Covered by AlertSeverityTests and LowDiskAlertGateTests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
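
A minimal sketch of the severity grading, in Python for illustration. The thresholds (<=3% free or <=2GB free) are the ones stated above; the function names and the tuple shape are assumptions, not the signatures of LowDiskAlertGate or AlertContext.

```python
from typing import Iterable, Tuple


def is_critically_low(free_percent: float, free_gb: float) -> bool:
    """Thresholds quoted in the commit message: <=3% free or <=2GB free."""
    return free_percent <= 3.0 or free_gb <= 2.0


def volume_free_space_severity(breached: Iterable[Tuple[float, float]]) -> str:
    """Grade an alert from its breached volumes as (free_percent, free_gb).

    If any breached volume is critically low, then the worst one is too,
    so the whole alert is CRITICAL; otherwise it is a normal WARNING.
    """
    if any(is_critically_low(pct, gb) for pct, gb in breached):
        return "CRITICAL"
    return "WARNING"


normal = volume_free_space_severity([(8.0, 50.0)])
by_percent = volume_free_space_severity([(8.0, 50.0), (2.5, 100.0)])
by_size = volume_free_space_severity([(10.0, 1.5)])
```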

…e-space-severity Grade Volume Free Space alert severity (WARNING/CRITICAL) (#1136)

…, persistence)

Adds the app-agnostic core for the alert dedup fingerprint + involved-objects feature, with no app wiring yet:

- AlertIncident record + AlertContext.Incidents (the per-incident unit).
- AlertFingerprint: ForObjects/ForKey/Hash. SHA-256 idiom reused from InferenceEngine; server+incident-type scoped; case/order/whitespace-insensitive; volatile per-sample fields excluded; original casing preserved for display.
- AlertIncidentRenderer: projects Incidents into AlertContext.Details so the fingerprint renders on Teams/Slack/email(x2)/dialog with no renderer changes.
- AlertContextSerializer: persists Incidents (trailing-optional DTO, backward-compatible round-trip).

19 unit tests (Lite.Tests): fingerprint determinism/order/case/scoping/volatile-exclusion, renderer projection across surfaces, serializer round-trip + legacy null.

Refs #1140

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
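
The fingerprint properties listed above (server + incident-type scoping, case/order/whitespace insensitivity, SHA-256) can be illustrated with a short Python sketch. The normalization rules and the `fingerprint_for_objects` name are assumptions made for this example; the actual AlertFingerprint.ForObjects may normalize differently and will produce different hash values.

```python
import hashlib
from typing import Iterable


def _normalize(text: str) -> str:
    # Assumed normalization: drop all whitespace and fold case.
    return "".join(text.split()).lower()


def fingerprint_for_objects(server: str, incident_type: str, objects: Iterable[str]) -> str:
    """Stable key for 'the same incident', scoped by server and incident type."""
    # Deduplicate and sort so the order of involved objects does not matter.
    names = sorted({_normalize(o) for o in objects})
    payload = "\n".join([_normalize(server), _normalize(incident_type), *names])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


a = fingerprint_for_objects("PROD01", "deadlock", ["Sales.dbo.Orders", "Sales.dbo.Customers"])
b = fingerprint_for_objects("prod01", "Deadlock", [" sales.dbo.customers ", "SALES.DBO.ORDERS"])
c = fingerprint_for_objects("PROD02", "deadlock", ["Sales.dbo.Orders", "Sales.dbo.Customers"])
```

Here `a` and `b` are equal because only case, order, and whitespace differ, while `c` differs because the server scope differs.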

Extracts the grouping + fingerprint logic out of the (untestable) WPF builders into shared, unit-tested helpers both apps' live builders will call, so grouping/fingerprint are identical across Lite and Dashboard:

- BlockingIncidentGrouper: collapses blocked-process samples that are one chain into a single incident with the true occurrence count + wait range (fixes gotqn's "same chain shown 3x, count says 8"). Identity = resolved contentious object, else database + literal-stripped blocked/blocking query pair.
- DeadlockIncidentGrouper: groups deadlocks by sorted involved-object set (multi-DB deadlock = one incident; recurrences collapse with a count).
- DeadlockObjectExtractor: pulls db.schema.object names from a deadlock graph resource-list (Lite's source; Dashboard already has DeadlockItem.ObjectNames).

10 unit tests covering chain collapse, literal-varying grouping, object-vs-query-pair identity, multi-DB deadlock, order-independence, and XML extraction.

Refs #1140 #1141

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
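
Grouping deadlocks by their sorted involved-object set might look roughly like this in Python. `group_deadlocks` is a hypothetical helper for illustration and the lower-casing step is an assumption; it is not the DeadlockIncidentGrouper code.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def group_deadlocks(deadlocks: Iterable[List[str]]) -> Dict[Tuple[str, ...], int]:
    """Collapse deadlocks that involve the same objects into one incident.

    Each deadlock is given as its list of involved db.schema.object names.
    The key is the sorted, de-duplicated object set, so object order is
    irrelevant and recurrences simply raise the count.
    """
    counts: Counter = Counter()
    for objects in deadlocks:
        key = tuple(sorted({name.lower() for name in objects}))
        counts[key] += 1
    return dict(counts)


incidents = group_deadlocks([
    ["Sales.dbo.Orders", "Billing.dbo.Invoices"],   # multi-DB deadlock
    ["Billing.dbo.Invoices", "Sales.dbo.Orders"],   # same objects, other order
    ["Sales.dbo.Customers"],
])
```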

Wires the shared groupers/fingerprint into the live "Detected"/threshold builders in Lite (the path consumers actually receive):

- BuildBlockingContextAsync: groups blocked-process samples via BlockingIncidentGrouper so one chain shows once with its true occurrence count + wait range (was listed once per sample, capped at 3 while the count said more), and surfaces "+N more" instead of silently dropping. Attaches the dedup fingerprint. (Object identity arrives once the Lite collector resolves contentious_object, plan §5.3; falls back to db+query-pair now.)
- BuildDeadlockContextAsync: fingerprints by involved-object set parsed from the deadlock graph (DeadlockObjectExtractor) across ALL deadlocks in the window, grouped + counted.
- BuildVolumeFreeSpaceContext / BuildAnomalousJobContext: per-volume / per-job dedup key.

serverName threaded into all four builders (the fingerprint scopes on it). Lite builds clean, 0 warnings. LRQ builder still pending its query_hash collection change (§5.2).

Refs #1140 #1141

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ity with Lite)

Mirrors the Lite wiring in the Dashboard's live builders so both apps emit identical fingerprints:

- BuildBlockingContextAsync: dedup by the resolved contentious_object (already produced by sp_HumanEventsBlockViewer and surfaced on BlockingEventItem) via the shared grouper.
- BuildDeadlockContextAsync: dedup by involved-object set parsed from the deadlock graph with the same shared DeadlockObjectExtractor Lite uses.
- BuildLongRunningQueryContext: dedup key = query_hash, newly captured from sys.dm_exec_requests (CONVERT(varchar(18), r.query_hash, 1)) into LongRunningQueryInfo.
- BuildVolumeFreeSpaceContext / BuildAnomalousJobContext: per-drive / per-job key.

serverName threaded into all five builders + their five call sites. Dashboard needed no schema change (object_names + contentious_object already collected). Builds clean, 0 warnings.

Refs #1140

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ry_hash

Completes the Lite live-path parity by collecting the two identity fields Lite was missing (validated against SQL 2022):

- blocked_process_reports: capture the blocked_process_report event's own object_id/database_id and resolve contentious_object server-side in the collection query, mirroring sp_HumanEventsBlockViewer EXACTLY (2-part schema.object + identical 'Unresolved: ...' fallback) so the fingerprint matches the Dashboard for the same object. New columns added at the end of the table + appender; v30 migration (ALTER ADD COLUMN); v_ views union BY NAME so old parquet reads back NULL. BuildBlockingContextAsync now uses the resolved object as the identity.
- query_snapshots: capture query_hash (CONVERT(varchar(18), query_hash, 1)) in both the on-prem and Azure (#req) snapshot queries; surface it through GetLongRunningQueriesAsync; the Lite LRQ builder now emits a query_hash dedup key.

Schema v29 -> v30. Lite builds clean, 0 warnings.

Refs #1140

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…p-fingerprint #1140/#1141: stable dedup fingerprint + involved-objects on alert payloads

…oth apps)

Completes #1140 by giving the secondary anomaly path (ANOMALY_*_SPIKE / CPU findings via AnalysisNotificationService) the same fingerprints as the live "Detected" path:

- DrillDownCollector (both apps): top_deadlocks now carries the involved objects (parsed from the deadlock graph via the shared DeadlockObjectExtractor; raw XML NOT surfaced), and top_blocking_chains now carries contentious_object. Source columns already existed.
- FindingMessageFormatter.BuildContext (shared): derives context.Incidents from the drill-down (deadlock -> involved-object set, blocking -> contentious object / query pair, query/CPU -> distinct query_hash), reusing the same shared groupers/fingerprint as the live builders, so either path produces an identical key. Incidents are appended after the detail items (existing Diagnosis->Advice->drill-down order preserved).

Shared code, so Lite/Dashboard parity is automatic. +4 finding-path tests; 1 existing count-based test updated (its top_cpu_queries drill-down now yields 2 query incidents). Lite 492 + Dashboard 487 tests green; both apps build 0-warnings.

Refs #1140

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…th-incidents #1140: dedup fingerprints on the anomaly/finding alert path (both apps)

…al, both apps)

Adds an opt-in Per-event delivery mode (default stays Summary) that sends one notification per distinct incident instead of the batched per-cycle card, so downstream automation can open/track one ticket per incident and count recurrences via the #1140 fingerprint.

- PerEventNotification.Split (shared): one message per incident, capped at the configured max-per-cycle, with a trailing "+N more" message that still carries the remaining fingerprints so none are silently dropped. Recurrence handling is left to the existing edge-triggered gating + the consumer's fingerprint dedup.
- Settings: AlertDeliveryMode (Summary|PerEvent) + AlertPerEventMaxPerCycle (default 10) in both apps (Lite App statics + JSON; Dashboard UserPreferences), with load/save/reset.
- Settings UI: a delivery-mode dropdown + per-cycle cap in both SettingsWindows.
- Firing: a SendDetectedAlertAsync helper in each MainWindow routes the "Blocking Detected" and "Deadlocks Detected" sends through Per-event when enabled; alert-history recording is unchanged (one row per fire).

Scope: GLOBAL setting (per-server override is a tracked fast-follow). 5 unit tests for the split helper. Lite 497 + Dashboard 487 tests green; both apps build 0-warnings.

Refs #1141

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
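
The split behavior (one message per incident up to the cap, then a trailing "+N more" message carrying the leftover fingerprints) could be sketched as below. This is a Python illustration under assumptions: the dict shapes are invented, and whether the real PerEventNotification.Split counts the trailing message against the cap is not stated in the source; here it does not.

```python
from typing import Dict, List


def split_per_event(incidents: List[Dict[str, str]], max_per_cycle: int = 10) -> List[dict]:
    """Turn one batched alert into per-incident messages, capped per cycle."""
    head = incidents[:max_per_cycle]
    rest = incidents[max_per_cycle:]
    messages = [
        {"title": inc["title"], "fingerprints": [inc["fingerprint"]]}
        for inc in head
    ]
    if rest:
        # The overflow message still lists every remaining fingerprint,
        # so downstream dedup never silently loses an incident.
        messages.append({
            "title": f"+{len(rest)} more",
            "fingerprints": [inc["fingerprint"] for inc in rest],
        })
    return messages


sample = [{"title": f"Deadlock {i}", "fingerprint": f"fp{i}"} for i in range(5)]
messages = split_per_event(sample, max_per_cycle=3)
```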

…notifications #1141: per-event notification mode for deadlock/blocking alerts (global)

…tqn feedback)

Addresses gotqn's two findings from testing the dev build (the #1140 fingerprint itself tested great: stable key + climbing Occurrences):

1. Per-event cards no longer carry LESS detail than Summary. AlertIncident now carries transient DetailFields (forensic facts), populated by the groupers from the representative event: blocking -> Database / Contentious Object / Blocked Query / Blocking Query / Lock Mode; deadlock -> Victim SQL / Processes (Lite), Query / Wait Resource / Lock Mode (Dashboard). PerEventNotification.Split renders them onto each per-incident card and now also carries the source AttachmentXml/FileName so per-event email keeps the deadlock_graph.xml / blocked_process_report.xml. Summary rendering is untouched (AlertIncidentRenderer.Apply leaves DetailFields off to avoid duplicating the builder's own items), and DetailFields are not persisted.
2. Per-event "Current Value" is now the occurrence count (a number, matching Summary), not the involved-objects string (which already shows as its own fact).

Lite 500 + Dashboard 487 tests green (3 new: detail preserved, attachment carried, Current Value = count); both apps build 0 warnings.

Refs #1141 #1140

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…detail #1141: per-event cards keep forensic detail + numeric Current Value (gotqn feedback)

#981 added restart-dedup for the email channel only; a restart cleared the two guards that suppress a webhook re-send, so reopening Lite re-posted a Teams/Slack alert already delivered before the restart (identical Dedup Key and Occurrences). Two-part fix:

1. Webhook cooldown seed (shared, BOTH apps). WebhookAlertService now seeds its per-(serverId, metricName) cooldown from alert history on first use, mirroring the email seed, via a new IAlertHistoryStore.GetLastWebhookSentUtcAsync. Lite filters notification_type IN ('webhook','email+webhook'); Dashboard filters NotificationType == "webhook". send_error is NOT filtered on -- it tracks the email channel, so an email-failed-but-webhook-sent row must still seed. Wired into the WebhookAlertService DI in both MainWindows.
2. Edge-trigger watermark persistence (Lite). The rolling-count gate's in-memory watermark (#1091) reset to 0 on restart, so the first sweep re-fired for events still in the 1-hour lookback -- and because that gap can exceed the cooldown, the seed alone (time-bounded) does not cover it. The watermark now persists to a new config_edge_trigger_watermarks DuckDB table (upsert on change), seeded before the first sweep at startup.

Dashboard needs no watermark persistence: its deadlock gate re-baselines on restart (raw delta) or is 5-min-windowed (always within the cooldown the seed now covers), and blocking is level+cooldown -- none produce the byte-identical duplicate the Lite edge-trigger gate does.

Tests: Lite 505 + Dashboard 487 green. New: webhook-row history filter + watermark save/load/upsert round-trips + WebhookAlertService seed-suppresses / seed-older-than-cooldown-does-not / null-store-attempts.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…send Fix #1145: Webhook (Teams/Slack) alerts re-fire after an app restart

…surfacing it

Upgrading the app appeared not to take effect: the apps are single-instance (constant mutex) and minimize to tray, so an old build kept running after the user "closed" it, and launching the new build just surfaced the old in-memory version via the mutex.

Fix: a version-aware handoff at startup -- a newer build closes an older tray-resident one and takes over, instead of being handed back the stale version.

Shared PerformanceMonitor.Ui:

- SingleInstanceDecision: pure, unit-tested decision (older->take over; same/newer->surface; older-but-higher-integrity->actionable error).
- ProcessInspector: Win32 -- read the other instance's release version from its on-disk exe (QueryFullProcessImageNameW, cross-integrity for same user), measure integrity level directly, detect split-token admin. Fails closed.
- SingleInstanceCoordinator: acquire-or-handoff; prompt; graceful exit signal (old runs its real shutdown), bounded wait, force-kill last resort; mutex take-over; elevated relaunch (--upgrade-takeover) for the elevated-old case.
- MessageBoxHandoffPrompts: shared dialogs (both apps, parity).

Both apps: OnStartup runs the coordinator (replacing the inline mutex/surface block) synchronously before any window/DB/port init; OnExit disposes it; MainWindow opens the exit-for-upgrade channel only after init (so a newer build won't disturb a mid-initializing instance). Scoped by exe name so Lite never targets Dashboard and vice-versa. Local\ session scoping kept intentionally.

Two adversarial plan reviews + one implementation review folded in (version field = ProductVersion not FileVersion; mutex-throw vs an elevated instance -> integrity-error path not crash; deferred exit listener; direct IL measure; runas gated to split-token admins; UAC-cancel handled; handles disposed).

Tests: Lite 524 + Dashboard 487 green; 0 new warnings. Manual smoke testing of the upgrade/elevation paths still required before merge (see plans/single-instance-upgrade-handoff.md).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
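
The three-way decision named for SingleInstanceDecision can be sketched as a pure function. This Python version is illustrative only: the enum, the tuple-based version comparison, the integer integrity levels, and treating an unreadable version as "surface" (one reading of "fails closed") are all assumptions, and the elevated-relaunch path is left out.

```python
from enum import Enum
from typing import Optional, Tuple


class Decision(Enum):
    TAKE_OVER = "take_over"                # running instance is older
    SURFACE_EXISTING = "surface_existing"  # running instance is same or newer
    INTEGRITY_ERROR = "integrity_error"    # older, but runs at higher integrity


def decide(
    my_version: Tuple[int, ...],
    other_version: Optional[Tuple[int, ...]],
    my_integrity: int,
    other_integrity: int,
) -> Decision:
    """Pure decision: no process, mutex, or UI access happens here."""
    if other_version is None:
        # Assumed fail-closed behavior: never close an instance we cannot identify.
        return Decision.SURFACE_EXISTING
    if other_version >= my_version:
        return Decision.SURFACE_EXISTING
    if other_integrity > my_integrity:
        return Decision.INTEGRITY_ERROR
    return Decision.TAKE_OVER


upgrade = decide((3, 1, 0), (3, 0, 0), my_integrity=1, other_integrity=1)
same = decide((3, 1, 0), (3, 1, 0), my_integrity=1, other_integrity=1)
elevated_old = decide((3, 1, 0), (3, 0, 0), my_integrity=1, other_integrity=2)
unknown = decide((3, 1, 0), None, my_integrity=1, other_integrity=1)
```

Keeping the decision free of Win32 calls is what makes it unit-testable; the inspector and coordinator supply the inputs and act on the result.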

…-upgrade-handoff Single-instance upgrade handoff: close the stale instance instead of surfacing it

Low-risk dependency refresh (Velopack 0.x→1.2.0 and DuckDB held as separate efforts):

- Microsoft.Extensions.* (Configuration, Configuration.Json, Hosting, Logging, Logging.Abstractions): 10.0.8 -> 10.0.9 (tracks the .NET 10 servicing line)
- ModelContextProtocol + ModelContextProtocol.AspNetCore: 1.3.0 -> 1.4.0
- Microsoft.NET.Test.Sdk: 18.5.1 -> 18.6.0 (test projects)

Lock files regenerated (--force-evaluate) for --locked-mode CI restore. Build clean (0 new warnings); Lite 524 + Dashboard 487 + Installer.Tests (fast subset) 61 green. MCP 1.4.0 compiled with no source changes needed.

ScottPlot.WPF, Microsoft.Data.SqlClient, Hardcodet, CredentialManagement, xunit are already at latest. WPF/.NET stays on .NET 10 (11 is preview).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…y-bumps Bump minor/patch NuGet dependencies (Extensions 10.0.9, MCP 1.4.0, Test.Sdk 18.6.0)

Velopack 1.x is the stable line; this also corrects a latent mismatch: build.yml ran `dotnet tool install -g vpk` unpinned, so releases were already packed with vpk 1.x while the app library trailed at 0.0.1298. This aligns the reader library with the packer (now pinned to vpk 1.2.0) without changing the feed format.

- Dashboard + Lite: Velopack PackageReference 0.0.1298 -> 1.2.0
- build.yml: `dotnet tool install -g vpk --version 1.2.0` (was unpinned)
- Lock files regenerated (--force-evaluate); they shrink because Velopack 1.x dropped the NuGet.Versioning transitive dep (custom SemanticVersion, 1.0.1).

No source changes needed: VelopackApp.Build().Run(), UpdateManager, GithubSource, CheckForUpdatesAsync/DownloadUpdatesAsync/ApplyUpdatesAndRestart all unchanged. Build clean (0 new warnings); Lite 524 + Dashboard 487 green. Live cross-release auto-update to be validated at the next release (checklist 8b).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Bump Velopack 0.0.1298 → 1.2.0 and pin the vpk CLI to match

The only build warning across the solution: UnforceFunc was read in UnforcePlanAsync but never assigned by any test, so it was always null (CS0649). Removed the field and its no-op invoke; UnforcePlanAsync returns the same default outcome as before. Solution now builds at 0 warnings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…eanup Build health: remove dead test seam (solution to 0 warnings)

The repo's .gitattributes already normalizes text (`* text=auto eol=crlf`) and the tree is already normalized (`git add --renormalize .` is a no-op). Gap: with the global eol=crlf rule, a future *.sh would be checked out CRLF and fail under Git Bash, which this repo uses heavily. Add `*.sh text eol=lf`. No tracked .sh files today, so this changes nothing now — it's a latent-footgun guard. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Harden .gitattributes: force LF for shell scripts

Both fixes bring Dashboard in line with the already-correct Lite copies; surfaced by a Lite<->Dashboard code-sharing drift audit.

1. SqlServerBaselineProvider: the full (hour, day-of-week) bucket tier was assigned via a copy-paste ternary that labeled sparse buckets (count < CollapseThreshold) HourOnly in baseline_tier. Every bucket on this path is Full; HourOnly/Flat are assigned only on the collapse/flat paths. Matches Lite.
2. SqlServerAnomalyDetector.DetectBlockingAnomalies: blocking/deadlock spike ratios compared a raw window count against a per-hour baseline mean, so the ratio scaled with window length (default 4h) and a steady event rate could trip the spike threshold. Normalize current counts to per-hour before the ratio, mirroring Lite.

Dashboard build + 487 Dashboard.Tests pass. No Lite changes (already correct).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
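
A small worked example of the second fix, in Python for illustration. The function names and sample numbers are hypothetical; only the 4h default window and the "normalize to per-hour before the ratio" rule come from the commit message.

```python
def spike_ratio_raw(window_count: float, baseline_mean_per_hour: float) -> float:
    """The buggy shape: a whole-window count divided by a per-hour mean."""
    return window_count / baseline_mean_per_hour


def spike_ratio_per_hour(window_count: float, window_hours: float,
                         baseline_mean_per_hour: float) -> float:
    """The fixed shape: convert the window count to a per-hour rate first."""
    if window_hours <= 0 or baseline_mean_per_hour <= 0:
        return 0.0  # guard chosen for this sketch, not taken from the source
    return (window_count / window_hours) / baseline_mean_per_hour


# A steady 2 events/hour over the default 4h window, against a 2/hour baseline.
raw = spike_ratio_raw(window_count=8, baseline_mean_per_hour=2.0)
fixed = spike_ratio_per_hour(window_count=8, window_hours=4, baseline_mean_per_hour=2.0)
```

With these numbers the raw ratio is 8 / 2 = 4.0, which looks like a fourfold spike even though the rate never changed, while the normalized ratio is (8 / 4) / 2 = 1.0.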

…erprint The email/webhook cooldown was keyed on (serverId, metricName), ignoring the #1140 per-incident dedup fingerprint, so a genuinely distinct deadlock/blocking/ query/job/disk incident arriving inside the EmailCooldownMinutes window was silently dropped from email/Teams/Slack (the tray still fired). Per-event mode also collapsed to one notification per cycle because each per-incident send shared the single metric key. Introduce a shared IncidentCooldown (PerformanceMonitor.Notifications) keyed per fingerprint: send if any incident in the alert is outside its window, stamp every candidate key on success, and fall back to the metric-level key when an alert carries no fingerprintable incident (CPU/memory/poison-wait/tempdb/failed-job -- behavior unchanged). The restart seed (#981 email, #1145 webhook) is now per-fingerprint, reconstructed from the persisted ContextJson via an anchored "DedupKey" match (null-guarded for the Dashboard scan's null-context rows); the webhook null-store no-seed path is preserved. The per-fingerprint dict is bounded by the 2x-window eviction idiom reused from AnalysisNotificationService. Both apps and both channels run the identical shared decision; only the seed query differs (Lite DuckDB LIKE vs Dashboard in-memory scan), both pinned to the real serializer output by tests. Tests: Lite 544/544, Dashboard 488/488. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
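
The per-fingerprint decision described above (send if any incident is outside its window, stamp every key on success, fall back to the metric key when nothing is fingerprintable, evict at twice the window) could be sketched like this. It is a Python illustration with assumed names and plain numeric timestamps, not the shared IncidentCooldown class, and it leaves out server scoping and the restart seed.

```python
from typing import Dict, List


class IncidentCooldown:
    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._last_sent: Dict[str, float] = {}

    @staticmethod
    def _keys(metric_key: str, fingerprints: List[str]) -> List[str]:
        # No fingerprintable incident: fall back to the metric-level key.
        return fingerprints if fingerprints else [f"metric:{metric_key}"]

    def should_send(self, metric_key: str, fingerprints: List[str], now: float) -> bool:
        """Send if ANY candidate key is outside its cooldown window."""
        for key in self._keys(metric_key, fingerprints):
            last = self._last_sent.get(key)
            if last is None or now - last >= self.window:
                return True
        return False

    def mark_sent(self, metric_key: str, fingerprints: List[str], now: float) -> None:
        """Stamp every candidate key after a successful send, then evict."""
        for key in self._keys(metric_key, fingerprints):
            self._last_sent[key] = now
        cutoff = now - 2 * self.window  # 2x-window eviction bounds the dict
        self._last_sent = {k: t for k, t in self._last_sent.items() if t >= cutoff}


cooldown = IncidentCooldown(window_seconds=600)
first = cooldown.should_send("Deadlocks Detected", ["fpA"], now=0)
cooldown.mark_sent("Deadlocks Detected", ["fpA"], now=0)
repeat = cooldown.should_send("Deadlocks Detected", ["fpA"], now=60)
distinct = cooldown.should_send("Deadlocks Detected", ["fpB"], now=60)
mixed = cooldown.should_send("Deadlocks Detected", ["fpA", "fpB"], now=60)
```

Under the old (serverId, metricName) key, the `distinct` incident at 60 seconds would have been suppressed along with the repeat.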

The per-event vs summary delivery mode shipped as a global setting (#1141). This adds the optional per-server override the original request mentioned: Per-event for one noisy prod box while the global default stays Summary.

- ServerConnection (both apps) gains a nullable AlertDeliveryModeOverride; null inherits the global, persisted in the existing servers.json (no new store).
- Shared AlertDeliveryModeResolver (Notifications) centralizes the precedence (override wins, null inherits) so Lite and Dashboard can't drift.
- SendDetectedAlertAsync in both apps resolves the effective mode per server before splitting per-event; Lite maps its int serverId hash back to the server.
- Add/Edit Server dialog (both apps) gets an "Alert delivery" combo (Use global setting / Summary / Per-event), wired through load + save.
- Tests: resolver precedence + ServerConnection JSON round-trip incl. legacy-file-without-field inherits global (both suites).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
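
The precedence rule (override wins, null inherits) is small enough to show in full. This Python sketch uses assumed names and a plain dict in place of a servers.json entry; it is not the AlertDeliveryModeResolver source.

```python
from typing import Optional

SUMMARY = "Summary"
PER_EVENT = "PerEvent"


def resolve_delivery_mode(global_mode: str, server_override: Optional[str]) -> str:
    """A non-null per-server override wins; None inherits the global mode."""
    return server_override if server_override is not None else global_mode


# A legacy server entry without the field behaves like an explicit null.
legacy_entry = {"name": "prod-legacy"}
noisy_entry = {"name": "prod-noisy", "AlertDeliveryModeOverride": PER_EVENT}

legacy_mode = resolve_delivery_mode(SUMMARY, legacy_entry.get("AlertDeliveryModeOverride"))
noisy_mode = resolve_delivery_mode(SUMMARY, noisy_entry.get("AlertDeliveryModeOverride"))
```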

…-alert-delivery-mode Per-server override for alert delivery mode (#1236)

A fresh Lite install / nightly extract created %LOCALAPPDATA%\...\config\ empty and never seeded it from the bundled config\ignored_wait_types.json. Because LoadIgnoredWaitTypes() reads only the per-user path, a clean box returned an empty set (cached by the Lazy), the wait filter became a no-op, and every benign wait (SOS_WORK_DISPATCHER, DISPATCHER_QUEUE_SEMAPHORE, CLR_AUTO_EVENT, ...) flooded collection and the wait stats tab.

- App: seed the per-user config dir from the bundled copies on first run (copy-if-absent; never clobber a user-edited file), via new ConfigSeeder.
- LoadIgnoredWaitTypes: fall back to the bundled copy if the per-user file is still missing, so the filter can never silently be empty; warn if neither.
- Tests: ConfigSeeder copies when absent, never overwrites, no-ops on a missing bundle.

Lite-only: Dashboard seeds its list server-side into config.ignored_wait_types during install, so it's unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
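
The copy-if-absent seeding can be sketched as follows, covering the three tested cases (copies when absent, never overwrites, no-ops on a missing bundle). This is a Python illustration; `seed_config` and the temporary directory layout are assumptions, not the ConfigSeeder implementation.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def seed_config(bundled_dir: Path, user_dir: Path) -> List[str]:
    """Copy bundled config files into the per-user dir only where absent."""
    if not bundled_dir.is_dir():
        return []  # missing bundle: nothing to seed
    user_dir.mkdir(parents=True, exist_ok=True)
    copied: List[str] = []
    for src in sorted(bundled_dir.iterdir()):
        dest = user_dir / src.name
        if src.is_file() and not dest.exists():
            shutil.copy2(src, dest)  # never clobbers an existing user file
            copied.append(src.name)
    return copied


with tempfile.TemporaryDirectory() as tmp:
    bundled, user = Path(tmp) / "bundled", Path(tmp) / "user"
    bundled.mkdir()
    (bundled / "ignored_wait_types.json").write_text('["SOS_WORK_DISPATCHER"]')

    first_run = seed_config(bundled, user)                      # fresh box
    (user / "ignored_wait_types.json").write_text('["EDITED"]')  # user edit
    second_run = seed_config(bundled, user)                     # later launch
    kept = (user / "ignored_wait_types.json").read_text()
    missing_bundle = seed_config(Path(tmp) / "absent", user)
```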

Seeding/copying ignored_wait_types.json fixes collection going forward but can't remove rows already in the DuckDB, and the wait-stats tab had no display-time filter. So a box that collected benign waits before the filter was active (e.g. a fresh extract that ran before the per-user JSON existed) kept showing SOS_WORK_DISPATCHER, DISPATCHER_QUEUE_SEMAPHORE, CLR_AUTO_EVENT, etc. dominating the tab even after the JSON was put in place.

- New IgnoredWaitTypes: one shared source for the ignored set (per-user copy then bundled fallback) plus a sanitized "AND wait_type NOT IN (...)" builder. Collection (RemoteCollectorService) and display (LocalDataService) both use it, so the two lists can't drift.
- LocalDataService wait queries (top list, picker distinct types, total trend) exclude ignored waits at query time. Non-destructive: rows stay in the DuckDB and age out via retention; nothing is deleted.
- Test for the exclusion-clause builder incl. injection-safety sanitization.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
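
A sanitized exclusion-clause builder might look like the Python sketch below. The allow-list rule (letters, digits, underscore) and the `build_exclusion_clause` name are assumptions for illustration; the source does not say how IgnoredWaitTypes sanitizes its input.

```python
import re
from typing import Iterable

# Assumed allow-list: wait type names made of letters, digits, and underscores.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_]+")


def build_exclusion_clause(ignored: Iterable[str], column: str = "wait_type") -> str:
    """Build ' AND wait_type NOT IN (...)' from an ignored-wait list.

    Entries that fail the allow-list are dropped rather than escaped, so a
    hand-edited JSON file cannot inject SQL into the display queries.
    """
    safe = sorted({w for w in ignored if _SAFE_NAME.fullmatch(w)})
    if not safe:
        return ""  # nothing to exclude: leave the query unchanged
    quoted = ", ".join(f"'{w}'" for w in safe)
    return f" AND {column} NOT IN ({quoted})"


clause = build_exclusion_clause(
    ["SOS_WORK_DISPATCHER", "CLR_AUTO_EVENT", "x'); DROP TABLE wait_stats;--"]
)
empty = build_exclusion_clause([])
```

Because the clause is appended at query time, rows already collected stay in the database and simply stop appearing in the tab.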

…red-waits Fix #1240: Lite seeds per-user ignored_wait_types.json on first run

Clicking a multi-series trend chart's built-in legend key (or its line) now dims the other series and auto-fits the Y axis to the clicked one, so a series that sits flat under the big lines becomes readable. Clicking again, switching to another series, or double-clicking (autoscale) restores the full view. It is a transient view toggle only: it never changes the picker selection and never refetches or deletes data, and any re-render (picker/time-range change or background poll) resets it.

The mechanic lives entirely in the shared PerformanceMonitor.Ui ChartHoverHelper that every dynamic-legend chart in both apps already routes its series through, so Lite and Dashboard cannot drift:

- capture each series' identity color from MarkerStyle.FillColor in Add
- left-click handlers branch once on legend-panel containment, then run the ScottPlot 5.1.58 legend hit-test or the existing line hit-test (click-vs-drag < 5px; never sets e.Handled, so pan/zoom keep working)
- Isolate/Restore dims via each series' own color (Dark/Light/CoolBreeze safe) and clears+restores Dashboard's LockedVertical axis rule so the Y-fit actually sticks (Lite installs no rule, so it is a no-op there)
- a static ConditionalWeakTable<WpfPlot, ChartHoverHelper> + TryGetForChart lets the per-app autoscale handlers clear an active isolate first

Per-app hooks: both the "Revert (Autoscale)" menu item and the double-click handler in Dashboard TabHelpers and Lite ContextMenuHelper call Restore() before AutoScale() (4 sites, symmetric).

Pure helpers (toggle transitions, dim/restore decision, Y-fit range math incl. the degenerate-flat guard, axis-rules bookkeeping) are unit tested in both suites via ChartClickIsolateTests (20 each). Both apps build green; Dashboard.Tests 687 pass, Lite.Tests 628 pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Post-review fixes on the chart click-to-isolate feature, found during re-verification (the implementer's self-review and green tests had missed them):

- Restore was unfaithful for line-only charts. CollectorDuration and the trend charts build line-only (MarkerSize 0, no fill) and never call StyleScatter, but Isolate/Restore re-ran StyleScatter on every series, sprouting density markers and a gradient fill ribbon they never had (until the next poll-rebuild). Add now snapshots each series' full visual state (identity color, line color/width, marker size, FillY); RestoreSeriesVisual writes it back, re-running StyleScatter ONLY for fill charts (it regenerates the gradient from the unchanged data, so it reproduces the original). Faithful for fill, line-only, and flat-StyleScatter'd series. +2 headless regression tests per suite.
- Double-click no longer relies on e.ClickCount on the terminal up (uncertain WPF semantics). A MouseDoubleClick handler sets a suppress flag consumed at the TOP of OnLeftButtonUp, before the _leftPressed gate: the 2nd-down's PreviewMouseLeftButtonDown is marked Handled by Control.HandleDoubleClick so our press handler is skipped and _leftPressed is already false there -- consuming the flag later would leave it stuck and swallow the next genuine click.

Both apps build green; Dashboard.Tests 689, Lite.Tests 630, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…late Add chart click-to-isolate on legend keys and series lines

Per UX feedback: isolating a series rescaled the Y axis to that series' own high-water marks every time -- useful for a buried/flat line, but jarring when the series is already prominent. Drop the auto-fit: isolate now just dims the other series and leaves the axes alone (to inspect a buried wait, deselect the big ones in the picker, which re-renders + autoscales). Removes the now-unused Y-fit + axis-rule machinery (AutoFitYToSeries, ComputeIsolateYLimits, SaveAndClearRules/RestoreAxisRules, the _preIsolateLimits and _savedRules fields) and their unit tests. Restore now just un-dims. Both apps build green; Dashboard.Tests 677, Lite.Tests 618, 0 failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Make click-to-isolate dim-only (no Y auto-fit)

…ifest fix

- Bump Dashboard/Lite/Installer/Installer.Core to 3.1.0 (Version/AssemblyVersion/FileVersion/InformationalVersion)
- CHANGELOG: roll [Unreleased] -> [3.1.0] - 2026-06-27; add full-detail entries for the shipped-but-undocumented work since 3.0.0 (block-chain viewer, deadlock graph viewer, always-on DMV blocking-snapshot fallback, incident clustering, FinOps object-growth/locking heatmaps, per-server alert delivery override, MCP status envelope, Lite ignored-waits seeding, Lite picker-chart N+1 fix)
- README: collector counts 33->34 and 25->26 to match the schedule table; add the block-chain/deadlock viewers to both apps' tab lists; Recommendations "grouped by severity" -> "grouped into incidents" (#1214); note the always-on DMV blocking fallback in the AWS RDS section
- upgrades/3.0.0-to-3.1.0: add the missing upgrade.txt manifest (a folder with no manifest is silently skipped by ScriptProvider, so the blocking_ecid/monitor_loop ALTER would never run on upgrade) and add the required SET-options + USE PerformanceMonitor header to the ALTER script

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ging from human prose

The shipped 3.0 advice prose (FactAdvice.cs) read like folklore to a human card reader: maybe-if hedging and MCP tool names (get_*) where it should have stated the measured number. Root defect = unincorporated facts; the hedging and tool-names were symptoms.

- All 56 advice blocks now COMPOSE from the fact set at analysis time: they state the collected Value/Metadata (current MAXDOP/CTFP, max server memory, RCSI-off DB count, dominant lock mode, SOS signal-wait share, lead-blocker permutations) instead of telling the reader to run a tool. Tool/field names stripped from the composed path.
- Object-name bug fixed: ANOMALY_OBJECT_GROWTH/CONTENTION prose promised to name an object, but both collectors SELECTed schema/table/index then dropped them. New Fact.ObjectName carrier (Metadata stays doubles-only); both AnomalyDetectors read the dropped columns; composers state dbo.Orders / index IX_*.
- Remediation regrounded: each remediation states the co-fired findings that actually fired (PLAN_REGRESSION/PARAMETER_SENSITIVITY/MISSING_INDEX/PLAN_WARNING/CXPACKET) instead of a bag of guesses; SOS "more cores?" ties to the stated signal-wait share.
- Static-fallback scrub: the ~36 static _byKey blocks (render only for legacy empty-StoryText findings, which self-heal on the next analysis run) had the full bag-of-tricks + tool names; surgically scrubbed, legit DMV/sp_/perfmon refs kept. LCK_RANGE now routes via ComposeRangeLock.

Dashboard.Tests 574 / Lite.Tests 547 green (+compose tests incl. a real LCK_M_RS_S regression guard). Audience-dehedge rebuild; a larger prose rewrite is planned separately.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…dience-dehedge Advice engine: compose all 56 blocks from facts; strip tool-names/hedging from human prose

…conciliation

My first changelog pass walked only the most recent commits and missed a large middle band of post-3.0.0 work. Reconciled the full merge list (96 PRs since v3.0.0) against the changelog and added every previously-missing user-facing entry:

- Advice engine: the sourced/fact-composed advice rebuild (#1244) + the correctness cluster (#1185 PLE, #1187 MAXDOP topology code-fix, #1192, #1194 five wrong claims, #1196-#1198/#1203 composer value-stating)
- In-app plan navigation on every query surface (#1184)
- Dashboard Queries/tab-load responsiveness (#1181/#1182/#1190)
- Active Queries refresh-on-view (#1183); themed resolved/cleared toasts (#1186)
- Desktop single-instance upgrade handoff (#1148)
- Lite interactive UI-thread offload (#1193/#1202); View Plan no-op fixes (#1181/#1190)
- FinOps Database Sizes init-order race (#1179)
- Failed-job alert dedup + restart-replay (#1157/#1173)
- Dashboard anomaly/baseline drift vs Lite (#1155)

[3.1.0] is now 8 Added / 11 Changed / 17 Fixed; all 46 issue/PR refs link-resolve.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The standalone InstallerGui project was retired in 2.9.0 and its directory deleted from the repo; two stragglers still implied it exists. Drop the "GUI Installer" item from the PR-template component checklist, and reword 99_installer_troubleshooting.sql's header (it is a 99_ script, excluded from install, so no functional change). CHANGELOG mentions are historical records of the retirement and are left intact. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…grid The empty-state lane (most often Blocking/Deadlocking on a healthy server) rendered as a dead black box: HideGrid() plus an EmptyTickGenerator on both axes, and a "No Data" label pinned at (0,0) that SyncXAxes immediately shoved off-screen when it overrode the X limits to the real time range. Replace that with a live, gridded lane that matches the populated lanes: keep the grid, set a 0-1 Y axis with normal numeric ticks, and use DateTimeTicksBottomDateChange so the vertical gridlines align with the other lanes (time labels still only on the bottom File I/O lane). Drop the never-visible "No Data" text. ShowEmpty becomes an instance method in both apps so it can reference FileIoChart. Mirrored in Lite and Dashboard (sync-paired control). Both build clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…mpty-grid Overview lanes: render empty Blocking/Deadlocking lane as a live 0-1 grid

… 3.1.0 date 2026-06-28 #1245 landed on dev after the initial 3.1.0 changelog reconciliation, so it was missing from the release notes. The empty-state Blocking/Deadlocking Overview lane now renders as a live 0-1 grid matching the populated lanes instead of a dead black box (both apps). Added as a [3.1.0] Fixed entry with its reference link, and bumped the [3.1.0] date to today (finalized at the actual cut). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

….0 prep guard)

The embedded ScriptProvider (the path the Dashboard uses, distinct from the CLI's filesystem provider) had zero test coverage for an actual release upgrade. Two new tests, both reading the real resources compiled into Installer.Core.dll:

- EmbeddedUpgrades_3_0_0_To_3_1_0_DiscoverableWithManifestAndScript pins this release's 3.0.0->3.1.0 hop through GetApplicableUpgrades (the method #772 broke): folder discovered, not skipped for a missing upgrade.txt, manifest lists the script, script carries the USE PerformanceMonitor header + the real ALTER...blocking_ecid/monitor_loop columns. Guards the prep blocker fixed in d2feb63's branch.
- EmbeddedUpgrades_AllDiscoveredFoldersHaveReadableManifestAndScripts is a self-maintaining guard: every embedded upgrade folder must expose a readable manifest whose listed scripts all exist and are non-empty.

Full Installer.Tests suite: 82 passed, 0 failed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
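The self-maintaining guard described above amounts to a small invariant check over the upgrade resources. The following is a hedged Python sketch of that invariant, not the actual Installer.Tests code: the in-memory `resources` mapping, the function name, and the one-script-per-line manifest format are all assumptions made for illustration.

```python
from typing import Dict, List


def find_manifest_problems(resources: Dict[str, str]) -> List[str]:
    """Return a description of every upgrade folder that breaks the invariant.

    `resources` maps "folder/file" paths to file contents (a hypothetical
    stand-in for the embedded resources). Assumes upgrade.txt lists one
    script file name per line.
    """
    folders = sorted({path.rsplit("/", 1)[0] for path in resources if "/" in path})
    problems: List[str] = []
    for folder in folders:
        manifest = resources.get(f"{folder}/upgrade.txt")
        if manifest is None or not manifest.strip():
            # This is the case that would otherwise be skipped silently.
            problems.append(f"{folder}: missing or empty upgrade.txt")
            continue
        for line in manifest.splitlines():
            script = line.strip()
            if not script:
                continue
            body = resources.get(f"{folder}/{script}")
            if body is None or not body.strip():
                problems.append(f"{folder}: listed script {script} is missing or empty")
    return problems
```

Because the check iterates over whatever folders exist, a future upgrade folder added without a manifest would be reported without anyone having to update the test.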

The app tracked collection HEALTH (did data arrive?) but never collection STATE, so disabling the SQL Agent collector jobs was silent: the Dashboard kept looking healthy (actually calmer, since the live cards read zero rows from collect.* tables) until a collector aged into STALE on the Collection Health tab after 24 hours.

Add an app-side check that survives the collector being off, because it's the collector that fills every other table:

- Live msdb read of msdb.dbo.sysjobs.enabled for PerformanceMonitor% jobs (immediate, specific cause), gated on Azure SQL DB and degrading gracefully on restricted msdb (RDS / no SQLAgentReaderRole) -- never reports "disabled" when it simply could not look.
- A config.collection_log freshness backstop (no run in 30+ min) that also catches the Agent service being stopped or collectors silently erroring.

Surfaced as a proactive "Collection Stopped" tray/email alert (new NotifyOnCollectionStopped pref, default on, mirroring the Capture Down pattern: cooldown, mute, and a "Collection Resumed" clear) plus a banner on the Collection Health tab so it shows immediately, not only after the 24h STALE lag. Decision logic extracted to DatabaseService.DecideCollectionStopped and unit tested (9 cases). Dashboard-only -- Lite has no Agent jobs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
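The two-signal decision described above can be sketched as a pure function. This is a Python illustration under stated assumptions, not the signature or body of DatabaseService.DecideCollectionStopped: the parameter names, the `Decision` type, the reason strings, and the choice to flag only when every matching job is disabled are hypothetical. Whether the freshness backstop also applies on Azure SQL DB is likewise assumed here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

AZURE_SQL_DB = 5          # SERVERPROPERTY('EngineEdition') value for Azure SQL Database
FRESHNESS_LIMIT_MIN = 30  # "no run in 30+ min" backstop from the commit message


@dataclass(frozen=True)
class Decision:
    stopped: bool
    reason: str


def decide_collection_stopped(
    engine_edition: int,
    job_enabled_flags: Optional[Sequence[bool]],
    minutes_since_last_run: Optional[float],
) -> Decision:
    """Hypothetical model of the check.

    job_enabled_flags is None when msdb could not be read (restricted access),
    so "could not look" is never mistaken for "disabled".
    """
    can_check_jobs = engine_edition != AZURE_SQL_DB and job_enabled_flags is not None
    # Assumption: only report disabled when jobs were found and all are disabled.
    if can_check_jobs and job_enabled_flags and not any(job_enabled_flags):
        return Decision(True, "jobs_disabled")
    # Backstop: also catches a stopped Agent service or silently failing collectors.
    if minutes_since_last_run is not None and minutes_since_last_run >= FRESHNESS_LIMIT_MIN:
        return Decision(True, "no_recent_runs")
    return Decision(False, "ok")
```

Keeping the decision free of I/O is what makes it straightforward to unit test: each case is just a combination of edition, job flags, and log age.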

…ped-detection Add "Collection Stopped" alert: warn when collector Agent jobs are disabled

#1246 (merged to dev as accd03d) adds a Full Dashboard "Collection Stopped" alert: an app-side check (live msdb.dbo.sysjobs.enabled for PerformanceMonitor% jobs + a config.collection_log 30-min freshness backstop) that survives the collector being off, surfaced as a tray/email alert (new NotifyOnCollectionStopped pref, default on, with a "Collection Resumed" clear) plus a Collection Health tab banner. Added as a [3.1.0] Added entry + ref link, and a new row in the README Alert Types table. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…n real engine edition Follow-up to #1246. GetCollectionStatusAsync (the Collection Health tab entry) passed engineEdition 0, so on Azure SQL DB it issued a doomed msdb.dbo.sysjobs query and relied on the catch to degrade -- diverging from the alert path and the CPU/failed-job checks, which all skip cleanly on EngineEdition 5. It now resolves the real edition via SERVERPROPERTY('EngineEdition') (the same idiom ServerManager and FinOps.Inventory use) and passes it through, so the tab gates Azure the same clean way. The msdb try/catch stays as a backstop, and a failed edition read returns 0 (the inner check still runs), so it never disables the check. No functional change for supported editions -- the Full Dashboard already rejects EngineEdition 5 at connection -- so this is a consistency/defense-in-depth fix that removes the hardcoded-0 smell and keeps the tab in step with the alert path. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ped-edition-gate Dashboard: gate Collection Health tab collector-stopped check on real engine edition

…E for dedup collectors)

Dedup-snapshot collectors (server_properties + the config snapshots) log SKIPPED when nothing changed -- a successful no-op. But the per-collector health computed last_success_time from SUCCESS only, so a collector that's correctly skipping showed STALE, then NEVER_RUN once its last real SUCCESS aged out of log retention. This is the same "SKIPPED is fine" semantics #1246's freshness backstop already uses.

Dashboard (report.collection_health view, install/47): SKIPPED now counts toward last_success_time and total_runs, and is included in the recent_failures window so a skip-only collector doesn't fall through to the consecutive_failures FAILING branch. Validated live: server_properties on SQL2016/2017/2025 flips STALE/NEVER_RUN -> HEALTHY.

Lite (LocalDataService.CollectionHealth): SKIPPED counts toward last_success_time too, so a version-gated/dedup collector not on the OnLoadCollectors exemption list no longer false-STALEs. Build clean, 618 Lite.Tests pass.

Parity fix, both apps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
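The effect of counting SKIPPED as a healthy run can be shown with a simplified model. The real logic lives in the report.collection_health SQL view and in LocalDataService.CollectionHealth; the Python below is only a sketch: the function names, the "ERROR" status value, and the reduction to three states (the FAILING branch is omitted) are assumptions, and the 24-hour threshold is taken from the STALE lag mentioned for #1246.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

HEALTHY_STATUSES = {"SUCCESS", "SKIPPED"}  # SKIPPED is a successful no-op
STALE_AFTER = timedelta(hours=24)          # assumed from the 24h STALE lag

Run = Tuple[datetime, str]  # (collection time, logged status)


def last_success_time(runs: Iterable[Run]) -> Optional[datetime]:
    """Latest run whose status counts as healthy, or None if there is none."""
    times = [when for when, status in runs if status in HEALTHY_STATUSES]
    return max(times) if times else None


def classify(runs: Iterable[Run], now: datetime) -> str:
    """Simplified health state; the real view has additional branches."""
    last = last_success_time(runs)
    if last is None:
        return "NEVER_RUN"
    return "STALE" if now - last > STALE_AFTER else "HEALTHY"
```

With `HEALTHY_STATUSES` reduced to `{"SUCCESS"}`, a collector whose only recent rows are SKIPPED would classify as STALE, which is the false alarm this change removes.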

…th-skipped-status Collection Health: count SKIPPED as a healthy run (no false STALE for dedup collectors)

#1248 (merged to dev) makes per-collector Collection Health count SKIPPED as a healthy run, so dedup / skip-if-unchanged collectors (server_properties + the config snapshots) stop showing false STALE/NEVER_RUN in both apps. Added as a [3.1.0] Fixed entry + reference link. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Release v3.1.0 prep -> dev

erikdarlingdata and others added 30 commits June 17, 2026 05:50

Merge pull request #1137 from erikdarlingdata/feature/1135-index-obje…

ba56cfd

…ct-stats Fix #1135: Lite index_object_stats collector times out as all-or-nothing sweep

Merge pull request #1139 from erikdarlingdata/feature/1136-volume-fre…

979aca2

…e-space-severity Grade Volume Free Space alert severity (WARNING/CRITICAL) (#1136)

Merge pull request #1142 from erikdarlingdata/feature/1140-alert-dedu…

932c8c9

…p-fingerprint #1140/#1141: stable dedup fingerprint + involved-objects on alert payloads

Merge pull request #1143 from erikdarlingdata/feature/1140-finding-pa…

d2c0066

…th-incidents #1140: dedup fingerprints on the anomaly/finding alert path (both apps)

Merge pull request #1144 from erikdarlingdata/feature/1141-per-event-…

6244947

…notifications #1141: per-event notification mode for deadlock/blocking alerts (global)

Merge pull request #1146 from erikdarlingdata/feature/1141-per-event-…

fad047d

…detail #1141: per-event cards keep forensic detail + numeric Current Value (gotqn feedback)

Merge pull request #1147 from erikdarlingdata/feature/1145-restart-re…

8ae9dbf

…send Fix #1145: Webhook (Teams/Slack) alerts re-fire after an app restart

Merge pull request #1148 from erikdarlingdata/feature/single-instance…

6ad367c

…-upgrade-handoff Single-instance upgrade handoff: close the stale instance instead of surfacing it

Merge pull request #1149 from erikdarlingdata/feature/minor-dependenc…

e040dda

…y-bumps Bump minor/patch NuGet dependencies (Extensions 10.0.9, MCP 1.4.0, Test.Sdk 18.6.0)

Merge pull request #1150 from erikdarlingdata/feature/velopack-1.2.0

4d9d2c7

Bump Velopack 0.0.1298 → 1.2.0 and pin the vpk CLI to match

Merge pull request #1151 from erikdarlingdata/feature/build-health-cl…

7feb54c

…eanup Build health: remove dead test seam (solution to 0 warnings)

Merge pull request #1152 from erikdarlingdata/feature/gitattributes-eol

f34d96c

Harden .gitattributes: force LF for shell scripts

erikdarlingdata and others added 29 commits June 26, 2026 14:31

Merge pull request #1239 from erikdarlingdata/feature/1236-per-server…

fbe9780

…-alert-delivery-mode Per-server override for alert delivery mode (#1236)

Merge pull request #1241 from erikdarlingdata/fix/1240-lite-seed-igno…

7cc265f

…red-waits Fix #1240: Lite seeds per-user ignored_wait_types.json on first run

Merge pull request #1242 from erikdarlingdata/feature/chart-click-iso…

0e9ea71

…late Add chart click-to-isolate on legend keys and series lines

Merge pull request #1243 from erikdarlingdata/feature/isolate-dim-only

566f72a

Make click-to-isolate dim-only (no Y auto-fit)

Merge pull request #1244 from erikdarlingdata/feature/advice-prose-au…

a0efdb7

…dience-dehedge Advice engine: compose all 56 blocks from facts; strip tool-names/hedging from human prose

Merge remote-tracking branch 'origin/dev' into release/v3.1.0

6383467

Merge pull request #1245 from erikdarlingdata/feature/blocking-lane-e…

0edf673

…mpty-grid Overview lanes: render empty Blocking/Deadlocking lane as a live 0-1 grid

Merge pull request #1246 from erikdarlingdata/feature/collection-stop…

accd03d

…ped-detection Add "Collection Stopped" alert: warn when collector Agent jobs are disabled

Merge pull request #1247 from erikdarlingdata/feature/collection-stop…

8bd48de

…ped-edition-gate Dashboard: gate Collection Health tab collector-stopped check on real engine edition

Merge pull request #1248 from erikdarlingdata/feature/collection-heal…

8692f7e

…th-skipped-status Collection Health: count SKIPPED as a healthy run (no false STALE for dedup collectors)

Merge pull request #1249 from erikdarlingdata/release/v3.1.0

71c63ea

Release v3.1.0 prep -> dev

erikdarlingdata merged commit d1e3eed into main Jun 29, 2026
8 checks passed


Release v3.1.0 #1250

erikdarlingdata merged 250 commits into main from dev
