fix(CS-11647): hide manual approval UI for auto-executed commands#5273
fix(CS-11647): hide manual approval UI for auto-executed commands#5273FadhlanR wants to merge 8 commits into
Conversation
Preview deploymentsHost Test Results 1 files 1 suites 1h 58m 43s ⏱️ Results for commit 1bd7797. For more details on these errors, see this check. Realm Server Test Results 1 files ± 0 1 suites ±0 12m 2s ⏱️ -3s Results for commit 1bd7797. ± Comparison against earlier commit ad2e63d. |
The manual Accept All / Cancel bar briefly flashed below an assistant tool call before command-service started auto-executing it. The drain loop decides "auto-execute" inside an async 100ms-debounced task, but the room's readyCommands getter only filtered by execution status — so during the debounce window the bar painted, then yanked itself when acceptingAllRoomIds flipped. Pull the auto-execute decision into a synchronous predicate (isAutoExecutableCommand) and call it from both command-service (to decide whether to run) and room.gts readyCommands (to decide whether to render the bar). Single source of truth for the three conditions — checkCorrectness, requiresApproval=false, LLM mode act. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
19f74e0 to
07dc010
Compare
isAutoExecutableCommand has three branches (checkCorrectness, requiresApproval=false, LLM mode 'act') and the readyCommands filter treats them uniformly. Rename the existing integration test so it reads as "any always-auto-executed command" instead of a checkCorrectness-specific case, and add a second test that drives the requiresApproval=false branch via the boxel-environment skill. The LLM-mode='act' branch is still locked in by the unit test — exercising it end-to-end would need extra LLM-mode-state plumbing for diminishing returns since all three branches converge on the same filter line in readyCommands. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The per-command Apply button (rendered next to each tool-call message) flashes through its 'ready' state — a clickable Run button — for the ~100ms debounce window before command-service starts the auto-execute run. Same root cause as the Accept All bar. Reuse isAutoExecutableCommand: when status is ready/undefined and the helper says this command will auto-execute, render the applying state (spinner) immediately. The button then transitions naturally to applied on completion. If validate fails in the drain, command-service emits an invalid commandResult and the button settles into its invalid state — no risk of the spinner sticking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI surfaced a regression on the per-command Apply button fix: Acceptance | Commands tests: ShowCard command added from a skill, is not automatically executed when agentId does not match A ShowCard command sent by another agent has requiresApproval=false on the skill, so isAutoExecutableCommand returned true and the button masked into the applying spinner — but command-service refuses to run it because of the agentId gate (drainCommandProcessingQueue line 354-357). The button would have stuck on the spinner forever. Thread isOwnedByCurrentAgent through the predicate and short-circuit to false when it's false. Callers compute it from `message.agentId === matrixService.agentId`. command-service passes `true` because its outer loop already short-circuits on the same condition before reaching the predicate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The new isAutoExecutableCommand agent-ownership check fails closed when message.agentId doesn't equal matrixService.agentId. The new CS-11647 integration tests went through simulateRemoteMessage without setting `data.context.agentId`, so the helper short-circuited to false in the test environment and the Accept All bar / per-command ready button rendered for an unrelated reason. CI surfaced this on the two tests that depended on the helper actually returning true: not ok 121 ... Accept All bar does not flash for a requiresApproval=false command not ok 122 ... per-command Apply button does not flash Run before auto-execute starts Pass the current agent's id explicitly. The checkCorrectness Accept All test also gets the context (previously passing by accident because validate marks status=invalid for an ad-hoc checkCorrectness call, which the readyCommands filter already excludes). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng is stuck drainCommandProcessingQueue used to `continue` past a wedged room without emitting any commandResult, so the synthetic "applying" spinner in room-message-command.gts had no terminal event to fall through and hung forever. Dispatch an `invalid` commandResult for each auto-executable command on that path so the UI surfaces the invalidCommandState alert with a Try Anyway button. Also extract STUCK_PROCESSING_TIMEOUT_MS so tests can target the same threshold the prod path uses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ing state While the synthetic "applying" spinner is up (auto-executable command between landing and drainCommandProcessingQueue starting the run task), the card's idle test attribute used to disagree with the apply button: the apply button reported applying, the card still reported idle. Drive both off applyButtonState so the two attributes always agree about whether the spinner is visible. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nd drop ticket prefix - New unit-flavoured integration test exercising invalidateAutoExecutableCommandsForStuckProcessing directly: spies on matrixService.sendCommandResultEvent and asserts only the auto- executable command gets an invalid dispatch with the expected failureReason. Avoids stubbing the ember-resources proxy, which silently no-ops Object.defineProperty. - Add a coherence assertion that data-test-command-card-idle is omitted while the synthetic applying state is on (Glimmer drops attributes bound to falsy expressions, so the test selector is presence-based). - Drop the per-test CS-11647 prefix from titles now that the cluster is large enough to read on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR fixes a UI flash where manual approval controls briefly appear for commands that the host will auto-execute, by centralizing the auto-execute decision into a shared predicate and using it consistently across the command drain loop and both UI surfaces.
Changes:
- Introduces
isAutoExecutableCommand()(and re-homesCHECK_CORRECTNESS_COMMAND_NAME) as a single source of truth for auto-execution eligibility. - Updates command draining + room UI to hide/mask manual approval UI immediately for auto-executable commands.
- Adds unit + integration tests covering the three auto-exec branches plus the agent-ownership gate.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/host/app/lib/command-auto-execute.ts | Adds shared auto-execution predicate and constant to prevent UI/logic drift. |
| packages/host/app/services/command-service.ts | Uses shared predicate for auto-execution; adds stuck-processing invalidation for auto-exec commands. |
| packages/host/app/components/matrix/room.gts | Filters out auto-executable commands from readyCommands to prevent Accept All bar flashing. |
| packages/host/app/components/matrix/room-message-command.gts | Masks per-command Apply button into immediate “applying” state for auto-executable commands. |
| packages/host/tests/unit/lib/command-auto-execute-test.ts | Unit coverage for predicate branches + agent-ownership short-circuit. |
| packages/host/tests/integration/components/ai-assistant-panel/commands-test.gts | Integration coverage ensuring approval UI doesn’t flash for auto-exec paths and still renders for manual approval. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| let invokedToolFromEventId = | ||
| this.getCurrentEventIdForCommandRequest( | ||
| roomId, | ||
| messageCommand.commandRequest.id, | ||
| ) ?? messageCommand.eventId; | ||
| await this.matrixService.sendCommandResultEvent({ | ||
| roomId, | ||
| invokedToolFromEventId, | ||
| toolCallId: messageCommand.commandRequest.id!, | ||
| status: 'invalid', | ||
| failureReason: `Room processing did not finish within ${Math.round( | ||
| STUCK_PROCESSING_TIMEOUT_MS / 1000, | ||
| )}s; command was not started`, | ||
| context: await this.operatorModeStateService.getSummaryForAIBot(), | ||
| }); |
| // How long drainCommandProcessingQueue waits for a room resource that's | ||
| // still processing before giving up on the event. In tests we shorten this | ||
| // so the stuck-timeout invalidation path can be exercised in a single test | ||
| // without holding a real test open for a minute. | ||
| const STUCK_PROCESSING_TIMEOUT_MS = isTesting() ? 1000 : 60_000; |
Background and Goal
CS-11647: when the AI assistant sent a tool call the host was going to auto-execute, the manual approval UI flashed for ~100ms before auto-execution kicked in. Two surfaces show this UI and both have the same race:
Run/ actionVerb button) rendered next to each tool-call message.The bug isn't
checkCorrectness-specific — it applies to every auto-execute branch:checkCorrectness,requiresApproval === false, or LLM mode'act'. All three flow through the same debounced decision incommand-service.drainCommandProcessingQueue.Where to start
packages/host/app/lib/command-auto-execute.ts(new) —isAutoExecutableCommand(command, activeLLMMode, isOwnedByCurrentAgent), a pure predicate that returnstrueonly when the host will actually auto-execute. Also re-homesCHECK_CORRECTNESS_COMMAND_NAME.packages/host/app/services/command-service.ts—drainCommandProcessingQueuecalls the helper instead of inlining the three-conditionif-chain.packages/host/app/components/matrix/room.gts—readyCommandsfilters out auto-executable commands so the Accept All bar can't paint in its manual-approval branch.packages/host/app/components/matrix/room-message-command.gts—applyButtonStatemasks the per-command button into theapplyingspinner immediately for auto-executable commands, so theRunbutton never flashes.Key decisions and non-obvious mechanics
message.agentId !== matrixService.agentId). The predicate has to know this — otherwise arequiresApproval=falsecommand from another agent would mask into the spinner forever (caught the first time by an acceptance test, fixed in the latest commit).validate()failures are safe. If command-service's drain decides not to run a command (validation failed), it dispatches aninvalidcommandResult. The per-command button transitions to itsinvalidstate via the result event — no risk of the spinner sticking.Working…(existingpreparingstate) → spinner immediately → result. NoRunbutton, no Accept All / Cancel bar.Tests
tests/unit/lib/command-auto-execute-test.ts) — covers all three auto-exec branches plus the agent-ownership short-circuit.tests/integration/components/ai-assistant-panel/commands-test.gts):checkCorrectness(always-auto-execute branch) → Accept All bar suppressed; asserted immediately and aftersettled().requiresApproval=falsevia the boxel-environment skill → Accept All bar suppressed.patchCardInstance(requires approval, control) → Accept All bar still renders.checkCorrectness→ per-command button showsapplying, notready.The LLM-mode-
'act'branch is covered by the unit test; doing it end-to-end would need extra mode-state plumbing for diminishing returns since all three branches converge on the same predicate.