Description
Make sandbox readiness semantics uniform across compute drivers. Drivers should report backend/runtime state, and the gateway should compose that with supervisor-session state to decide the public SandboxPhase.
Context
The current behavior is inconsistent across drivers:
- Docker uses an in-process SupervisorReadiness callback to avoid reporting Ready=True until the gateway has a registered supervisor session.
- VM deliberately lets the gateway promote a sandbox to Ready when the supervisor session connects.
- Podman can report Ready=True when the container is running, without checking gateway supervisor-session state.
- Kubernetes forwards Agent Sandbox CRD conditions, so the gateway trusts the controller-reported Ready condition.
The gateway also has generic supervisor-session promotion and demotion. However, a later driver snapshot with Ready=True can promote a sandbox back to public Ready even if no supervisor session is registered. This makes public SandboxPhase::Ready mean either backend-ready or supervisor-connected depending on the driver path.
Proposed Design
Define the driver contract around backend readiness only:
- Drivers report whether the backend resource exists, is starting, is backend-ready, is deleting, or has hit a terminal failure.
- The gateway owns public sandbox readiness.
- Public Ready requires both backend readiness and a registered supervisor session.
- Backend terminal failure still maps to public Error.
- Backend deleting still maps to public Deleting.
- Backend-ready without a supervisor session remains public Provisioning with a clear supervisor-not-connected condition.
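The composition rules above can be sketched as one pure function on the gateway side. This is a minimal illustration under stated assumptions, not the actual implementation: BackendState, PublicStatus, compose_phase, and the "SupervisorNotConnected" condition string are hypothetical names. Only the public phase names Ready, Error, Deleting, and Provisioning come from the proposal. The "backend resource does not exist" case is left out because the proposal does not say how it maps.

```rust
/// Hypothetical backend-only state a driver would report.
/// Drivers never look at supervisor-session state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BackendState {
    Starting,
    Ready,
    Deleting,
    Failed,
}

/// Public phases named in the proposal (the real enum may have more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxPhase {
    Provisioning,
    Ready,
    Deleting,
    Error,
}

/// Hypothetical result type: the public phase plus an optional condition reason.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PublicStatus {
    pub phase: SandboxPhase,
    pub condition: Option<&'static str>,
}

/// Compose driver-reported backend state with gateway-owned session state.
pub fn compose_phase(backend: BackendState, supervisor_connected: bool) -> PublicStatus {
    let (phase, condition) = match (backend, supervisor_connected) {
        // Terminal failure and deletion win regardless of the session.
        (BackendState::Failed, _) => (SandboxPhase::Error, None),
        (BackendState::Deleting, _) => (SandboxPhase::Deleting, None),
        // Public Ready needs both halves.
        (BackendState::Ready, true) => (SandboxPhase::Ready, None),
        // Backend-ready alone stays Provisioning, with an explicit reason.
        (BackendState::Ready, false) => {
            (SandboxPhase::Provisioning, Some("SupervisorNotConnected"))
        }
        (BackendState::Starting, _) => (SandboxPhase::Provisioning, None),
    };
    PublicStatus { phase, condition }
}
```

Because the function takes the session flag as an input, a later driver snapshot reporting backend-ready cannot re-promote a sandbox to Ready on its own, whichever driver produced it.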
open_relay should keep its existing wait as race protection for reconnects and short readiness gaps.
Once the gateway composition is uniform, remove Docker-specific access to the gateway supervisor registry instead of spreading that pattern to other drivers.
HA Consideration
Supervisor sessions are currently process-local. The implementation should explicitly decide how readiness composition behaves in multi-gateway deployments. At minimum, driver snapshots from a gateway that does not own the live supervisor session must not incorrectly demote or re-promote public readiness. A more complete solution may require a persisted or leased supervisor-presence record.
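One possible shape for a leased supervisor-presence record is sketched below. Everything here is an assumption for illustration: SupervisorLease, its fields, and supervisor_present are hypothetical, and std::time::Instant stands in for whatever shared clock or store timestamp a real multi-gateway deployment would use.

```rust
use std::time::{Duration, Instant};

/// Hypothetical presence record, renewed by the gateway that owns the live session.
/// A real record would live in shared storage rather than process memory.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SupervisorLease {
    pub owner_gateway: String,
    pub renewed_at: Instant,
    pub ttl: Duration,
}

impl SupervisorLease {
    /// The lease is live while less than `ttl` has elapsed since the last renewal.
    pub fn is_live(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.renewed_at) < self.ttl
    }
}

/// Any gateway can answer "is a supervisor connected?" from the lease,
/// without consulting its own process-local session registry.
pub fn supervisor_present(lease: Option<&SupervisorLease>, now: Instant) -> bool {
    lease.map_or(false, |l| l.is_live(now))
}
```

With a check like this, a gateway that does not own the session would read the lease when handling a driver snapshot, so it would neither demote a sandbox whose supervisor is connected elsewhere nor promote one whose lease has expired.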
Definition of Done
- SandboxPhase::Ready consistently means the sandbox is usable through the gateway.
- [...] SupervisorReadiness once readiness no longer depends on driver access to gateway-local session state.