feat(supervisor): enforce supervisor session token expiry with proactive reconnect #1954

@pimlock

Problem Statement

ConnectSupervisor is authenticated when the bidirectional stream is created. The supervisor process already refreshes its gateway sandbox JWT in the background, and new outbound RPCs pick up the refreshed token through the shared AuthInterceptor token slot. An already-open ConnectSupervisor stream does not get re-authenticated when that token slot changes.

For the initial session-control migration in #1731, this is acceptable: the accepted stream remains valid until disconnect, supersede, sandbox deletion, or gateway restart. This follow-up tracks making session lifetime explicitly align with token lifetime.

Proposed Design

Add expiry-enforced supervisor sessions with proactive reconnect:

  • Record the accepted token/session expiry when the gateway accepts ConnectSupervisor.
  • Extend SessionAccepted with both:
    • session_expires_at_unix_ms
    • reconnect_before_unix_ms
  • Have the supervisor open a fresh ConnectSupervisor before reconnect_before_unix_ms, using the already-refreshed process-wide token.
  • Let the gateway supersede the old session through the existing SupervisorSessionRegistry reconnect behavior.
  • Keep bearer tokens out of supervisor session payloads.
  • Add a short gateway grace period so clock skew or scheduler delay does not cause avoidable sandbox disconnects.
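The deadline arithmetic in the list above can be sketched as follows. This is a minimal illustration, not the actual implementation: `SessionDeadlines`, `compute_deadlines`, `delay_until_reconnect`, `is_expired`, and the `reconnect_lead` and `grace` parameters are hypothetical names. Only the two `*_unix_ms` field names come from the proposal.

```rust
use std::time::Duration;

/// Hypothetical mirror of the two fields proposed for `SessionAccepted`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SessionDeadlines {
    pub session_expires_at_unix_ms: u64,
    pub reconnect_before_unix_ms: u64,
}

/// Convert a duration to whole milliseconds, saturating at `u64::MAX`.
fn to_ms(d: Duration) -> u64 {
    u64::try_from(d.as_millis()).unwrap_or(u64::MAX)
}

/// Gateway side: derive both deadlines from the accepted token's expiry.
/// `reconnect_lead` (assumed knob) is how far ahead of expiry the
/// supervisor should reconnect.
pub fn compute_deadlines(
    now_unix_ms: u64,
    token_expires_at_unix_ms: u64,
    reconnect_lead: Duration,
) -> SessionDeadlines {
    // Clamp so the reconnect deadline is neither in the past nor after expiry.
    let reconnect_before = token_expires_at_unix_ms
        .saturating_sub(to_ms(reconnect_lead))
        .max(now_unix_ms)
        .min(token_expires_at_unix_ms);
    SessionDeadlines {
        session_expires_at_unix_ms: token_expires_at_unix_ms,
        reconnect_before_unix_ms: reconnect_before,
    }
}

/// Supervisor side: how long to wait before opening a fresh stream.
pub fn delay_until_reconnect(now_unix_ms: u64, d: &SessionDeadlines) -> Duration {
    Duration::from_millis(d.reconnect_before_unix_ms.saturating_sub(now_unix_ms))
}

/// Gateway side: a session counts as expired only after expiry plus grace,
/// which absorbs clock skew and scheduler delay.
pub fn is_expired(now_unix_ms: u64, d: &SessionDeadlines, grace: Duration) -> bool {
    now_unix_ms > d.session_expires_at_unix_ms.saturating_add(to_ms(grace))
}
```

With a token expiring 60 seconds after acceptance and a 10 second lead, the supervisor would reconnect 50 seconds in. The clamping means a token with less remaining lifetime than the lead yields an immediate reconnect rather than a deadline in the past.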

Alternatives Considered

Agent Investigation

Relevant current behavior:

  • crates/openshell-core/src/grpc_client.rs stores the bearer token in a process-wide slot.
  • refresh_token_loop renews the gateway sandbox JWT around 80 percent of remaining lifetime.
  • AuthInterceptor injects the current token into new outbound gRPC requests.
  • crates/openshell-server/src/supervisor_session.rs validates the ConnectSupervisor request once at stream creation and does not re-check token expiry in the session loop.
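To make the gap concrete, here is a simplified model of the behavior listed above. It is an assumption-laden sketch, not the code in `grpc_client.rs`: `TokenSlot`, `authorization_header`, and `refresh_delay` are hypothetical stand-ins for the process-wide slot, the `AuthInterceptor`, and the timing used by `refresh_token_loop`.

```rust
use std::sync::{Arc, RwLock};
use std::time::Duration;

/// Hypothetical stand-in for the process-wide bearer token slot.
#[derive(Clone, Default)]
pub struct TokenSlot(Arc<RwLock<Option<String>>>);

impl TokenSlot {
    /// Replace the stored token, as a background refresh would.
    pub fn set(&self, token: impl Into<String>) {
        // A poisoned lock means another thread panicked mid-write; the
        // sketch simply propagates that panic.
        *self.0.write().unwrap() = Some(token.into());
    }

    /// Read a copy of whatever token is stored right now.
    pub fn current(&self) -> Option<String> {
        self.0.read().unwrap().clone()
    }
}

/// Stand-in for the interceptor: the header value is built from the slot
/// at the moment a request (or stream) is created.
pub fn authorization_header(slot: &TokenSlot) -> Option<String> {
    slot.current().map(|t| format!("Bearer {t}"))
}

/// Refresh at 80 percent of the remaining lifetime, using integer math.
pub fn refresh_delay(remaining: Duration) -> Duration {
    remaining * 4 / 5
}
```

A header captured when a stream is opened keeps the token it was built with, even after `set` stores a newer one. Only requests created afterwards see the refreshed value, which is the behavior this issue proposes to address with proactive reconnect.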

Definition of Done

  • SessionAccepted carries explicit expiry and reconnect deadline fields.
  • Gateway computes and records session expiry for accepted supervisor sessions.
  • Supervisor reconnects before the deadline using the refreshed token slot.
  • Gateway supersedes the old session without interrupting normal relay/control behavior.
  • Expired sessions are closed or rejected after a documented grace period.
  • Tests cover reconnect-before-expiry, expired-session handling, and supersede behavior.
  • Architecture docs describe the session auth lifetime.
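The supersede and expired-session items above could be exercised against a toy model like the one below. `ToyRegistry`, `Session`, and their methods are hypothetical and deliberately much simpler than the real `SupervisorSessionRegistry`, whose API is not shown in this issue. The generation counter is an assumed way to tell the old session from its replacement.

```rust
use std::collections::HashMap;

/// Hypothetical session record: a generation number plus its expiry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Session {
    pub generation: u64,
    pub expires_at_unix_ms: u64,
}

/// Toy registry keyed by sandbox id; at most one live session per sandbox.
#[derive(Default)]
pub struct ToyRegistry {
    next_generation: u64,
    sessions: HashMap<String, Session>,
}

impl ToyRegistry {
    /// Accept a new session and return it with the superseded one, if any.
    pub fn accept(
        &mut self,
        sandbox_id: &str,
        expires_at_unix_ms: u64,
    ) -> (Session, Option<Session>) {
        self.next_generation += 1;
        let new = Session {
            generation: self.next_generation,
            expires_at_unix_ms,
        };
        let old = self.sessions.insert(sandbox_id.to_string(), new.clone());
        (new, old)
    }

    /// True only for the most recently accepted session of a sandbox.
    pub fn is_current(&self, sandbox_id: &str, generation: u64) -> bool {
        self.sessions
            .get(sandbox_id)
            .map_or(false, |s| s.generation == generation)
    }

    /// Remove sessions past expiry plus grace and return their sandbox ids.
    pub fn sweep_expired(&mut self, now_unix_ms: u64, grace_ms: u64) -> Vec<String> {
        let mut expired: Vec<String> = self
            .sessions
            .iter()
            .filter(|(_, s)| now_unix_ms > s.expires_at_unix_ms.saturating_add(grace_ms))
            .map(|(id, _)| id.clone())
            .collect();
        expired.sort(); // deterministic order for assertions
        for id in &expired {
            self.sessions.remove(id);
        }
        expired
    }
}
```

A reconnect-before-expiry test would call `accept` twice for the same sandbox and check that only the second generation is current. An expired-session test would advance the clock past expiry plus grace and check that `sweep_expired` removes the entry.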

Related

Metadata

Labels

  • area:gateway (Gateway server and control-plane work)
  • area:supervisor (Proxy and routing-path work)

Projects

Status
Todo
