Skip to content

fix(e2e): stabilize local Docker smoke test#1935

Open
elezar wants to merge 2 commits into
mainfrom
fix-local-e2e
Open

fix(e2e): stabilize local Docker smoke test#1935
elezar wants to merge 2 commits into
mainfrom
fix-local-e2e

Conversation

@elezar

@elezar elezar commented Jun 16, 2026

Copy link
Copy Markdown
Member

Summary

Fix local Docker e2e by honoring the configured supervisor image and isolating host OpenSSH from Nix/devenv linker variables while preserving the proxy child environment.

Related Issue

N/A

Changes

  • Honor the configured Docker supervisor image when running local Docker e2e.
  • Scrub host linker environment variables before launching external ssh.
  • Restore the original linker environment for the openshell ssh-proxy child command.
  • Add focused CLI tests for the split SSH/proxy environment behavior.

Testing

  • mise run pre-commit passes
  • Unit tests added/updated
  • E2E tests added/updated (if applicable)

Additional validation run:

  • /home/elezar/.cargo/bin/cargo fmt --check
  • /home/elezar/.cargo/bin/cargo test -p openshell-cli linker_environment
  • /home/elezar/.cargo/bin/cargo clippy -p openshell-cli --all-targets -- -D warnings
  • mise run e2e:docker

mise run pre-commit was attempted locally but did not complete: helm lint reported missing chart dependency metadata for postgresql, and three openshell-supervisor-process tests failed because this shell could not resolve the current user/group. The clippy failure from this change was fixed and rechecked.

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@elezar elezar requested review from a team, derekwaynecarr and mrunalp as code owners June 16, 2026 19:15
@github-actions

Copy link
Copy Markdown

@elezar elezar added test:e2e Requires end-to-end coverage test:e2e-gpu Requires GPU end-to-end coverage labels Jun 16, 2026
@github-actions

Copy link
Copy Markdown

Label test:e2e applied for 5b97b26. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

@github-actions

Copy link
Copy Markdown

Label test:e2e-gpu applied for 5b97b26. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute GPU E2E after building the required supervisor image once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

@elezar

elezar commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

gator-agent

PR Review Status

Validation: this is maintainer-authored, project-valid Docker e2e and SSH environment stabilization work. It keeps local Docker e2e aligned with the configured supervisor image and prevents host linker variables from breaking the external OpenSSH process while preserving the proxy child environment.
Head SHA: 5b97b26a7af0618d2b86569e01a9a94fcccd6406

Review findings:

  • No blocking findings remain.
  • Nonblocking suggestion from the independent review: the internal DockerComputeConfig.supervisor_image comment is stale because configured supervisor_image now takes precedence over sibling/local builds when supervisor_bin is unset.

Docs: relevant Docker supervisor image behavior is reflected in the Docker driver README and docs/reference/gateway-config.mdx; no docs navigation change appears necessary.
E2E: test:e2e and test:e2e-gpu are already applied because this affects Docker e2e/supervisor behavior.
Checks: OpenShell / Branch Checks, OpenShell / Helm Lint, and DCO are passing. OpenShell / E2E and OpenShell / GPU E2E are pending for this head.

Next state: gator:watch-pipeline

@elezar elezar added the gator:watch-pipeline Gator is monitoring PR CI/CD status label Jun 16, 2026
@elezar

elezar commented Jun 16, 2026

Copy link
Copy Markdown
Member Author

gator-agent

Re-check After CI Update

I re-evaluated latest head 5b97b26a7af0618d2b86569e01a9a94fcccd6406 after the required E2E gates completed.

Disposition: not resolved.

Remaining items:

  • Still blocking: OpenShell / E2E failed. The e2e / E2E (rust-docker) and e2e / E2E (rust-podman-rootless) jobs both failed tests/forward_proxy_graphql_l7.rs::graphql_l7_enforces_allow_and_deny_rules_on_forward_and_connect_paths.
  • Docker observed forward_query_allowed = 403 where the test expected 200.
  • Rootless Podman observed forward_get_query_allowed = 403 where the test expected 200.
  • OpenShell / Branch Checks, OpenShell / Helm Lint, DCO, and OpenShell / GPU E2E are green for this head.

Next state: gator:in-review

@elezar elezar added gator:in-review Gator is reviewing or awaiting PR review feedback and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 16, 2026
@elezar elezar requested a review from maxamillion as a code owner June 17, 2026 07:05
@elezar elezar enabled auto-merge (squash) June 17, 2026 12:19
elezar added 2 commits June 17, 2026 14:21
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Comment on lines +2964 to +2969
// Tier 2: explicit supervisor_image in [openshell.drivers.docker].
// A configured image should be the source of truth even when a local
// developer build is present under target/.
if let Some(image) = docker_config.supervisor_image.clone() {
return extract_supervisor_bin_from_image(docker, &image).await;
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a comment here:

/// Optional override for the image the gateway pulls to extract the
/// Linux `openshell-sandbox` binary when no explicit binary path or
/// local build is available. Defaults to
/// `ghcr.io/nvidia/openshell/supervisor:<gateway-image-tag>`.
pub supervisor_image: Option<String>,
that should be updated after changing order in which the supervisor bin is resolved.

Could be something like:

/// Optional image used to extract the Linux `openshell-sandbox` binary.
/// Ignored when `supervisor_bin` is set. See `resolve_supervisor_bin` for
/// the full resolution order.
pub supervisor_image: Option<String>,

It could make sense to validate the config and only allow either supervisor_bin or supervisor_image, so both cannot be set. But that would be a breaking change.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it would be good to have a test for this new resolution order.

}

#[test]
#[allow(unsafe_code)] // Test-only: env vars require unsafe in Rust 2024.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use temp-env here (cli crate already has a dep on it) to remove the unsafe and resetting the env var.

I think it could also remove the need for the lock, based on their docs.

Avoid interference when running concurrently

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

gator:in-review Gator is reviewing or awaiting PR review feedback test:e2e Requires end-to-end coverage test:e2e-gpu Requires GPU end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants