
feat(ci): introduce merge queue for main #1946

@elezar


Problem Statement

main can break when a pull request passes CI on a stale branch head and GitHub then creates the final merge commit against a newer main. The resulting combination was never tested by the PR checks.

We saw this with PR #1870 and PR #1577. PR #1870 passed Branch Checks on stale head 29d57cc, whose merge-base did not include #1577. The final merge commit ff028ce0 combined the TLS reload shutdown handling from #1870 with the compute watcher shutdown handling from #1577. This introduced a duplicate shutdown_tx binding and caused the Rust lint job on main to fail: https://github.com/NVIDIA/OpenShell/actions/runs/27656754843/job/81792472668.

PR #1945 fixes that immediate break, but the integration gap remains.

Proposed Design

Enable GitHub merge queue for the protected main branch and require queued merge groups to pass the same gates required for normal PRs.

Implementation outline:

  • Enable the "Require merge queue" setting in the branch protection rule or ruleset for main.
  • Add the merge_group trigger to workflows that publish required PR gate inputs, especially:
    • .github/workflows/branch-checks.yml
    • .github/workflows/branch-e2e.yml
    • .github/workflows/helm-lint.yml
  • Confirm .github/workflows/required-ci-gates.yml can evaluate and publish required gate statuses for merge-group runs, or update it so the required contexts are reported for merge queue validation.
  • Keep the required contexts aligned with the existing PR gate contexts:
    • OpenShell / Branch Checks
    • OpenShell / E2E
    • OpenShell / GPU E2E
    • OpenShell / Helm Lint
  • Document the expected maintainer workflow for adding a PR to the merge queue instead of merging directly.
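As a rough sketch of the trigger change, a workflow that already runs on pull_request would list merge_group alongside it. Everything other than the on: block is a placeholder: the workflow name, job name, and steps are hypothetical and do not reflect the actual contents of branch-checks.yml. The diff-base step is also an assumption. It covers the case where a job reads github.event.pull_request.*, which is not populated for merge_group runs.

```yaml
# Hypothetical excerpt; job names and steps are placeholders, not the real workflow.
name: Branch Checks

on:
  pull_request:
  # Run the same checks when a PR is added to the merge queue.
  # "checks_requested" is the activity type GitHub sends for merge_group events.
  merge_group:
    types: [checks_requested]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      # Checks out the ref/SHA of the triggering event; for merge_group this is
      # the temporary queue branch GitHub creates for the queued group.
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      # Hypothetical step: pick the base commit to diff against, since
      # github.event.pull_request is empty in merge_group runs.
      - name: Resolve diff base
        run: |
          if [ "${{ github.event_name }}" = "merge_group" ]; then
            echo "BASE_SHA=${{ github.event.merge_group.base_sha }}" >> "$GITHUB_ENV"
          else
            echo "BASE_SHA=${{ github.event.pull_request.base.sha }}" >> "$GITHUB_ENV"
          fi

      - name: Existing branch checks (placeholder)
        run: echo "run the existing checks against $BASE_SHA..HEAD here"
```

The queue branch that the merge_group run checks out contains the PR applied on top of the latest main and any PRs queued ahead of it. As a result, the required contexts report on the integrated state rather than on the PR's possibly stale head.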

GitHub documentation notes that merge queues validate PR changes applied to the latest target branch and any earlier queued changes, and that GitHub Actions workflows used as required checks must include the merge_group event: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue

Alternatives Considered

Require PR branches to be up to date before merging.

That would likely have caught the #1870/#1577 interaction too, because #1870 would have had to rerun checks after updating to include #1577. However, it pushes more manual branch-update work onto contributors and maintainers. Merge queue is a better fit for a busy main branch because it validates the final integration state without forcing every PR author to repeatedly rebase or merge main by hand.

Rely only on push CI after merge.

This only detects breakage after main is already broken, which is what happened here.

Agent Investigation

Metadata


Assignees

No one assigned

Labels

area:build (Related to CI/CD and builds)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
