ci: add reusable container scan workflow (Trivy)#14

Open
ulziibay-kernel wants to merge 7 commits into main from ci/container-scan

@ulziibay-kernel

Contributor

Summary

  • Adds container-scan.yml reusable workflow for Trivy-based container image scanning
  • Scans OS packages + application dependencies in built images (the layer Semgrep and Socket SCA don't cover)
  • Uploads SARIF to GitHub Security tab for unified findings view
  • Supports ECR images via OIDC role assumption

Context

Satisfies SOC 2 VPM-2 (quarterly vulnerability scans on external-facing systems) with every-deploy cadence. This is item 13 in the AWS evidence assessment.

Uses Trivy directly because Socket Basics' pre-built GitHub Action image temporarily ships without Trivy (as of v2.0.3). The workflow includes a comment documenting the swap path for when Socket re-enables it.

Consumer usage

container-scan:
  needs: build-and-push-image
  uses: kernel/security-workflows/.github/workflows/container-scan.yml@main
  with:
    image-ref: '613957054632.dkr.ecr.us-east-1.amazonaws.com/kernel/api:sha-${{ needs.build-and-push-image.outputs.short-sha }}'
    ecr-role-to-assume: ${{ vars.ECR_PUSHER }}
    exit-code: '0'  # observe-only initially
  secrets: inherit
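
On the reusable side, the call above implies a `workflow_call` interface with matching inputs. The following is a minimal sketch, not the contents of container-scan.yml in this PR. The input names `image-ref`, `ecr-role-to-assume`, and `exit-code` come from the consumer snippet; the `aws-region` input, the defaults, and the action versions are assumptions. The step names mirror those quoted in the monitoring plan.

```yaml
# .github/workflows/container-scan.yml (illustrative sketch only)
name: Container Scan

on:
  workflow_call:
    inputs:
      image-ref:
        description: Fully qualified image reference to scan
        type: string
        required: true
      ecr-role-to-assume:
        description: IAM role ARN assumed via OIDC (empty for public images)
        type: string
        required: false
        default: ''
      aws-region:          # assumption: not shown in the consumer snippet
        type: string
        required: false
        default: us-east-1
      exit-code:
        description: "'0' is observe-only, '1' fails the job on findings"
        type: string
        required: false
        default: '0'

permissions:
  id-token: write   # needed to request the OIDC token for role assumption
  contents: read

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials
        if: inputs.ecr-role-to-assume != ''
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ inputs.ecr-role-to-assume }}
          aws-region: ${{ inputs.aws-region }}

      - name: Log in to ECR
        if: inputs.ecr-role-to-assume != ''
        uses: aws-actions/amazon-ecr-login@v2

      - name: Scan container image
        uses: aquasecurity/trivy-action@master   # floating ref; see the risk noted in review
        with:
          image-ref: ${{ inputs.image-ref }}
          exit-code: ${{ inputs.exit-code }}
```

A called workflow can only narrow the token permissions granted by its caller, so the consuming job would likely also need `id-token: write` for the OIDC step to succeed.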

Test plan

  • Verify workflow syntax is valid (Actions tab)
  • Test from kernel/kernel api-build workflow pointing at this branch
  • Confirm SARIF uploads to Security tab
  • Confirm ECR login works with OIDC role

Made with Cursor

Adds a reusable workflow for scanning container images against known CVEs
using Trivy. Covers OS packages and application-level dependencies in the
built image — the layer that Semgrep (SAST) and Socket (SCA) don't reach.

Supports ECR images via OIDC role assumption. Results upload to GitHub
Security tab as SARIF for unified visibility alongside Semgrep findings.

Uses Trivy directly because Socket Basics' pre-built GitHub Action image
temporarily ships without Trivy (as of v2.0.3). Comment in the file
documents the swap path when Socket re-enables it.

Satisfies SOC 2 VPM-2 (quarterly vulnerability scans on external-facing
systems) with every-deploy cadence.

Co-authored-by: Cursor <cursoragent@cursor.com>
@firetiger-agent

Created a monitoring plan for this PR.

What this PR does: Adds automated container image vulnerability scanning (Trivy) to every deploy, with findings surfaced in the GitHub Security tab — satisfying the SOC 2 VPM-2 quarterly scan requirement at every-deploy cadence.

Intended effect:

  • Container Scan workflow registration: baseline 0 (new workflow); confirmed if gh workflow list --repo kernel/security-workflows shows "Container Scan" as active within 5 minutes of merge
  • First consumer invocation: baseline none (no prior runs); confirmed if the first container-scan / scan job in a consuming repo (e.g. kernel/kernel api-build) concludes success
  • SARIF upload to Security tab: baseline none; confirmed if findings appear under Security > Code Scanning in the consuming repo after the first run
  • Existing Vulnerability Remediation Self-Test: baseline 100% success rate (all 50 runs May 28–Jun 10); confirmed if no regression after merge

Risks:

  • ECR OIDC login failure: the container-scan / scan job concludes failure with the "Configure AWS credentials" or "Log in to ECR" step failing; alert if the first consumer invocation fails at either step
  • Floating @master Trivy action breaks: the "Scan container image" step fails on a run that previously passed; alert if any Container Scan job concludes failure after a prior success
  • Workflow not registered on merge: "Container Scan" is absent from gh workflow list within 5 minutes of merge; alert if the workflow is missing or disabled
  • Regression in existing workflows: Vulnerability Remediation Self-Test concludes failure within 48h of merge; alert on any failure (pre-merge success rate is 100%)
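
One common way to reduce the floating `@master` risk listed above is to pin the action to a full commit SHA. This is a sketch under assumptions: `<full-commit-sha>` is a placeholder rather than a real revision, and the PR does not say whether pinning was adopted.

```yaml
steps:
  - name: Scan container image
    # Placeholder: replace <full-commit-sha> with the 40-character SHA of a
    # reviewed aquasecurity/trivy-action release. It is not a real revision.
    uses: aquasecurity/trivy-action@<full-commit-sha>
    with:
      image-ref: ${{ inputs.image-ref }}
```

A pinned SHA means upstream changes arrive only through an explicit bump, which can itself be reviewed like any other dependency update.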

Status updates will be posted automatically on this PR as monitoring progresses.

Co-authored-by: Cursor <cursoragent@cursor.com>
GitHub Code Security license required for SARIF upload to the Security
tab on private repos. Replace with:
- Table output in workflow logs (human-readable)
- JSON artifact (machine-readable, audit evidence)
- Step summary with severity counts (visible on the PR)

Also removes actions:read and security-events:write permissions since
they're no longer needed without the codeql-action.

Co-authored-by: Cursor <cursoragent@cursor.com>
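
The three outputs named in this commit message (table in the logs, JSON artifact, step summary) could be produced roughly as follows. This is a sketch, not the committed steps: the artifact name and path match the review context in this PR, while the two-pass scan, the action versions, and the `jq` summary are assumptions. The `jq` filter relies on Trivy's JSON report layout, where each entry under `Results` may carry a `Vulnerabilities` list with a `Severity` field.

```yaml
steps:
  - name: Scan container image (table, shown in the job log)
    uses: aquasecurity/trivy-action@master
    with:
      image-ref: ${{ inputs.image-ref }}
      format: table
      exit-code: '0'

  - name: Scan container image (JSON, kept as audit evidence)
    uses: aquasecurity/trivy-action@master
    with:
      image-ref: ${{ inputs.image-ref }}
      format: json
      output: trivy-results.json
      exit-code: ${{ inputs.exit-code }}

  - name: Upload scan report
    if: always()   # keep the evidence even when the scan step fails the job
    uses: actions/upload-artifact@v4
    with:
      name: container-scan-report
      path: trivy-results.json

  - name: Summarize severities
    if: always()
    run: |
      {
        echo "### Container scan: findings by severity"
        echo
        # Count vulnerabilities per severity; tolerates results with no findings.
        jq -r '[.Results[]?.Vulnerabilities[]?.Severity]
               | group_by(.)
               | map("- \(.[0]): \(length)")
               | .[]' trivy-results.json
      } >> "$GITHUB_STEP_SUMMARY"
```

Neither output needs the `security-events: write` permission, which is consistent with the permissions being dropped in this commit.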
  with:
    name: container-scan-report
    path: trivy-results.json
  if: always()

Missing SARIF Security tab upload

Medium Severity

The reusable workflow only emits JSON (trivy-results.json) and a workflow artifact, but the PR promises SARIF upload to the GitHub Security tab. There is no format: sarif scan step or github/codeql-action/upload-sarif, and the job lacks security-events: write, so unified code scanning findings never appear.
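
For reference, the SARIF path this comment describes would look roughly like the sketch below. It is illustrative only: an earlier commit in this PR removed SARIF upload because the Security tab requires a GitHub Code Security license on private repos, so this applies only where that license is available. The file name `trivy-results.sarif` and the action versions are assumptions.

```yaml
permissions:
  contents: read
  actions: read             # used by upload-sarif on private repositories
  security-events: write    # required to publish code scanning results

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Scan container image (SARIF)
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ inputs.image-ref }}
          format: sarif
          output: trivy-results.sarif

      - name: Upload SARIF to the Security tab
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: trivy-results.sarif
```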

Reviewed by Cursor Bugbot for commit 8e51fc9.

Replace standalone Trivy + SARIF upload with Socket Basics action.
Results flow to Socket Dashboard — single pane of glass alongside
the existing SCA and SAST findings.

Socket Basics handles Trivy internally. If their pre-built action
doesn't have container scanning ready yet, the run will surface the
issue and we can iterate.

Co-authored-by: Cursor <cursoragent@cursor.com>
ulziibay-kernel and others added 2 commits June 12, 2026 12:58
Co-authored-by: Cursor <cursoragent@cursor.com>
Adds a triage job that runs after the scan when findings exist on PRs.
Uses Claude Code to evaluate each CVE for relevance (runtime applicability,
fix availability, ownership) and posts a single actionable PR comment with
prioritized remediation steps.

Modeled after the semgrep-triage-prompt.md pattern: filters noise (Windows
CVEs, vendor binaries, unfixed vulns), surfaces what matters, provides
exact fix commands.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ion (weekly)

container-scan.yml: scan-only, runs every build for audit evidence.
No triage, no PR comments. Just Trivy table + JSON artifact + summary.

container-remediation.yml (new): weekly scheduled workflow that scans the
production image, triages findings with Claude, and applies fixes via
Cursor agent. Creates/updates an evergreen security/container-remediation
PR — same pattern as vuln-remediation.yml for Socket SCA.

Prompts updated:
- triage-prompt.md: outputs structured triage-result.json (not PR comments)
- fix-prompt.md (new): applies safe Go dep bumps and base image patches

Co-authored-by: Cursor <cursoragent@cursor.com>
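
The scan, triage, fix chain described for container-remediation.yml could be laid out as in the sketch below. Everything beyond the job order and the file names `trivy-results.json` and `triage-result.json` is an assumption: the cron time, the `PRODUCTION_IMAGE_REF` variable, and the artifact name are hypothetical, and the Claude and Cursor invocations are stood in for by placeholder `echo` steps because the PR text does not show them.

```yaml
# .github/workflows/container-remediation.yml (illustrative sketch only)
name: Container Remediation

on:
  schedule:
    - cron: '0 6 * * 1'   # assumption: Mondays 06:00 UTC; the PR only says "weekly"
  workflow_dispatch: {}   # assumption: manual trigger for testing

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - name: Scan production image
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ vars.PRODUCTION_IMAGE_REF }}   # hypothetical variable name
          format: json
          output: trivy-results.json
      - name: Upload scan report
        uses: actions/upload-artifact@v4
        with:
          name: container-scan-report
          path: trivy-results.json

  triage:
    needs: scan
    runs-on: ubuntu-latest
    steps:
      - name: Download scan report
        uses: actions/download-artifact@v4
        with:
          name: container-scan-report
      - name: Triage findings
        # Placeholder for the step that runs triage-prompt.md
        run: echo "would write triage-result.json here"

  fix:
    needs: triage
    runs-on: ubuntu-latest
    steps:
      - name: Apply fixes
        # Placeholder for the step that runs fix-prompt.md and updates the
        # evergreen security/container-remediation PR
        run: echo "would apply dependency bumps and base image patches here"
```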

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 75b3296.

        if: always()

  fix:
    needs: triage

Fix job requires triage success

Medium Severity

The fix job lists needs: triage with no if that allows triage failure, so a failed or timed-out triage skips remediation entirely. That conflicts with fix-prompt.md, which expects fixes from trivy-results.json when triage output is absent, and leaves scan artifacts unused after triage errors.
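
One way to address this, sketched under assumptions, is to make the fix job depend on both jobs and gate it on the scan result rather than on triage. The `scan` job name and the `triage-result` artifact name are hypothetical; `container-scan-report` matches the artifact name shown elsewhere in this PR.

```yaml
jobs:
  fix:
    needs: [scan, triage]
    # Run whenever the scan produced results, even if triage failed or
    # timed out; still skip when the run was cancelled.
    if: ${{ !cancelled() && needs.scan.result == 'success' }}
    runs-on: ubuntu-latest
    steps:
      - name: Download scan report
        uses: actions/download-artifact@v4
        with:
          name: container-scan-report

      - name: Download triage result (may be absent)
        uses: actions/download-artifact@v4
        continue-on-error: true   # fall back to trivy-results.json alone
        with:
          name: triage-result     # hypothetical artifact name
```

Without an explicit `if`, a job is skipped when any job in its `needs` list fails, which is the behavior the comment flags.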

Reviewed by Cursor Bugbot for commit 75b3296.
