scrutineer

A local tool for scanning open source repositories for security vulnerabilities and managing the disclosure process. You add a repo by URL, scrutineer runs a pipeline of claude-code skills against it, and presents the results in a web UI where you can triage findings, identify maintainers, and track disclosures.

Quick start

You need Go 1.26+ and Docker running.

git clone https://github.com/alpha-omega-security/scrutineer
cd scrutineer

Authenticate Claude with one of two options:

Option A: Claude Code subscription (Max, Pro, Team, or Enterprise) -- generate a long-lived OAuth token with the Claude CLI:

claude setup-token
export CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-...

Option B: Anthropic API key from console.anthropic.com:

export ANTHROPIC_API_KEY=sk-ant-api03-...

Then start scrutineer:

export ANTHROPIC_BASE_URL=https://...  # optional: custom API endpoint
go run ./cmd/scrutineer -skills ./skills

Then open http://127.0.0.1:8080.

Scrutineer detects Docker and starts using it automatically: each scan runs in an ephemeral container with a read-only source mount and an egress allowlist proxy. The runner image (ghcr.io/alpha-omega-security/scrutineer-runner) is pulled on first use, so the first scan is slower while it downloads. If Docker isn't available scans run directly on the host with no isolation; see the Security section before doing that.

Click Add repository in the sidebar, paste a git HTTPS URL, and scrutineer enqueues the triage skill. To scan a maintained branch instead of the default, fill the Branch field (it suggests the remote's branches as you type and also accepts a tag or commit), or append a /tree/<branch> suffix to the URL; the suffix also works one-per-line when bulk-importing. Triage then enqueues the rest of the pipeline in parallel. Metadata and package lookups finish in seconds; the security deep-dive takes a few minutes depending on repo size. Open the repo page and switch to the Scans tab to watch progress, or wait for the Findings tab to fill in.

You can also scan a directory on disk, useful before pushing, or for code not hosted on a git forge. Paste an absolute path (/path/to/project) in the same Add repository field. Scrutineer copies the directory into a per-scan workspace and runs the default skill set; skills that need a forge URL or ecosyste.ms enrichment (advisories, dependents, exposure, fork, maintainers, metadata, packages, report-upstream) are skipped automatically. Symlinks are recreated as-is rather than dereferenced during the copy; in Docker mode their targets then resolve inside the container, so host files reached only through such a link are not visible to skills. Without Docker the kernel dereferences them normally, so only point scrutineer at trees you trust.

The optional analysis tools (semgrep, zizmor, git-pkgs, brief) are bundled in the runner image, so you don't need them installed locally when Docker is in use.

Git authentication

Scrutineer shells out to git clone with no explicit token passing, so it uses whatever credentials are already configured on the host: SSH keys, credential helpers, gh auth login, a .netrc file, or the macOS keychain.

To scan private repos, make sure git clone https://github.com/org/repo works in your terminal before adding the URL to scrutineer. If it does, scrutineer can clone it too.

Common setups:

# GitHub CLI (easiest)
gh auth login

# Git credential helper
git config --global credential.helper store   # or osxkeychain / manager-core

# SSH-based clone URLs are not supported -- scrutineer only accepts https:// URLs.
# Use a credential helper to authenticate HTTPS clones instead.

When running inside Docker (docker run ...), the container has no access to host credentials. Mount a credential store or set GIT_ASKPASS to provide access to private repos from inside the container.

When the containerised runner is active (the default when Docker is available), each scan runs in a separate container but the clone happens on the host before the source is mounted in. Host credentials are used for the clone; the container never sees them.

Features

Scanning and analysis

Skill-based scan pipeline -- every scan is a claude-code skill on disk (SKILL.md + schema + optional scripts). The default pipeline for a new repo is itself a skill (triage) that enqueues the others; edit its SKILL.md to change what runs
Structured findings -- vulnerability reports parsed into a database with severity, CWE, location (linked to source), affected versions, and a six-step analysis trace
Threat model view -- the project's security contract (components, entry-point trust table, properties provided and disclaimed, known non-findings) rendered from the threat-model scan, falling back to the deep-dive's boundaries and sink inventory on older repositories
Dependency exploration -- dependency and dependent tables with one-click import to scan any package's source repository
Package registry data -- downloads, dependents, versions, and registry links for every published package
Known advisories -- existing CVEs and security advisories pulled automatically
Maintainer identification -- model-backed skill combining commit history, issue/PR activity, and registry ownership to identify who to contact for disclosure
CWE catalogue -- embedded MITRE CWE data with tooltips on finding tables and full descriptions on finding pages
Reachability analysis -- trace sinks found in dependencies through application code to see which are actually reachable
Rescan dedup -- findings carry a content fingerprint so re-running a scan updates existing rows instead of creating duplicates; same-fingerprint hits within one scan collapse to a single finding with a +N expandable location list, and findings that stop appearing are marked "not seen" with a miss count

Triage and disclosure workflow

Finding workflow -- guided triage flow from new through verification, disclosure, and publication with human gates at each step
Cheap-classifier pre-sort -- the revalidate skill auto-enqueues for High/Critical deep-dive findings and every imported finding, emitting true_positive / false_positive / already_fixed / uncertain plus an optional severity adjustment; a true_positive on a High/Critical finding chains into verify automatically
Audit queue -- random sample of recent low and false-positive verdicts at /audit so the operator can spot-check the classifier; each review records an agreement-or-overturn verdict on the finding
Exploited-in-the-wild flag -- analyst-only yes/no field on findings with free-text evidence, surfaced on the finding page, in the OSV database_specific block, in CSAF audit notes, and in markdown report exports
Breaking-change classifier -- the breaking-change skill runs over a suggested-fix diff plus the top dependents, recording breaking / non_breaking / unknown with a rationale and the list of affected dependents
Mitigation guidance -- the mitigate skill drafts short-term workarounds and an optional semgrep rule per finding, separate from the code fix
CVSS v3.1 and v4.0 -- both vectors stored side by side with derived scores; analyst form, OSV/CSAF exports, and the disclose skill all carry both forward, with the v4 metric set kept distinct from v3
Release watch -- the release-watch skill closes the gap between fix-landed and fix-shipped: once a finding reaches fixed, the skill polls upstream releases and records the release tag, URL, and timestamp when it appears
CNA matching -- identify the CVE Numbering Authority whose scope covers a repo so disclosures go to the right contact
Upstream reporting -- file a finding on the upstream repository through GitHub's private vulnerability reporting with the proposed patch attached, and push the fix to the temporary private fork when GitHub grants access. A PVR report is hard to unsend; before pointing this at an external repository, run it once end-to-end against a repository you control with PVR enabled to confirm the body shape and patch attachment land the way you expect. When upstream has no PVR available, follow the runbook in docs/disclosure-fallback.md

Imports and exports

SBOM import -- upload a CycloneDX or SPDX document, resolve each component to a source repository, and queue scans automatically
Finding import -- POST SARIF, CSV, markdown, or minimal-JSON findings from external scanners and pentest reports into the same workflow as native scans, with fingerprint dedup against re-imports
Free-form ingest -- when the format sniffer in /api/v1/import cannot place a payload, the ingest skill normalises it against the source checkout (resolving locations) before it enters the findings table
CSAF export -- download any finding as a schema-validated CSAF 2.0 advisory document
OSV export -- download any finding as a schema-validated OSV record, aligned with the OSS-SIRT advisory template (credits, CWE IDs, withdrawn, SEMVER ranges, CVSS v3 + v4 severity entries)
JSONL export -- stream all findings or scans as line-delimited JSON for ingestion elsewhere
Markdown report export -- download a single consolidated report.md per repository or organisation
Disclosure bundle -- download bundle.tar.gz per finding: OSV, CSAF, markdown report, patch.diff, and a manifest naming the contents; ready to hand to a coordinator or attach to a private email when filing outside GitHub PVR

Operational

Containerised runner -- optional per-scan Docker isolation with read-only source mounts, dropped capabilities, and an authenticated egress allowlist proxy
Skill HTTP API -- running skills can call back into scrutineer to list prior scans and enqueue further skills; surface documented in openapi.yaml
Live updates -- SSE streaming of scan logs and status changes, no polling
Organisation rollup -- repos, findings, and maintainers grouped by owning org, with per-org markdown exports
Usage tracking -- per-scan token and cost figures plus a /usage page totalling spend per skill
Themes -- six colour themes plus a light/dark/system toggle, set on the Settings page

The default pipeline

When a repo is added, the triage skill is enqueued. Its SKILL.md lists the skills to trigger. The bundled skills live in skills/:

Skill	What it does
`triage`	Orchestrates the default scan set via the scrutineer API
`metadata`	Fetches repo metadata from repos.ecosyste.ms
`packages`	Looks up published packages from packages.ecosyste.ms
`advisories`	Fetches known security advisories
`dependents`	Top runtime dependents per package
`dependencies`	Runs `git-pkgs list` to index every manifest
`sbom`	Runs `git-pkgs sbom` for a CycloneDX SBOM
`maintainers`	Model-backed analysis identifying real maintainers and contact routes
`repo-overview`	Runs `brief --json` for a structured project summary
`subprojects`	Enumerates monorepo packages/workspaces so deep-dives can be scoped to a sub-path
`threat-model`	Derives the project's security contract (components, entry-point trust table, claimed and disclaimed properties) for the deep-dive to load
`semgrep`	Static analysis mapped into findings shape
`zizmor`	GitHub Actions workflow audit mapped into findings shape
`ingest`	Normalizes external reports in arbitrary formats into findings when `/v1/import` cannot recognise the payload
`security-deep-dive`	The model-backed audit producing structured findings
`finding-dedup`	Compares open findings and marks overlapping reports as duplicates
`verify`	Re-checks one finding against current HEAD; records reproduces / fixed / can't-reproduce
`revalidate`	Cheap read-only classifier (prose + `git log`, no PoC execution) that emits true / false positive / already-fixed / uncertain; auto-enqueued for High/Critical from `security-deep-dive` and for every imported finding. A `true_positive` on a High/Critical finding chains automatically to `verify`
`breaking-change`	Static breaking-change check on the suggested-fix diff; records `breaking`/`non_breaking`/`unknown` with rationale and the affected dependents
`release-watch`	After a finding reaches `fixed`, watches the upstream for a release containing the fix commit; records release tag, URL, and timestamp on the finding
`disclose`	Drafts a GHSA-shaped advisory (title, description, CVSS, CWEs, references) for one finding
`patch`	Proposes a unified diff fixing one finding; a diff that passes the applicability gate is stored on the finding as its suggested fix
`report-upstream`	Files one finding on the upstream repository via GitHub PVR with the proposed patch attached; the action that moves a finding to `reported`
`reachability`	Traces dependency sinks through application code to determine which are reachable from trust boundaries
`cna-match`	Matches a repository to its CVE Numbering Authority so disclosures route to the right contact
`posture`	Records the repo's security posture (reporting policy, response history, hardening) on the Repository row

Edit skills/triage/SKILL.md to change what gets run by default. Drop new skill directories in skills/ to add scan types; no code changes needed. See docs/skills.md for the frontmatter reference, the scrutineer.* metadata keys, the context.json shape, output kinds, schema validation, and the skill-facing HTTP API.

Before each scan, lockfiles, minified bundles, and generated trees are stripped from the workspace so the skill doesn't waste turns on them. The builtin skip list covers node_modules, dist, generated, __generated__, *.min.js/*.min.css, and the common lockfiles (pnpm-lock.yaml, package-lock.json, yarn.lock, Cargo.lock, go.sum, Gemfile.lock, poetry.lock, composer.lock). Skills can override this with scrutineer.paths (allow-list) and layer scrutineer.ignore_paths on top; see docs/skills.md.

Importing findings from other tools

Scrutineer can ingest findings produced elsewhere so they enter the same triage and disclosure workflow:

curl --data-binary @report.sarif http://127.0.0.1:8080/api/v1/import

SARIF 2.1.0, CSV, markdown, and a minimal JSON shape are all accepted; the format is sniffed from the body. See docs/import.md for the full request and response shape, the per-format field mapping, and how to add support for a new format.

Navigating the UI

Every index page has a search box plus filter and sort dropdowns; the specifics vary by page. The sidebar sections:

Repositories -- your scanned repos with language, last-scan status, and finding counts. Click into one for tabs covering Summary, Findings, Threat Model, Packages, Dependencies, Dependents, Advisories, Maintainers, Data, and Scans, plus an "Export report" button for a markdown rollup.
Organizations -- repos, findings, and maintainers grouped by owning org, with per-org markdown exports.
Findings -- every vulnerability across all repos. A finding page shows the six-step analysis (trace, boundary, validation, prior art, reach, rating), scoring fields, notes, communications log, references, labels, and a change history.
Packages -- registry entries discovered across all repos.
Advisories -- known CVEs and security advisories pulled for any scanned package.
Maintainers -- people identified as maintainers, with their linked repos and findings.
SBOMs -- uploaded CycloneDX/SPDX documents. Each component is resolved to a source repository and can be imported for scanning.
Audit -- random sample of recent low and false-positive verdicts for spot-checking the cheap-classifier output. Each row records the analyst's agreement-or-overturn verdict, and a small dashboard shows the running overturn rate.
Scans -- every scan that has run. Queued scans can be paused/resumed, running or queued scans can be cancelled and failed ones retried.
Skills -- installed skills from disk and from the UI; view, edit, or run any of them.
Usage -- token and cost totals across all scans, broken down by skill.
Settings -- theme, colour scheme, default model, runner concurrency (restarts the runner to apply, cancelling in-flight scans) and default turn cap (applied to the next scan), plus system stats (record counts, DB size, paths).

Finding workflow

Each finding from the security-deep-dive skill starts at new and moves through a guided workflow:

new -- just identified. High/Critical from security-deep-dive and every imported finding auto-enqueue a revalidate pass first, which records true_positive / false_positive / already_fixed / uncertain on the finding and (when true_positive on High/Critical) chains into verify. Outside that path: click "Verify" to trigger independent confirmation, "Skip to triage" if you trust the audit, or "Reject"
enriched -- verification ran. Review and click "Triage"
triaged -- confirmed real. Click "Prepare disclosure"
ready -- draft prepared. Run the report-upstream skill to file it via GitHub PVR (github.com only, requires gh auth), or click "Mark as reported" after sending it yourself. When upstream has no PVR, follow the runbook in docs/disclosure-fallback.md: route to a CNA when cna-match names one, otherwise contact the channel maintainers returned
reported -- sent to maintainer. Click "Acknowledged" when they respond
acknowledged -- maintainer working on fix. Click "Mark fixed" when it ships
fixed -- patch available. Click "Mark published" to issue the advisory
published -- done

Each finding page has a notes section for recording triage reasoning and communication history.

A patch run whose diff survives the applicability gate (the diff parses, targets files that exist, touches the flagged file, and passes git apply --check) is stored on the finding as suggested_fix with its base commit, downloadable from the finding page as a .patch file and included in markdown report exports. To revise a fix, push your edits to a branch, scan that branch (the Branch field, or a /tree/<branch> URL suffix), and run patch against the new scan: the diff is proposed against that ref's tree, so each round of edit, push, and rescan gets a fresh proposal on top of your work.

Exploring dependencies

The Dependencies tab on a repo groups packages by name and shows all manifest files where each appears. It shows runtime dependencies by default, with a toggle for test/build/dev rows. The import button (arrow icon) next to a dependency resolves it to a repository URL via packages.ecosyste.ms and queues the full pipeline for it. Dependencies you've already imported show a link icon instead.

The same applies to the Dependents tab -- you can import any dependent's repository with one click.

Docker

docker build -t scrutineer .
docker run -p 127.0.0.1:8080:8080 -v scrutineer-data:/data \
  -e ANTHROPIC_API_KEY=sk-ant-api03-... \
  -e ANTHROPIC_BASE_URL=https://... \
  scrutineer

Or with a Claude Code OAuth token instead of an API key:

docker run -p 127.0.0.1:8080:8080 -v scrutineer-data:/data \
  -e CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-... \
  scrutineer

Always bind to 127.0.0.1. The UI has no authentication; binding to 0.0.0.0 exposes your findings database to anyone on the network.

If docker is available on the host, scrutineer runs each scan in an ephemeral container for isolation. The runner image is published to GHCR as a multi-arch manifest (linux/amd64 and linux/arm64) and pulled automatically on first use:

go run ./cmd/scrutineer -skills ./skills

Use --no-docker to disable containerised execution, or --runner-image to specify a different image. To build the runner locally instead of pulling from GHCR:

docker build -t scrutineer-runner -f Dockerfile.runner .
go run ./cmd/scrutineer -skills ./skills --runner-image scrutineer-runner

When the docker runner is active, scrutineer starts an authenticated egress proxy on the host and points HTTPS_PROXY/HTTP_PROXY inside the container at it. The proxy only tunnels to an allowlist of hosts: the Anthropic API, *.ecosyste.ms, the major forges (GitHub, GitLab, Codeberg, Bitbucket), common package registries (npm, PyPI, RubyGems, crates.io, Go module proxy, Packagist, Hex, NuGet), advisory sources (semgrep.dev, OSV, NVD, cwe.mitre.org), and host.docker.internal for the local skill API. Requests to anything else get a 403 and are logged. Extend the list with egress_allow in the config file. When -anthropic-base-url is set (or falls back to the ANTHROPIC_BASE_URL env var), its hostname is automatically added to the allowlist. The proxy uses a per-process random token so it isn't an open relay; tools that ignore the proxy env are not blocked at the network layer (see threatmodel.md).

For deployments that treat skill prompts as untrusted, pass --hardened (or hardened: true in the config). The flag forces the docker runner (--no-docker is rejected), trims the egress allowlist to *.anthropic.com plus the host skill API (so egress_allow is ignored, drop the flag if you need to widen it), mounts the container rootfs read-only with no-new-privileges, attaches each scan to its own ephemeral docker network created with --internal (removed when the scan ends) so a process that ignores HTTPS_PROXY has no route out and concurrent scans cannot reach each other, and refuses scans whose workspace footprint exceeds 2 GiB once the clone completes. The 2 GiB check is post-clone: it bounds what hardened mode will agree to scan, not what can land on disk during the clone itself; use OS-level disk quotas if you need a clone-time guarantee. Bundled skills that hit ecosyste.ms or a package registry directly will fail under hardened mode unless they route through the host skill API. Per-ecosystem runner profiles still apply, but profile images that need writable paths beyond /work and /tmp are incompatible.

Flags

Flag	Default	Description
`-config`	`./scrutineer.yaml` if present	Path to YAML config file
`-addr`	`127.0.0.1:8080`	Listen address
`-data`	`./data`	Data directory for the database and workspaces
`-effort`	`high`	Claude effort level
`-skills`	-	Local directory to load SKILL.md files from (repeatable)
`-skills-repo`	-	`owner/repo[@ref]` or git HTTPS URL `https://host/path[@ref]` to clone skills from on startup; `@ref` pins a branch, tag or commit and the resolved SHA is recorded on every scan
`--no-docker`	false	Disable containerised runner
`--hardened`	false	Strict sandbox: docker required, egress restricted to `*.anthropic.com` + host skill API, read-only rootfs, internal docker network
`--runner-image`	`ghcr.io/alpha-omega-security/scrutineer-runner:latest`	Docker image for per-scan containers
`-concurrency`	`4`	Number of scans to run in parallel
`-clone`	`shallow`	Clone depth: `shallow` (`--depth 1`) or `full`
`-scan-timeout`	`1h`	Wall-clock limit per scan; exceeded scans fail
`-max-turns`	`0`	Passed as `--max-turns` to claude-code (0 = unlimited)
`-schema-strict`	`false`	Fail a scan when its `report.json` does not validate against the skill's `schema.json` (default: warn in the scan log and parse anyway)
`-anthropic-base-url`	-	Custom Anthropic API base URL (env: `ANTHROPIC_BASE_URL`)

Config file

Every flag above can be set in a YAML config file instead. The loader checks ./scrutineer.yaml by default; override with -config path/to/file. Command-line flags always win. See scrutineer.sample.yaml for the full shape.

The config file can also replace the model pick list and pin the default model:

default_model: claude-sonnet-4-6
models:
  - name: Sonnet
    id:   claude-sonnet-4-6
  - name: Opus
    id:   claude-opus-4-7

Sandboxed Claude Code configs

In --no-docker mode the claude subprocess inherits your ~/.claude/settings.json, so sandbox settings that restrict network or filesystem access there will fail skills that need them. Point claude at a separate config directory just for scrutineer runs:

CLAUDE_CONFIG_DIR=~/.claude-scrutineer go run ./cmd/scrutineer -skills ./skills

Copy your settings.json into that directory and drop the sandbox keys; your normal Claude Code config is untouched. Docker mode is not affected: claude runs inside the container with its own environment regardless of the host config.

Security

See SECURITY.md for the reporting policy and threatmodel.md for the full threat model. The short version: scanning a repository is equivalent to running code from it. The containerised runner (when available) isolates each scan, but the default bare-metal mode runs everything as your user. Only scan repositories you'd be willing to clone and build locally.

Further documentation

docs/skills.md -- bundled skills, writing your own, frontmatter and output-kind reference
docs/import.md -- importing findings from other tools (SARIF, CSV, markdown, minimal JSON) and adding new formats
openapi.yaml -- the skill-facing HTTP API
docs/database.md -- full database schema reference
docs/backup.md -- backing up and restoring the database (built-in scrutineer backup/restore, sqlite3, Litestream)
docs/development.md -- project layout, regenerating embedded data, running tests

Name		Name	Last commit message	Last commit date
Latest commit History 347 Commits
.github		.github
cmd/scrutineer		cmd/scrutineer
docker/profiles		docker/profiles
docs		docs
internal		internal
scripts		scripts
skills		skills
.dockerignore		.dockerignore
.editorconfig		.editorconfig
.gitattributes		.gitattributes
.gitignore		.gitignore
.golangci.yml		.golangci.yml
Dockerfile		Dockerfile
Dockerfile.runner		Dockerfile.runner
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
go.mod		go.mod
go.sum		go.sum
openapi.yaml		openapi.yaml
scrutineer.sample.yaml		scrutineer.sample.yaml
threatmodel.md		threatmodel.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scrutineer

Quick start

Git authentication

Features

Scanning and analysis

Triage and disclosure workflow

Imports and exports

Operational

The default pipeline

Importing findings from other tools

Navigating the UI

Finding workflow

Exploring dependencies

Docker

Flags

Config file

Sandboxed Claude Code configs

Security

Further documentation

License

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

scrutineer

Quick start

Git authentication

Features

Scanning and analysis

Triage and disclosure workflow

Imports and exports

Operational

The default pipeline

Importing findings from other tools

Navigating the UI

Finding workflow

Exploring dependencies

Docker

Flags

Config file

Sandboxed Claude Code configs

Security

Further documentation

License

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages