Bug Description
Runs complete with exit code 0 and no error signal even when the output is objectively broken — hallucinated file paths, empty diffs, truncated responses, or format that doesn't match what was requested.
Reproduction
- Trigger a claude-code-action run on a moderately complex task
- Run completes successfully (green check)
- Inspect the output: contains references to files that don't exist, or entire sections are empty/placeholder text
Expected Behavior
A run that produces hallucinated content, empty output, or malformed structure should not report as successful. At minimum, the action should expose output signals (e.g. \steps.claude.outputs.validation_passed) so downstream steps can gate on quality — not just exit code.
Actual Behavior
Exit code 0, green checkmark, no indication anything is wrong. Users discover the problem only on manual review.
Impact
- In CI/CD pipelines, a silently broken run wastes an entire cycle
- Automated triggers (cron, issue assignment) produce garbage with no alert
- Teams lose trust in the action because 'success' doesn't mean 'correct'
Environment
- Running 16+ automated agent jobs daily
- Multiple repos, cron-triggered and event-triggered
- Problem frequency: ~15-20% of runs produce output that should not have passed
Suggested Fix
A lightweight validation check before marking a run as successful. Could be opt-in:
\\yaml
- uses: anthropics/claude-code-action@v1
with:
validate_output: true
validation_checks: 'format,completeness,hallucination'
\\
Related: built this pattern in agent-eval — deterministic checks first, heuristics second, model-judge only when needed. Happy to contribute.
Bug Description
Runs complete with exit code 0 and no error signal even when the output is objectively broken — hallucinated file paths, empty diffs, truncated responses, or format that doesn't match what was requested.
Reproduction
Expected Behavior
A run that produces hallucinated content, empty output, or malformed structure should not report as successful. At minimum, the action should expose output signals (e.g. \steps.claude.outputs.validation_passed) so downstream steps can gate on quality — not just exit code.
Actual Behavior
Exit code 0, green checkmark, no indication anything is wrong. Users discover the problem only on manual review.
Impact
Environment
Suggested Fix
A lightweight validation check before marking a run as successful. Could be opt-in:
\\yaml
with:
validate_output: true
validation_checks: 'format,completeness,hallucination'
\\
Related: built this pattern in agent-eval — deterministic checks first, heuristics second, model-judge only when needed. Happy to contribute.