Bug Description
When Anthropic ships a model update, or when action config changes, run quality can silently degrade with zero signal. Runs that used to produce 200-line PRs now produce 40-line PRs. Runs that used to touch 5 files now touch 1. Everything still reports as 'success' because the only signal is exit code 0.
Reproduction
- Run claude-code-action on the same set of tasks over weeks
- Model update happens (or you change a prompt, or context changes)
- Output quality drops: shorter responses, fewer changes, less thorough work
- No alert, no metric, no way to notice until you manually compare before/after
Expected Behavior
The action should emit structured run metadata (duration, tokens used, files changed, lines added/removed, model version, truncation flag) so users can track quality over time and detect drift.
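As a rough illustration of where the diff-related fields could come from, the sketch below derives files changed and lines added/removed from the text printed by `git diff --numstat`. The names `DiffStats` and `parse_numstat` are hypothetical and not part of claude-code-action; the only thing relied on is Git's numstat format (tab-separated added count, deleted count, and path, with `-` in both count columns for binary files).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiffStats:
    """Hypothetical record for the diff-related telemetry fields."""
    files_changed: int
    lines_added: int
    lines_removed: int


def parse_numstat(numstat: str) -> DiffStats:
    """Summarize the text printed by `git diff --numstat`.

    Each line is <added> TAB <deleted> TAB <path>. Binary files report "-"
    for both counts; they are counted as changed files with zero lines.
    """
    files = added = removed = 0
    for line in numstat.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        a, d, _path = parts
        files += 1
        added += int(a) if a.isdigit() else 0
        removed += int(d) if d.isdigit() else 0
    return DiffStats(files, added, removed)


if __name__ == "__main__":
    # Made-up sample input, not output from a real run.
    sample = "12\t3\tsrc/app.py\n-\t-\tdocs/logo.png\n28\t0\ttests/test_app.py\n"
    print(parse_numstat(sample))
```

Duration, token counts, model version, and the truncation flag would have to come from the action itself, which is why they are not derived here.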
Actual Behavior
You get a pass/fail exit code and a PR comment. No time-series data. No structured metrics. No drift signal. You're blind to regression until it's obvious.
Impact
- Model updates cause silent quality regressions across entire organizations
- Config changes (prompt tweaks, context limits) have no measurable feedback loop
- Multi-repo deployments have no way to compare quality across repos
- 'It worked last week' is the only drift detection mechanism available
Suggested Fix
Emit structured telemetry as action outputs:
```yaml
- uses: anthropics/claude-code-action@v1
  id: claude  # needed so the outputs below resolve under steps.claude
  with:
    emit_telemetry: true  # proposed input, not available today

# Proposed outputs:
#   steps.claude.outputs.run_duration_ms
#   steps.claude.outputs.tokens_used
#   steps.claude.outputs.files_changed
#   steps.claude.outputs.lines_added
#   steps.claude.outputs.lines_removed
#   steps.claude.outputs.model_version
#   steps.claude.outputs.truncated
```
Users could pipe these to Datadog, Grafana, or a CSV file and set alerts on drift. A run dropping from 200 lines to 40 lines is a signal, and today there is no way to see it.
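For a sense of what an alert on these outputs might look like, here is a minimal sketch, assuming each run's `lines_added` value has already been collected into a history list. The function `detect_drift`, the median baseline, the 50% threshold, and the minimum history size are illustrative choices, not anything the action provides.

```python
from statistics import median
from typing import Sequence


def detect_drift(history: Sequence[float], latest: float,
                 min_ratio: float = 0.5, min_history: int = 5) -> bool:
    """Return True when `latest` falls below `min_ratio` of the baseline.

    The baseline is the median of earlier runs, which is less sensitive to
    a single unusually large or small run than the mean.
    """
    if len(history) < min_history:
        return False  # not enough data to call anything drift
    baseline = median(history)
    if baseline <= 0:
        return False  # nothing meaningful to compare against
    return latest < min_ratio * baseline


if __name__ == "__main__":
    lines_added_history = [210, 190, 205, 198, 202]  # made-up sample values
    print(detect_drift(lines_added_history, 40))   # the 200 -> 40 case
    print(detect_drift(lines_added_history, 195))  # a normal run
```

A single-metric threshold like this is deliberately crude. In practice it would probably need to be tracked per task type, since a small PR is not a regression when the task itself is small.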
This completes the triad: #1392 validates output, #1393 retries failures, this issue observes trends over time.
I'm building drift detection in agent-eval and am happy to contribute a reference integration.