Summary
This issue tracks adding a standalone prompt-injection evaluation fixture for
AGT.
The fixture would provide:
- a synthetic labelled prompt-injection corpus;
- manifest and corpus hygiene checks;
- a baseline harness for the existing Rust prompt-injection detector;
- summary metrics for the rules-only detector on this fixture;
- documentation for reproducing the benchmark and interpreting its limits.
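
As a rough illustration of the summary metrics listed above, the sketch below computes precision and recall from labelled examples and boolean detector verdicts. The names `Metrics` and `summarize` are hypothetical and not part of AGT. It assumes the detector's output can be reduced to a single flagged/not-flagged boolean per example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Metrics:
    """Confusion-matrix counts for a binary injection/benign task."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def precision(self) -> float:
        # Share of flagged examples that are true injections.
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        # Share of true injections that were flagged.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0


def summarize(labels: Sequence[bool], predictions: Sequence[bool]) -> Metrics:
    """Count outcomes; True means 'injection' in both sequences (assumption)."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = fp = fn = tn = 0
    for label, predicted in zip(labels, predictions):
        if label and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return Metrics(tp, fp, fn, tn)
```

Reporting the raw counts alongside the ratios keeps the numbers interpretable on a small corpus, where a single example can shift precision or recall noticeably.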
Scope
This is an evaluation fixture only. It does not propose a runtime behavior
change, an embedding detector, a default threshold, policy-routing integration,
or default blocking behavior.
Why
AGT's existing rules layer is intentionally high-precision and low-recall. A
standalone fixture would make that trade-off measurable on a reproducible
benchmark and give future prompt-injection detector changes a stable baseline.
Acceptance Criteria
- The fixture can be run from the repository with documented commands.
- Corpus split, duplicate, and leakage checks are reproducible.
- Baseline metrics are tied to an exact AGT commit, detector file hash, corpus
manifest hash, and command.
- Documentation states that results are corpus-specific and not a general claim
about AGT detection quality.
- No AGT runtime behavior changes are included.
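
The duplicate, leakage, and manifest-hash criteria above could be checked along the following lines. This is a minimal sketch under stated assumptions, not the fixture's actual implementation. The record schema (`id`, `split`, `text`) and the function names are hypothetical. Matching is exact after case and whitespace normalization only, so near-duplicates would need a separate check.

```python
import hashlib
import json
from collections import defaultdict


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash alike.
    return " ".join(text.lower().split())


def text_digest(text: str) -> str:
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def hygiene_report(records):
    """Group records by normalized-text digest.

    A group confined to one split is a duplicate; a group that spans
    more than one split is a leak. Each group is a sorted list of ids.
    """
    groups = defaultdict(list)  # digest -> [(split, id), ...]
    for record in records:
        groups[text_digest(record["text"])].append((record["split"], record["id"]))

    duplicates, leaks = [], []
    for hits in groups.values():
        if len(hits) < 2:
            continue
        ids = sorted(record_id for _, record_id in hits)
        splits = {split for split, _ in hits}
        (leaks if len(splits) > 1 else duplicates).append(ids)
    return {"duplicates": sorted(duplicates), "leaks": sorted(leaks)}


def manifest_hash(records) -> str:
    """SHA-256 over a canonical JSON form, independent of record order."""
    canonical = json.dumps(
        sorted(records, key=lambda r: r["id"]),
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting records and keys before hashing means the manifest hash changes only when corpus content changes, which is what allows a reported baseline to be tied to one exact corpus state.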