Add a prompt-injection evaluation fixture #2923

## Summary

This issue tracks adding a standalone prompt-injection evaluation fixture for
AGT.

The fixture would provide:

- a synthetic labelled prompt-injection corpus;
- manifest and corpus hygiene checks;
- a baseline harness for the existing Rust prompt-injection detector;
- summary metrics for the rules-only detector on this fixture;
- documentation for reproducing the benchmark and interpreting its limits.
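
As a rough illustration of what the corpus hygiene checks might look like, the sketch below flags exact-after-normalization duplicates within a split and the same text appearing in more than one split (leakage). The record fields `id`, `text`, `label`, and `split`, the normalization rule, and the sample records are all hypothetical; the actual fixture may define its manifest and checks differently.

```python
import hashlib
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """NFKC-normalize, case-fold, and collapse whitespace so trivial variants collide."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def fingerprint(text: str) -> str:
    """Stable SHA-256 fingerprint of the normalized text."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def hygiene_report(records):
    """Return (duplicates, leaks) as sorted lists of record-id groups.

    duplicates: groups of records with the same fingerprint inside one split.
    leaks: groups of records with the same fingerprint across different splits.
    """
    groups = defaultdict(list)  # fingerprint -> [(id, split), ...]
    for record in records:
        groups[fingerprint(record["text"])].append((record["id"], record["split"]))

    duplicates, leaks = [], []
    for members in groups.values():
        if len(members) < 2:
            continue
        ids = sorted(record_id for record_id, _ in members)
        splits = {split for _, split in members}
        (leaks if len(splits) > 1 else duplicates).append(ids)
    return sorted(duplicates), sorted(leaks)


# Invented sample records (hypothetical schema), for illustration only.
records = [
    {"id": "a1", "text": "Ignore previous instructions.", "label": "injection", "split": "train"},
    {"id": "a2", "text": "ignore  previous instructions.", "label": "injection", "split": "test"},
    {"id": "b1", "text": "What is the weather today?", "label": "benign", "split": "train"},
    {"id": "b2", "text": "What is the weather today?", "label": "benign", "split": "train"},
]
duplicates, leaks = hygiene_report(records)
```

Because the check is a pure function of the corpus contents, rerunning it on the same manifest should give the same report, which is the property the reproducibility requirement depends on. Near-duplicate detection (paraphrases, small edits) is out of scope for this sketch.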

## Scope

This is an evaluation fixture only. It does not propose a runtime behavior
change, embedding detector, default threshold, policy-routing integration, or
default blocking behavior.

## Why

AGT's existing rules layer is intentionally high-precision and low-recall. A
standalone fixture would make that trade-off measurable on a reproducible
benchmark and give future prompt-injection detector changes a stable baseline.
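
To make the precision/recall trade-off concrete, here is a minimal sketch of how summary metrics could be computed from labelled examples and detector verdicts. The function name and the toy inputs are assumptions for illustration; the numbers are not measurements of AGT.

```python
def precision_recall(labels, predictions):
    """Compute confusion counts plus precision and recall for the positive class.

    labels and predictions are equal-length sequences of booleans, where True
    means "prompt injection". Undefined ratios are reported as None.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)
    fp = sum(1 for y, p in pairs if not y and p)
    fn = sum(1 for y, p in pairs if y and not p)

    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}


# Toy data only: four injections, four benign prompts, one detection.
toy_labels = [True, True, True, True, False, False, False, False]
toy_predictions = [True, False, False, False, False, False, False, False]
toy_metrics = precision_recall(toy_labels, toy_predictions)
```

On the toy data, the single flagged example is a true injection and nothing benign is flagged, so precision is 1.0 while recall is 0.25. That is the shape of result a high-precision, low-recall rules layer would be expected to produce.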

## Acceptance Criteria

- The fixture can be run from the repository with documented commands.
- Corpus split, duplicate, and leakage checks are reproducible.
- Baseline metrics are tied to an exact AGT commit, detector file hash, corpus
  manifest hash, and command.
- Documentation states that results are corpus-specific and not a general claim
  about AGT detection quality.
- No AGT runtime behavior changes are included.
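
One possible way to tie baseline metrics to their inputs is to emit a small provenance record next to the results. The sketch below is an assumption about how that could look: the field names, the `provenance_record` helper, and the idea of passing the commit in as a string are hypothetical, not part of AGT.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large corpora do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(commit, detector_path, manifest_path, command, metrics):
    """Bundle metrics with the exact inputs that produced them (hypothetical schema)."""
    return {
        "agt_commit": commit,
        "detector_sha256": sha256_file(Path(detector_path)),
        "manifest_sha256": sha256_file(Path(manifest_path)),
        "command": command,
        "metrics": metrics,
    }


def dump_record(record) -> str:
    """Serialize with sorted keys so identical inputs give byte-identical output."""
    return json.dumps(record, sort_keys=True, indent=2)
```

Comparing the stored hashes against freshly computed ones before rerunning the benchmark would show whether the detector or corpus has changed since the baseline was recorded.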