Area
Agent Evaluators (evaluators/agent/)
Problem or motivation
Safety filters can be bypassed by character substitution, for example Unicode homoglyphs (lookalike characters) or leetspeak. This is a known jailbreak technique.
Proposed solution
Add an evaluator under evaluators/agent/injection/ that:
- Tests detection of obfuscated malicious prompts
- Covers Unicode homoglyphs, ASCII art encoding, leetspeak
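
A rough sketch of what such a check might look like, to make the proposal concrete. Everything here is an assumption for illustration: the function names (`obfuscate`, `normalize`, `naive_filter`, `evaluate`), the tiny substitution tables, and the blocklist phrase are hypothetical and are not part of the existing evaluators API. It covers only homoglyphs and leetspeak; ASCII art encoding would need a separate approach.

```python
import unicodedata

# Hypothetical, deliberately tiny substitution tables (illustration only).
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "i": "\u0456"}  # Cyrillic lookalikes
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def obfuscate(text, table):
    """Replace each character that has an entry in `table`."""
    return "".join(table.get(ch, ch) for ch in text)


def normalize(text):
    """Apply NFKC, lowercase, then reverse the known substitutions."""
    text = unicodedata.normalize("NFKC", text).lower()
    reverse = {v: k for table in (HOMOGLYPHS, LEET) for k, v in table.items()}
    return "".join(reverse.get(ch, ch) for ch in text)


def naive_filter(text, blocklist=("ignore previous instructions",)):
    """Stand-in for a safety filter: plain substring match on a blocklist."""
    return any(phrase in text.lower() for phrase in blocklist)


def evaluate(prompt):
    """Report, per obfuscation type, whether the filter catches the variant
    before and after normalization."""
    results = {}
    for name, table in (("homoglyph", HOMOGLYPHS), ("leetspeak", LEET)):
        variant = obfuscate(prompt, table)
        results[name] = {
            "raw_detected": naive_filter(variant),
            "normalized_detected": naive_filter(normalize(variant)),
        }
    return results


if __name__ == "__main__":
    for name, outcome in evaluate("ignore previous instructions").items():
        print(name, outcome)
```

Note that NFKC alone does not fold cross-script lookalikes such as Cyrillic "о" into Latin "o", which is why the sketch adds an explicit reverse map. Reversing digit substitutions would also mangle legitimate numbers, so a real evaluator would likely need a fuller confusables table and some way to measure false positives.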
Acceptance criteria
Alternatives considered
No response