Add Promptfoo (LLM eval & red-teaming) parser #15081

Draft
Dashtid wants to merge 1 commit into DefectDojo:dev from Dashtid:promptfoo-parser

Conversation

@Dashtid Dashtid commented Jun 24, 2026

Description

Adds a parser for promptfoo, an open-source LLM evaluation and red-teaming tool, aligned with the AI-testing direction in discussion #13242.

The parser ingests the JSON results file written by promptfoo eval -o results.json (and promptfoo redteam run -o results.json). promptfoo's pass/fail semantics are inverted relative to most scanners, which is the central design point:

  • A result with success: true means every assertion passed — for a red-team probe that means the target model defended the attack — so it is not a finding.
  • A result with success: false is a failed assertion (for a red-team probe, the attack succeeded) and becomes a Finding.
  • Results with failureReason == 2 (a provider/eval error rather than an assertion failure) are skipped — the test could not run, so it is not a vulnerability.
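
The three rules above can be sketched as a single predicate. This is a minimal illustration, not the parser's actual code: the function name `is_finding` and the sample records are hypothetical, and the only value taken from the description is `failureReason == 2` for provider/eval errors.

```python
# Assumption: 2 marks a provider/eval error, as stated in the description.
FAILURE_REASON_ERROR = 2


def is_finding(result: dict) -> bool:
    """Return True when a promptfoo result should become a Finding."""
    if result.get("success"):
        # Every assertion passed (the model defended), so nothing to report.
        return False
    if result.get("failureReason") == FAILURE_REASON_ERROR:
        # The test could not run, so it says nothing about the target.
        return False
    # A failed assertion, i.e. a successful attack for a red-team probe.
    return True


# Hypothetical, heavily trimmed result records for illustration only.
sample_results = [
    {"success": True, "failureReason": 0},
    {"success": False, "failureReason": 1},
    {"success": False, "failureReason": 2},
]
failed = [r for r in sample_results if is_finding(r)]
```

Only the second record survives the filter: the first passed, and the third errored before any assertion could be evaluated.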

Other decisions:

  • Severity comes from the red-team metadata.severity (critical/high/medium/low). A plain promptfoo eval failure carries no severity metadata and defaults to Medium.
  • CWE is mapped from the plugin / harm category as a deliberately coarse starter, verified against MITRE: SQL-injection -> CWE-89, shell/command-injection -> CWE-78, prompt-injection / prompt-extraction -> CWE-1427, PII / privacy -> CWE-200, default -> CWE-1426 (Improper Validation of Generative AI Output).
  • Failures for the same plugin against the same target are aggregated into one Finding: nb_occurences counts the failures, and the Finding keeps the most severe level seen among them.
  • Registered for hash_code deduplication on title + component_name. severity and description are intentionally excluded: description holds the per-run attack input/output, and severity is an aggregate that shifts as the set of failed attempts changes — neither is stable enough for the dedup hash.
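
A rough sketch of how the severity fallback, the coarse CWE mapping, and the per-plugin aggregation described above could fit together. All function names, the substring rules, and the sample plugin and target identifiers are assumptions made for illustration; the real parser may match plugins differently.

```python
SEVERITY_ORDER = ["Low", "Medium", "High", "Critical"]  # least to most severe
SEVERITY_MAP = {"critical": "Critical", "high": "High", "medium": "Medium", "low": "Low"}

# Hypothetical substring rules, checked in order; CWE numbers are from the description.
CWE_RULES = [
    ("sql-injection", 89),
    ("shell-injection", 78),
    ("command-injection", 78),
    ("prompt-injection", 1427),
    ("prompt-extraction", 1427),
    ("pii", 200),
    ("privacy", 200),
]
DEFAULT_CWE = 1426


def map_severity(metadata: dict) -> str:
    """Use red-team metadata.severity when present, otherwise Medium."""
    raw = str(metadata.get("severity") or "").lower()
    return SEVERITY_MAP.get(raw, "Medium")


def map_cwe(plugin_id: str) -> int:
    """Return the first matching CWE, falling back to the default."""
    name = plugin_id.lower()
    for needle, cwe in CWE_RULES:
        if needle in name:
            return cwe
    return DEFAULT_CWE


def aggregate(failures):
    """Group (plugin_id, target, severity) tuples into one entry per plugin and target."""
    groups = {}
    for plugin_id, target, severity in failures:
        entry = groups.setdefault(
            (plugin_id, target), {"nb_occurences": 0, "severity": severity}
        )
        entry["nb_occurences"] += 1
        # Keep the most severe level seen so far for this group.
        if SEVERITY_ORDER.index(severity) > SEVERITY_ORDER.index(entry["severity"]):
            entry["severity"] = severity
    return groups


groups = aggregate([
    ("pii", "example-target", "Low"),
    ("pii", "example-target", "High"),
    ("sql-injection", "example-target", "Medium"),
])
```

Here the two `pii` failures collapse into a single entry with `nb_occurences` of 2 and severity High, while the `sql-injection` failure stays separate.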

Test results

Adds unittests/tools/test_promptfoo_parser.py (14 tests) covering:

  • zero/one/many findings
  • the severity matrix
  • the CWE mapping (including the specific *-injection rules taking precedence over the broad rule)
  • aggregation
  • the plain-eval fallback (severity/title/identity from the failed assertion, metric-over-type)
  • skipping passed and errored results
  • the lenient input shapes (bare list, top-level results list)
  • shareableUrl -> references
  • string-form providers
  • bytes + UTF-8 BOM + non-ASCII input
  • rejection of non-JSON input
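
To make the input-handling cases concrete, here is one way the lenient loading could look. It is a sketch under assumptions: the helper name `load_results` is hypothetical, and the nested `results.results` shape is inferred from the `results.version == 3` remark rather than taken from the parser itself.

```python
import json


def load_results(content) -> list:
    """Decode promptfoo output and return the list of per-test results."""
    if isinstance(content, bytes):
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM when present.
        content = content.decode("utf-8-sig")
    else:
        content = content.lstrip("\ufeff")
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError("input is not valid JSON") from exc

    if isinstance(data, list):
        return data  # bare list of results
    if isinstance(data, dict):
        results = data.get("results")
        if isinstance(results, list):
            return results  # top-level results list
        if isinstance(results, dict) and isinstance(results.get("results"), list):
            return results["results"]  # assumed nested shape: results.results
    raise ValueError("unrecognized promptfoo results shape")
```

Raising `ValueError` for both malformed JSON and unknown shapes keeps the failure mode uniform for the caller; the actual parser may report these differently.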

The sample scan files under unittests/scans/promptfoo/ are real promptfoo v3 output (results.version == 3).

Documentation

Adds docs/content/supported_tools/parsers/file/promptfoo.md.

Checklist

  • Submitted against dev
  • Ruff-compliant (ruff.toml, ruff 0.15.16)
  • Python 3.13 compliant
  • Documentation included
  • No model changes (no migration needed)
  • Unit tests added
  • Labels (for maintainers): suggest Import Scans and settings_changes (touches settings.dist.py for deduplication)
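
For orientation, the settings.dist.py registration mentioned in the last checklist item might look roughly like the fragment below. The scan-type name "Promptfoo Scan" is an assumption, and `illustrative_hash` is a stand-in written for this example, not DefectDojo's hash_code implementation; it only demonstrates why excluding severity and description keeps the hash stable across runs.

```python
import hashlib

DEDUPE_ALGO_HASH_CODE = "hash_code"
SCAN_TYPE = "Promptfoo Scan"  # hypothetical scan-type name

# Shape of the two settings entries a hash_code-deduplicated parser needs.
HASHCODE_FIELDS_PER_SCANNER = {SCAN_TYPE: ["title", "component_name"]}
DEDUPLICATION_ALGORITHM_PER_PARSER = {SCAN_TYPE: DEDUPE_ALGO_HASH_CODE}


def illustrative_hash(finding: dict, fields: list) -> str:
    """Hash only the configured fields (illustration, not DefectDojo's code)."""
    joined = "|".join(str(finding.get(field, "")) for field in fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


fields = HASHCODE_FIELDS_PER_SCANNER[SCAN_TYPE]
run_one = {"title": "pii", "component_name": "example-target",
           "severity": "Low", "description": "attack input A"}
run_two = {"title": "pii", "component_name": "example-target",
           "severity": "High", "description": "attack input B"}
same_hash = illustrative_hash(run_one, fields) == illustrative_hash(run_two, fields)
```

Both runs hash identically even though severity and description differ, which is the behavior the description argues for.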

Adds a file-based parser for promptfoo (https://promptfoo.dev) results
JSON, produced by `promptfoo eval -o results.json` or
`promptfoo redteam run -o results.json`.

- Inverted semantics: a result with success:false (a failed assertion /
  successful red-team attack) becomes a Finding; success:true (the model
  defended) is skipped, as are failureReason==ERROR (provider) results.
- Severity from red-team metadata.severity, with a Medium fallback for
  plain-eval failures; CWE mapped from the plugin/category as a coarse
  starter (89/78/1427/200, default 1426).
- Failures for the same plugin against the same target aggregate into one
  Finding (nb_occurences), keeping the most severe rung.
- Deduplicated via hash_code on title + component_name; severity and
  description are excluded as unstable across runs.

Verified against the promptfoo v3 results schema (results.version == 3);
the sample scan files are real promptfoo output. Adds unit tests and
parser documentation.

Signed-off-by: David Dashti <dashti.dat@gmail.com>
@github-actions github-actions Bot added the settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), docs, unittests, parser, and conflicts-detected labels Jun 24, 2026
@github-actions

Contributor

This pull request has conflicts, please resolve those before we can evaluate the pull request.
