Add nightly OpenSSL command performance regression testing #411
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -263,6 +263,75 @@ The scan-build and infer thresholds are baseline-based, not strict — | |
| they let pre-existing issues slide but flag obvious regressions. | ||
| Bringing them to 0 is a future cleanup. | ||
|
|
||
| ## Performance regression testing | ||
|
|
||
| `perf-regression.yml` runs nightly at 07:00 UTC (and on | ||
| `workflow_dispatch`). Customers run scripts that fire many `openssl` | ||
| commands in a row, and each invocation is a fresh process paying a full | ||
| wolfProvider init (plus, in FIPS builds, the per-algorithm CAST on first | ||
| use). This job guards the per-invocation cost of that path so a repeat of | ||
| the DH-CAST init blow-up gets caught automatically. | ||
|
|
||
| **This is an overhead regression tripwire, not a crypto throughput | ||
| benchmark, and not a wolfProvider-vs-OpenSSL speed comparison.** It only | ||
| asks one question: did per-command load/init overhead grow versus the | ||
| committed baseline? A loadable provider inherently pays process-startup | ||
| cost the built-in default provider does not, so the measured `overhead` | ||
| is expected to sit above 1.0 — that is not a defect and not a crypto-speed | ||
| result. | ||
|
|
||
| `scripts/perf_test/do-perf-tests.sh` times a small set of representative | ||
| commands — a near-no-op init probe (`list -providers`, `version`) plus | ||
| real verbs (`dgst`, `enc`, `genpkey` RSA/EC, `pkeyutl` sign, DH derive) — | ||
| taking the **minimum** of N runs to cut runner noise. Each command is | ||
| timed under both the OpenSSL default provider and wolfProvider; the | ||
| default provider serves **only as a per-run baseline to cancel | ||
| runner-speed variance**, and the `overhead` factor (wolfProvider ÷ | ||
| baseline) is checked against a committed budget | ||
|
Contributor
The variance in machine/OS adds extra variables between runs, iiuc. Would it make sense to compare against master in a single job instead of a hardcoded baseline? |
||
| (`scripts/perf_test/perf-baseline.{nonfips,fips}.json`). The init probes | ||
|
Contributor
Any guidance on when to update the baseline? |
||
| are gated on absolute ms. The job fails only when a command exceeds its | ||
| budget (× tolerance) — i.e. when overhead *regresses*, never for being | ||
|
Contributor
Is there a case where we would want to tighten the baseline? Would significant improvement be a sign to re-baseline? Not sure if we'd want to flag an error if the PR is significantly better. |
||
| above 1.0. | ||
|
|
||
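The gate described above can be sketched roughly as follows. This is a minimal illustration, not the code in `do-perf-tests.sh`: the helper names `min_of` and `check_ratio` are hypothetical, the sample timings are made up, and it assumes the tolerance widens the budget as `ratio_max * (1 + tolerance)`, which the README text does not spell out.

```sh
#!/bin/bash
# Hypothetical helpers illustrating the overhead gate; not taken from the PR.

# Print the smallest of the timings passed as arguments.
min_of() {
    printf '%s\n' "$@" | LC_ALL=C sort -n | head -n 1
}

# check_ratio WP_MIN BASE_MIN RATIO_MAX TOLERANCE
# Returns 0 (pass) when wolfProvider / baseline stays within the budget.
# ASSUMPTION: the limit is ratio_max * (1 + tolerance).
check_ratio() {
    awk -v wp="$1" -v base="$2" -v max="$3" -v tol="$4" 'BEGIN {
        overhead = wp / base
        limit = max * (1 + tol)
        printf "overhead=%.2f limit=%.2f\n", overhead, limit
        exit (overhead > limit ? 1 : 0)
    }'
}

# Made-up sample timings in milliseconds.
wp_min=$(min_of 31.0 28.4 29.9)      # under wolfProvider
base_min=$(min_of 20.1 19.8 21.5)    # under the default provider

if check_ratio "$wp_min" "$base_min" 1.6 0.25; then
    echo "PASS"
else
    echo "REGRESSION"
fi
```

Dividing by the default-provider minimum from the same run is what cancels runner speed: a slow runner inflates both numbers, so the ratio moves far less than either absolute time.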
| To keep the nightly from going red on a single noisy measurement, a | ||
| command that fails the gate is measured up to `PERF_CONFIRM` times total | ||
|
Contributor
Do we need this second layer of confirmation beyond the "N runs" above -- why not just increase the N? Maybe this showed up in your testing? |
||
| (default 3) and only reported as a regression if it fails **every** | ||
| attempt — one passing round clears it as a fluke. This is on top of each | ||
| measurement already taking the minimum of N runs. A command that exits | ||
| non-zero is reported as an error (not a silent pass), so a broken or | ||
| removed capability fails the job instead of looking fast. | ||
|
|
||
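The confirm-on-failure behaviour above could look roughly like this. It is a sketch under stated assumptions: `passes_with_confirm` and `gate_once` are hypothetical names, and `gate_once` stands in for "measure one command (minimum of N runs) and compare it to its budget". Only `PERF_CONFIRM` and its default of 3 come from the PR.

```sh
#!/bin/bash
# Hypothetical sketch of the confirmation loop; not the script's actual code.

PERF_CONFIRM="${PERF_CONFIRM:-3}"

# passes_with_confirm CMD [ARGS...]
# Runs CMD up to PERF_CONFIRM times in total. Returns 0 as soon as one
# attempt passes, and 1 only when every attempt fails.
passes_with_confirm() {
    local attempt
    for (( attempt = 1; attempt <= PERF_CONFIRM; attempt++ )); do
        if "$@"; then
            return 0    # one passing round clears it as a fluke
        fi
    done
    return 1            # failed every attempt: report a regression
}

# Demo stand-in for a noisy gate: fails twice, then passes.
calls=0
gate_once() {
    calls=$(( calls + 1 ))
    [ "$calls" -ge 3 ]
}

if passes_with_confirm gate_once; then
    echo "cleared after $calls attempt(s)"
else
    echo "regression confirmed after $calls attempts"
fi
```

Unlike simply raising N, the retry cost is paid only by commands that already failed once, so a clean nightly run is no slower.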
| There are two job variants. **non-FIPS** tracks general init/load | ||
| overhead. **FIPS** is the one that actually guards the CAST class — the | ||
| FIPS CAST code is compiled out of non-FIPS builds, so only the FIPS row | ||
| exercises the DH-derive CAST that originally regressed. | ||
|
|
||
| It runs nightly on its own cron, and can be pulled into a PR on demand by | ||
| adding the `ci:perf` label (via `pr-osp-select.yml`, same as the OSP jobs). | ||
|
|
||
| Run it locally: | ||
|
|
||
| ```sh | ||
| # non-FIPS | ||
| source scripts/env-setup | ||
| ./scripts/perf_test/do-perf-tests.sh | ||
|
|
||
| # FIPS - export before sourcing so env-setup selects provider-fips.conf | ||
| export WOLFSSL_ISFIPS=1 | ||
| source scripts/env-setup | ||
| ./scripts/perf_test/do-perf-tests.sh | ||
| ``` | ||
|
|
||
| Timing uses GNU `date +%s.%N`, so local runs need GNU coreutils (the | ||
| script errors out early on BSD/macOS `date`). CI runs on Linux. | ||
|
|
||
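As an illustration of the `date +%s.%N` dependency, a pre-flight check and timer might be written as below. The function names are hypothetical and the real script's check may differ. The sketch relies on the assumption that a `date` without `%N` support typically echoes a literal `N` instead of nanoseconds.

```sh
#!/bin/bash
# Hypothetical pre-flight check and timer; not taken from the PR.

# True when the argument looks like "<seconds>.<9-digit nanoseconds>".
is_ns_timestamp() {
    [[ "$1" =~ ^[0-9]+\.[0-9]{9}$ ]]
}

# Fail early when `date` cannot produce nanosecond timestamps.
require_gnu_date() {
    if ! is_ns_timestamp "$(date +%s.%N)"; then
        echo "ERROR: GNU date with %N support is required" >&2
        return 1
    fi
}

# Print the elapsed wall-clock seconds for one command.
# awk uses double precision, so the result is good to roughly microseconds.
time_cmd() {
    local start end
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.6f\n", e - s }'
}

if require_gnu_date; then
    time_cmd true
fi
```

On macOS, installing GNU coreutils and putting its `date` first on `PATH` is one way to satisfy a check like this.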
| The committed baselines are generous seeds — regenerate them on a stable | ||
| runner once and commit the result to tighten the gate: | ||
|
|
||
| ```sh | ||
| ./scripts/perf_test/do-perf-tests.sh --update-baseline | ||
| ``` | ||
|
|
||
| ## Triggering manually | ||
|
|
||
| Every nightly-capable workflow also has `workflow_dispatch:` so you | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,98 @@ | ||
| name: Performance Regression | ||
|
|
||
| on: | ||
| schedule: | ||
| - cron: '0 7 * * *' | ||
| workflow_dispatch: | ||
| workflow_call: | ||
|
|
||
| concurrency: | ||
| group: ${{ github.workflow }}-${{ github.ref }} | ||
| cancel-in-progress: true | ||
|
|
||
| jobs: | ||
| discover_versions: | ||
| uses: ./.github/workflows/_discover-versions.yml | ||
|
|
||
| perf_nonfips: | ||
| needs: discover_versions | ||
| name: Perf regression (non-FIPS) | ||
| runs-on: ubuntu-22.04 | ||
| timeout-minutes: 30 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| openssl_ref: | ||
| - master | ||
| - ${{ needs.discover_versions.outputs.openssl_latest_ref }} | ||
| wolfssl_ref: ${{ fromJson(needs.discover_versions.outputs.wolfssl_latest_ref_array) }} | ||
| steps: | ||
| - name: Checkout wolfProvider | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 1 | ||
|
|
||
| - name: Build wolfProvider | ||
| run: | | ||
| OPENSSL_TAG=${{ matrix.openssl_ref }} WOLFSSL_TAG=${{ matrix.wolfssl_ref }} ./scripts/build-wolfprovider.sh | ||
|
|
||
| - name: Run perf regression | ||
| run: | | ||
| source scripts/env-setup | ||
| OPENSSL_TAG=${{ matrix.openssl_ref }} WOLFSSL_TAG=${{ matrix.wolfssl_ref }} ./scripts/perf_test/do-perf-tests.sh | ||
|
|
||
| - name: Upload results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: perf-results-nonfips-${{ matrix.wolfssl_ref }}-${{ matrix.openssl_ref }} | ||
| path: perf_outputs/results.json | ||
| retention-days: 7 | ||
|
|
||
| perf_fips: | ||
| needs: discover_versions | ||
| name: Perf regression (FIPS) | ||
| runs-on: ubuntu-22.04 | ||
| timeout-minutes: 30 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: | ||
| wolfssl_bundle_ref: [ '5.8.2' ] | ||
| openssl_ref: ${{ fromJson(needs.discover_versions.outputs.openssl_latest_ref_array) }} | ||
| steps: | ||
| - name: Checkout wolfProvider | ||
| uses: actions/checkout@v4 | ||
| with: | ||
| fetch-depth: 1 | ||
|
|
||
| - name: Download FIPS Ready Bundle | ||
| run: | | ||
| BUNDLE_URL="https://www.wolfssl.com/wolfssl-${{ matrix.wolfssl_bundle_ref }}-gplv3-fips-ready.zip" | ||
| wget -O wolfssl-fips-ready.zip "$BUNDLE_URL" | ||
| unzip wolfssl-fips-ready.zip | ||
| BUNDLE_DIR=$(find . -maxdepth 1 -type d -name "*fips-ready*" | head -n 1) | ||
| if [ -z "$BUNDLE_DIR" ]; then | ||
| echo "ERROR: Could not find FIPS ready bundle directory after extraction" | ||
| ls -la | ||
| exit 1 | ||
| fi | ||
| echo "FIPS_BUNDLE_PATH=$(pwd)/$BUNDLE_DIR" >> "$GITHUB_ENV" | ||
|
|
||
| - name: Build wolfProvider with FIPS Ready Bundle | ||
| run: | | ||
| OPENSSL_TAG=${{ matrix.openssl_ref }} ./scripts/build-wolfprovider.sh --fips-bundle="$FIPS_BUNDLE_PATH" \ | ||
| --fips-check=ready --wolfssl-ver=v${{ matrix.wolfssl_bundle_ref }}-stable | ||
|
|
||
| - name: Run perf regression | ||
| run: | | ||
| export WOLFSSL_ISFIPS=1 | ||
| source scripts/env-setup | ||
| WOLFSSL_TAG=v${{ matrix.wolfssl_bundle_ref }}-stable OPENSSL_TAG=${{ matrix.openssl_ref }} ./scripts/perf_test/do-perf-tests.sh | ||
|
|
||
| - name: Upload results | ||
| if: always() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: perf-results-fips-${{ matrix.wolfssl_bundle_ref }}-${{ matrix.openssl_ref }} | ||
| path: perf_outputs/results.json | ||
| retention-days: 7 |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| #!/bin/bash | ||
| # | ||
| # Copyright (C) 2006-2025 wolfSSL Inc. | ||
| # | ||
| # This file is part of wolfProvider. | ||
| # | ||
| # wolfProvider is free software; you can redistribute it and/or modify | ||
| # it under the terms of the GNU General Public License as published by | ||
| # the Free Software Foundation; either version 3 of the License, or | ||
| # (at your option) any later version. | ||
| # | ||
| # wolfProvider is distributed in the hope that it will be useful, | ||
| # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
| # GNU General Public License for more details. | ||
| # | ||
| # You should have received a copy of the GNU General Public License | ||
| # along with wolfProvider. If not, see <http://www.gnu.org/licenses/>. | ||
|
|
||
| if [ -z "${DO_CMD_TESTS:-}" ]; then | ||
| echo "This script is designed to be called from do-perf-tests.sh" | ||
| echo "Do not run this script directly - use do-perf-tests.sh instead" | ||
| exit 1 | ||
| fi | ||
|
|
||
| clean_perf_test() { | ||
| rm -f "./scripts/perf_test/perf-test.log" | ||
| rm -rf "./perf_outputs" | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| #!/bin/bash | ||
| # do-perf-tests.sh | ||
| # Run the wolfProvider performance regression test. | ||
| # | ||
| # Copyright (C) 2006-2025 wolfSSL Inc. | ||
| # | ||
| # This file is part of wolfProvider. | ||
| # | ||
| # wolfProvider is free software; you can redistribute it and/or modify | ||
| # it under the terms of the GNU General Public License as published by | ||
| # the Free Software Foundation; either version 3 of the License, or | ||
| # (at your option) any later version. | ||
| # | ||
| # wolfProvider is distributed in the hope that it will be useful, | ||
| # but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
| # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
| # GNU General Public License for more details. | ||
| # | ||
| # You should have received a copy of the GNU General Public License | ||
| # along with wolfProvider. If not, see <http://www.gnu.org/licenses/>. | ||
|
|
||
| PERF_TEST_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" | ||
| REPO_ROOT="$( cd "${PERF_TEST_DIR}/../.." &> /dev/null && pwd )" | ||
|
|
||
| export DO_CMD_TESTS=1 | ||
|
|
||
| show_help() { | ||
| cat << EOF | ||
| Usage: $0 [OPTIONS] | ||
|
|
||
| Measure per-invocation cost of representative openssl commands under | ||
| wolfProvider and compare against the committed baseline for the active | ||
| build variant (FIPS vs non-FIPS, selected by WOLFSSL_ISFIPS). | ||
|
|
||
| OPTIONS: | ||
| --help, -h Show this help message | ||
| --update-baseline Regenerate the baseline JSON from this run instead of | ||
| gating against it | ||
|
|
||
| ENVIRONMENT VARIABLES: | ||
| OPENSSL_BIN Path to OpenSSL binary (auto-detected if not set) | ||
| WOLFSSL_ISFIPS Set to 1 to select the FIPS baseline | ||
| PERF_ITER Measured iterations per command (default 15) | ||
| PERF_WARMUP Warmup iterations per command (default 3) | ||
| PERF_CONFIRM Total measurement attempts for a failing command before | ||
| it is reported as a regression (default 3) | ||
| EOF | ||
| exit 0 | ||
| } | ||
|
|
||
| PASS_ARGS=() | ||
| while [[ $# -gt 0 ]]; do | ||
| case $1 in | ||
| --help|-h) | ||
| show_help | ||
| ;; | ||
| --update-baseline) | ||
| PASS_ARGS+=("$1") | ||
| shift | ||
| ;; | ||
| *) | ||
| echo "Unknown option: $1" | ||
| echo "Use --help for usage information" | ||
| exit 1 | ||
| ;; | ||
| esac | ||
| done | ||
|
|
||
| source "${REPO_ROOT}/scripts/cmd_test/cmd-test-common.sh" | ||
| cmd_test_env_setup | ||
|
|
||
| "${PERF_TEST_DIR}/perf-cmd-test.sh" "${PASS_ARGS[@]}" | ||
| exit $? |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| { | ||
| "tolerance": 0.25, | ||
| "commands": { | ||
| "init-probe": { "abs_ms_max": 15 }, | ||
| "version": { "abs_ms_max": 15 }, | ||
| "dgst-sha256": { "ratio_max": 1.8 }, | ||
| "enc-aes": { "ratio_max": 4.9 }, | ||
| "genpkey-rsa": { "ratio_max": 2.8 }, | ||
| "genpkey-ec": { "ratio_max": 3.1 }, | ||
| "pkeyutl-rsa": { "ratio_max": 14.2 }, | ||
| "dh-derive": { "ratio_max": 16.4 } | ||
| } | ||
| } |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| { | ||
| "tolerance": 0.25, | ||
| "commands": { | ||
| "init-probe": { "abs_ms_max": 15 }, | ||
| "version": { "abs_ms_max": 15 }, | ||
| "dgst-sha256": { "ratio_max": 1.6 }, | ||
| "enc-aes": { "ratio_max": 1.5 }, | ||
| "genpkey-rsa": { "ratio_max": 1.8 }, | ||
| "genpkey-ec": { "ratio_max": 1.3 }, | ||
| "pkeyutl-rsa": { "ratio_max": 1.2 }, | ||
| "dh-derive": { "ratio_max": 1.5 } | ||
| } | ||
| } |
If this is testing the overhead, and not the runtime speed, maybe consider a different name. "Overhead timing regression test"? "Load time regression test"?