Add nightly OpenSSL command performance regression testing #411
aidangarske wants to merge 1 commit into
ec5b0c0 to 94456a0
94456a0 to a3501ea
| above 1.0.
|
| To keep the nightly from going red on a single noisy measurement, a
| command that fails the gate is measured up to `PERF_CONFIRM` times total
Do we need this second layer of confirmation beyond the "N runs" above -- why not just increase the N? Maybe this showed up in your testing?
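For reference, the confirmation step quoted above could look roughly like the sketch below. It assumes the job fails a command only when every one of up to `PERF_CONFIRM` measurements exceeds the limit; the quoted text is cut off before it says so. The names `confirm_regression`, `measure`, and `limit` are hypothetical and do not come from the PR.

```python
from typing import Callable


def confirm_regression(measure: Callable[[], float],
                       limit: float,
                       perf_confirm: int = 3) -> bool:
    """Return True only if all of up to `perf_confirm` measurements exceed `limit`.

    Assumption: a single passing measurement is enough to clear the command.
    """
    if perf_confirm < 1:
        raise ValueError("perf_confirm must be at least 1")
    for _ in range(perf_confirm):
        if measure() <= limit:
            return False  # one measurement within the limit clears the command
    return True  # every measurement was over the limit: treat as a regression
```

Under this reading, the extra measurements are paid for only by commands that fail the first pass, whereas raising N would lengthen every command's run.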
| use). This job guards the per-invocation cost of that path so a repeat of
| the DH-CAST init blow-up gets caught automatically.
|
| **This is an overhead regression tripwire, not a crypto throughput
If this is testing the overhead, and not the runtime speed, maybe consider a different name. "Overhead timing regression test"? "Load time regression test"?
| default provider serves **only as a per-run baseline to cancel
| runner-speed variance**, and the `overhead` factor (wolfProvider ÷
| baseline) is checked against a committed budget
| (`scripts/perf_test/perf-baseline.{nonfips,fips}.json`). The init probes
Any guidance on when to update the baseline?
| baseline) is checked against a committed budget
| (`scripts/perf_test/perf-baseline.{nonfips,fips}.json`). The init probes
| are gated on absolute ms. The job fails only when a command exceeds its
| budget (× tolerance), i.e. when overhead *regresses*, never for being
Is there a case where we would want to tighten the baseline? Would significant improvement be a sign to re-baseline? Not sure if we'd want to flag an error if the PR is significantly better
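To make the gate under discussion concrete, here is a minimal sketch of the check as the quoted text describes it: overhead is wolfProvider time divided by default-provider time, and the job fails only above budget × tolerance. The `rebaseline_hint` flag is an illustration of the question raised here, reporting a large improvement without failing the job; it is not something the PR is known to do. The function name, the default `tolerance`, and `improve_ratio` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    overhead: float        # wolfProvider time / default-provider time
    failed: bool           # True only when overhead regresses past the limit
    rebaseline_hint: bool  # True when overhead is far below budget (hypothetical)


def check_overhead(wolf_ms: float,
                   default_ms: float,
                   budget: float,
                   tolerance: float = 1.25,
                   improve_ratio: float = 0.5) -> GateResult:
    """Gate one command's overhead factor against its committed budget.

    `tolerance` and `improve_ratio` are assumed values for illustration.
    """
    if default_ms <= 0:
        raise ValueError("default_ms must be positive")
    overhead = wolf_ms / default_ms
    failed = overhead > budget * tolerance          # fail only on regression
    hint = overhead < budget * improve_ratio        # never fails, only reports
    return GateResult(overhead, failed, hint)
```

With a budget of 2.0, an overhead of 3.0 fails, 2.4 passes within tolerance, and 0.8 passes while suggesting the budget may be worth tightening.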
| timed under both the OpenSSL default provider and wolfProvider; the
| default provider serves **only as a per-run baseline to cancel
| runner-speed variance**, and the `overhead` factor (wolfProvider ÷
| baseline) is checked against a committed budget
The variance in machine/OS adds extra variables between runs, IIUC. Would it make sense to compare against master in a single job instead of a hardcoded baseline?
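As a rough illustration of the same-run ratio the quoted text relies on, the sketch below times two command lines back to back on the same machine and divides their medians. It assumes wall-clock timing of whole process invocations and a median over a small number of runs; the PR's actual measurement method is not shown in this thread. `median_ms` and `overhead_factor` are hypothetical names, and the argument lists are placeholders for whatever commands the job runs.

```python
import statistics
import subprocess
import time
from typing import Sequence


def median_ms(argv: Sequence[str], runs: int = 5) -> float:
    """Median wall-clock time of one invocation of `argv`, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def overhead_factor(candidate_argv: Sequence[str],
                    reference_argv: Sequence[str],
                    runs: int = 5) -> float:
    """Ratio of candidate time to reference time, both measured in this run."""
    return median_ms(candidate_argv, runs) / median_ms(reference_argv, runs)
```

Because both sides are measured in the same job, runner speed largely cancels out of the ratio. The same helper could take a build of master as the reference instead of the default provider, which is the alternative suggested here.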
Description
Add command performance testing and regression validation