DRAFT: Add Zstd GPU CI test infrastructure (Phase 1)#108
Conversation
89bc1c9 to
9a677ac
Compare
Add zstdgpu_ci_tests: a thin GTest wrapper that shells out to zstdgpu_demo.exe for correctness and performance validation of GPU Zstd decompression. Tests are parameterized over .zst content files discovered at runtime via --content-path. Test cases: - SimulationCheck (--chk-gpu --chk-cpu --sim-gpu) - D3D12DebugLayer (--d3d-dbg) - ExternalMemory (--ext-mem) - GraphicsQueue (--d3d-gfx) - OverallThroughput (--prf-lvl 0) - PerStageTiming (--prf-lvl 2) Also adds scripts/generate_histogram.py for converting performance CSV output into histogram PNGs suitable for CI reporting. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ectness failure Adds a translation-unit-only override of the ZSTDGPU_BREAK macro for the demo's CPU-sim shader path: instead of the library's __debugbreak() -- which kills the demo with STATUS_BREAKPOINT (0x80000003) on bad input before any [FAIL] message can flush -- breaks now log a [ZGBRK] line and increment a correctness-failure counter. VALIDATE / VALIDATE_CND also increment the counter, the per-frame validation failure block and uncompressed-size early-out increment it, and wmain returns 2 if the counter is non-zero. The validate-on-CPU path early-returns before DecompressSequences if any prior validation failed -- avoids feeding sentinel FSE table indices into the OOB-prone shader code. All changes are local to zstdgpu_demo/main.cpp. zstdgpu library is untouched. Performance CSV emission is provided upstream by PR microsoft#118 (zstdgpu_demo's built-in --out-csv flag); this PR no longer adds a separate CSV writer.
5115e9b to
1456908
Compare
|
|
||
| // File discovery | ||
|
|
||
| std::vector<std::string> DiscoverZstFiles(const std::string& contentPath) |
There was a problem hiding this comment.
recursive_directory_iterator(contentPath) uses the throwing overload. The guard above only checks the root path, it doesn't cover errors hit during the walk. So if recursion hits a subdirectory it can't read, it throws filesystem_error, and since nothing catches it (here in main() or in the INSTANTIATE_TEST_SUITE_P generator), the whole process aborts.
The effect: one unreadable subfolder crashes the entire run before any test executes. Every .zst goes untested, and you get a crash instead of a named test failure. Is this worth handling?
| } | ||
| } | ||
|
|
||
| if (config.demoPath.empty()) |
There was a problem hiding this comment.
Should this be changed to an .exist() and throw and error instead of a warning? If the change is made the GTEST_SKIP() << "zstdgpu_demo.exe not found. Set --demo-path."; can be removed. Should the remaining warnings also be changed to failures in main()?
|
|
||
| // Singleton access — SetTestConfig called once from main(), GetTestConfig | ||
| // called from test helpers. | ||
| // Spec implies a global but doesn't show accessor pattern. |
There was a problem hiding this comment.
Nit: the spec suggested this pattern, not a prescription, global isn't strictly necessary. If this works better as a parameter for the test fixture that is also fine.
| const std::string& csvOutputPath); | ||
|
|
||
| // File discovery — scans directories for .zst test content. | ||
|
|
There was a problem hiding this comment.
Nit: remove extra line.
| int timeoutSeconds = 0); | ||
|
|
||
| // Convenience: builds the full argument list for a correctness scenario. | ||
| // Spec inlines the args in each test. Extracting them avoids repeating --zst, --run-cnt 1, etc |
There was a problem hiding this comment.
Nit: can remove reference to spec.
| { | ||
| const auto& config = GetTestConfig(); | ||
|
|
||
| if (config.demoPath.empty()) |
There was a problem hiding this comment.
I think I left a comment in main(), but shouldn't this be removed. We can just fail in main() if the demoPath doesn't exist().
| CloseHandle(pi.hThread); | ||
|
|
||
| // Wait for the reader thread to finish draining the pipe, then clean up. | ||
| readerThread.join(); |
There was a problem hiding this comment.
join()'s wait is unbounded, if TerminateProcess() in line 367 doesn't kill the process immediately. This could hang the test runner indefinitely. This may or may not be serious, depending on what type of failures we get when testing.
| << "Demo process returned non-zero exit code: " << result.exitCode << "\n" | ||
| << "Command: " << result.commandLine << "\n" | ||
| << "Output:\n" | ||
| << result.stdOut; |
There was a problem hiding this comment.
Why is the result.stdOut printed out twice when the demo app returns 0? Its already printed in line 147.
| << "Demo process returned non-zero exit code: " << result.exitCode << "\n" | ||
| << "Command: " << result.commandLine << "\n" | ||
| << "Output:\n" | ||
| << result.stdOut; |
There was a problem hiding this comment.
Repeat comment from above regarding result.stdOut being printed out twice for demo runs returning 0.
| EndProject | ||
| Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "googletest_static", "ThirdParty\googletest_static.vcxproj", "{49811F10-3D14-403E-859D-40DFCBB35C7B}" | ||
| EndProject | ||
| Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "zstdgpu_ci_tests", "zstdgpu_ci_tests\zstdgpu_ci_tests.vcxproj", "{A1B2C3D4-E5F6-7890-ABCD-EF1234567890}" |
Draft - not ready for merge. Putting this up for early visibility and code review.
Adds a thin GTest wrapper (
zstdgpu_ci_tests) that shells out tozstdgpu_demo.exeto validate Zstd GPU decompression shaders in CI. Also adds pipeline YAML and a histogram generation script for perf results.What's here
zstd/zstdgpu_ci_tests/— GTest parameterized test fileszstd/scripts/generate_histogram.py— Generates histogram PNGs from perf CSV output. This can probably be improvedOpen Q's
What's not here
zstdgpu_demo.exe. ATG owns. This PR does not include--output-csvflag and non-zero exit code changesATG dependencies
--output-csvflag onzstdgpu_demo.exefor perf CSV outputzstdgpu_demo.exewhen validation fails (correctness tests currently pass vacuously)