Skip to content

DRAFT: Add Zstd GPU CI test infrastructure (Phase 1)#108

Open
brohan203 wants to merge 3 commits into
microsoft:developmentfrom
brohan203:users/rohanborkar/ds_testInfra_phase1
Open

DRAFT: Add Zstd GPU CI test infrastructure (Phase 1)#108
brohan203 wants to merge 3 commits into
microsoft:developmentfrom
brohan203:users/rohanborkar/ds_testInfra_phase1

Conversation

@brohan203

Copy link
Copy Markdown

Draft - not ready for merge. Putting this up for early visibility and code review.

Adds a thin GTest wrapper (zstdgpu_ci_tests) that shells out to zstdgpu_demo.exe to validate Zstd GPU decompression shaders in CI. Also adds pipeline YAML and a histogram generation script for perf results.

What's here

  • zstd/zstdgpu_ci_tests/ — GTest parameterized test files
    • main.cpp handles setup (CLI parsing, config, file discovery), and zstdgpu_ci_tests.cpp handles executions (test definitions, demo runner)
    • 4 correctness scenarios: SimulationCheck, D3D12DebugLayer, ExternalMemory, GraphicsQueue
    • 2 performance scenarios: OverallThroughput, PerStageTiming
    • Win32 CreateProcess runner with pipe capture and optional timeout
  • zstd/scripts/generate_histogram.py — Generates histogram PNGs from perf CSV output. This can probably be improved

Open Q's

  • Do we need x86?
  • There's a decent amount of google test infra in the zstd/zstdgpu_samples folder. Need to look into reusing that instead of new methods I wrote

What's not here

  • Changes to zstdgpu_demo.exe. ATG owns. This PR does not include --output-csv flag and non-zero exit code changes
  • Test data staging on lab machines
  • End-to-end pipeline validation

ATG dependencies

  • --output-csv flag on zstdgpu_demo.exe for perf CSV output
  • Non-zero exit code from zstdgpu_demo.exe when validation fails (correctness tests currently pass vacuously)

@brohan203 brohan203 force-pushed the users/rohanborkar/ds_testInfra_phase1 branch from 89bc1c9 to 9a677ac Compare June 24, 2026 15:11
Rohan Borkar and others added 3 commits June 25, 2026 12:00
Add zstdgpu_ci_tests: a thin GTest wrapper that shells out to
zstdgpu_demo.exe for correctness and performance validation of GPU
Zstd decompression. Tests are parameterized over .zst content files
discovered at runtime via --content-path.

Test cases:
- SimulationCheck (--chk-gpu --chk-cpu --sim-gpu)
- D3D12DebugLayer (--d3d-dbg)
- ExternalMemory (--ext-mem)
- GraphicsQueue (--d3d-gfx)
- OverallThroughput (--prf-lvl 0)
- PerStageTiming (--prf-lvl 2)

Also adds scripts/generate_histogram.py for converting performance
CSV output into histogram PNGs suitable for CI reporting.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…ectness failure

Adds a translation-unit-only override of the ZSTDGPU_BREAK macro for the
demo's CPU-sim shader path: instead of the library's __debugbreak() -- which
kills the demo with STATUS_BREAKPOINT (0x80000003) on bad input before any
[FAIL] message can flush -- breaks now log a [ZGBRK] line and increment a
correctness-failure counter.

VALIDATE / VALIDATE_CND also increment the counter, the per-frame validation
failure block and uncompressed-size early-out increment it, and wmain returns
2 if the counter is non-zero. The validate-on-CPU path early-returns before
DecompressSequences if any prior validation failed -- avoids feeding sentinel
FSE table indices into the OOB-prone shader code.

All changes are local to zstdgpu_demo/main.cpp. zstdgpu library is untouched.
Performance CSV emission is provided upstream by PR microsoft#118 (zstdgpu_demo's
built-in --out-csv flag); this PR no longer adds a separate CSV writer.
@brohan203 brohan203 force-pushed the users/rohanborkar/ds_testInfra_phase1 branch from 5115e9b to 1456908 Compare June 25, 2026 19:30

// File discovery

std::vector<std::string> DiscoverZstFiles(const std::string& contentPath)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recursive_directory_iterator(contentPath) uses the throwing overload. The guard above only checks the root path, it doesn't cover errors hit during the walk. So if recursion hits a subdirectory it can't read, it throws filesystem_error, and since nothing catches it (here in main() or in the INSTANTIATE_TEST_SUITE_P generator), the whole process aborts.

The effect: one unreadable subfolder crashes the entire run before any test executes. Every  .zst  goes untested, and you get a crash instead of a named test failure. Is this worth handling?

}
}

if (config.demoPath.empty())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be changed to an .exist() and throw and error instead of a warning? If the change is made the GTEST_SKIP() << "zstdgpu_demo.exe not found. Set --demo-path."; can be removed. Should the remaining warnings also be changed to failures in main()?


// Singleton access — SetTestConfig called once from main(), GetTestConfig
// called from test helpers.
// Spec implies a global but doesn't show accessor pattern.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the spec suggested this pattern, not a prescription, global isn't strictly necessary. If this works better as a parameter for the test fixture that is also fine.

const std::string& csvOutputPath);

// File discovery — scans directories for .zst test content.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove extra line.

int timeoutSeconds = 0);

// Convenience: builds the full argument list for a correctness scenario.
// Spec inlines the args in each test. Extracting them avoids repeating --zst, --run-cnt 1, etc

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can remove reference to spec.

{
const auto& config = GetTestConfig();

if (config.demoPath.empty())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I left a comment in main(), but shouldn't this be removed. We can just fail in main() if the demoPath doesn't exist().

CloseHandle(pi.hThread);

// Wait for the reader thread to finish draining the pipe, then clean up.
readerThread.join();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

join()'s wait is unbounded, if TerminateProcess() in line 367 doesn't kill the process immediately. This could hang the test runner indefinitely. This may or may not be serious, depending on what type of failures we get when testing.

<< "Demo process returned non-zero exit code: " << result.exitCode << "\n"
<< "Command: " << result.commandLine << "\n"
<< "Output:\n"
<< result.stdOut;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the result.stdOut printed out twice when the demo app returns 0? Its already printed in line 147.

<< "Demo process returned non-zero exit code: " << result.exitCode << "\n"
<< "Command: " << result.commandLine << "\n"
<< "Output:\n"
<< result.stdOut;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeat comment from above regarding result.stdOut being printed out twice for demo runs returning 0.

Comment thread zstd/zstd.sln
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "googletest_static", "ThirdParty\googletest_static.vcxproj", "{49811F10-3D14-403E-859D-40DFCBB35C7B}"
EndProject
Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "zstdgpu_ci_tests", "zstdgpu_ci_tests\zstdgpu_ci_tests.vcxproj", "{A1B2C3D4-E5F6-7890-ABCD-EF1234567890}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real GUID?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants