
Port Protein to CUDA C #601

Open: vyeoms wants to merge 12 commits into PufferAI:4.0 from vyeoms:port/c_protein


vyeoms commented Jun 29, 2026


Summary

This PR builds on top of #587, so the diff will be updated once the GP port PR is accepted. It ports Protein to pure CUDA C:

  • src/protein_util.h: Pure C utilities for Protein. Defines the search space types (linear, log, pow2, logit), normalizations, Pareto front utilities, and other numeric helpers (for example, the Nelder-Mead minimizer used here).
  • src/protein.cu: Core implementation of Protein, following the original as faithfully as I could. It also contains supporting code such as Adam, acquisition scoring, and other device-side numeric operations (e.g. for the classifier); I'm not sure whether those would be preferred in a separate file. Currently ~1200 LoC.
  • tests/test_protein.cu: Unit tests for the components in src/protein.cu. Build with nvcc -o test_protein tests/test_protein.cu -I src/ -lcublas -lcusolver -lcurand and run with ./test_protein.
  • tests/test_protein_sweep.cu: Replicates the synthetic sweep test from tests/test_sweep.py. Outputs an HTML file with the plot and a CSV file with the results for record-keeping. Build with nvcc -o test_protein_sweep tests/test_protein_sweep.cu -I src/ -lcublas -lcusolver -lcurand and run with ./test_protein_sweep.

Notes

Note that this is a CUDA-only implementation. Unlike the GP port I mentioned in #587, I don't currently have a pure C CPU version of Protein.

This should build natively with PufferLib and run with puffer sweep <env>, falling back to the original Python implementation if the Protein CUDA build isn't available.

Numeric and qualitative results

Unit testing

tests/test_protein.cu generally tests the following aspects:

  • Do we get the actual Pareto-dominant points?
  • Does the Pareto pruning actually prune bad results and keep the good ones?
  • Do the cost models converge to the true cost in a toy setting?
  • Is the logistic regression classifier good enough? Tested on a set of linearly separable points.
  • Are the samples from the acquisition score within bounds?
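As an illustration of what the Pareto-dominance checks above verify, here is a minimal C sketch of a two-objective Pareto front (maximize score, minimize cost). The function names `dominates` and `pareto_front` are hypothetical and the quadratic scan is chosen for clarity; the PR's own utilities may be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if point a dominates point b: a is no worse on both
 * objectives (higher score, lower cost) and strictly better on one. */
int dominates(double score_a, double cost_a, double score_b, double cost_b) {
    int no_worse = (score_a >= score_b) && (cost_a <= cost_b);
    int better = (score_a > score_b) || (cost_a < cost_b);
    return no_worse && better;
}

/* Sets on_front[i] to 1 if point i is not dominated by any other point,
 * else 0. Returns the number of points on the front. O(n^2) scan. */
size_t pareto_front(const double *score, const double *cost, size_t n,
                    int *on_front) {
    assert(n == 0 || (score != NULL && cost != NULL && on_front != NULL));
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        on_front[i] = 1;
        for (size_t j = 0; j < n; ++j) {
            if (j != i && dominates(score[j], cost[j], score[i], cost[i])) {
                on_front[i] = 0;
                break;
            }
        }
        count += (size_t)on_front[i];
    }
    return count;
}
```

For example, with scores {1.0, 2.0, 1.5, 3.0, 2.5} and costs {1, 2, 3, 4, 5}, the third and fifth points are dominated (each has a cheaper, higher-scoring neighbor), so three points remain on the front.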

In addition to these unit tests, there is a small integration test that fits a toy cost function.

Synthetic test

tests/test_protein_sweep.cu replicates the synthetic eval from tests/test_sweep.py. Output from the CUDA C test:

image

Original implementation with gpytorch:

image

Breakout

Ran a 20-iteration sweep on Breakout. Sweep result on my laptop with one A100 GPU:

image

Playing with the top-scoring hyperparameters:

image
