Problem
evaluation/sil/policy_runner.py only dispatches ACT policies. With pi0 training landing (#916), there is no SIL eval path for pi0 checkpoints. Additionally, P1 (#927) introduces VLA evaluation schema v1 emission in run_evaluation.py, which needs to be reconciled with the existing toolchain metrics schema for consistency across policies.
Proposed solution
- Add pi0 / pi0_fast dispatch branch to policy_runner.py
- Add a pi0 eval submit script + workflow YAML
- Reconcile VLA schema v1 fields with the toolchain's existing eval output so both ACT and pi0 evals emit the same shape
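The dispatch branch in the first item could be sketched roughly as below. This is a minimal illustration, not the actual contents of policy_runner.py: the registry, the `register` decorator, `load_policy`, and both loader functions are hypothetical names, and the loaders return placeholder dicts where real code would construct the ACT or pi0 policy from the checkpoint.

```python
from typing import Callable, Dict

# Hypothetical loader signature: checkpoint path in, policy object out.
PolicyLoader = Callable[[str], object]

# Registry mapping a policy-type string to the loader that handles it.
_LOADERS: Dict[str, PolicyLoader] = {}


def register(*policy_types: str):
    """Register one loader under one or more policy-type names."""
    def decorator(fn: PolicyLoader) -> PolicyLoader:
        for name in policy_types:
            _LOADERS[name] = fn
        return fn
    return decorator


@register("act")
def load_act(checkpoint: str) -> object:
    # Placeholder: real code would build the ACT policy here.
    return {"family": "act", "checkpoint": checkpoint}


@register("pi0", "pi0_fast")
def load_pi0(checkpoint: str) -> object:
    # Placeholder: pi0 and pi0_fast are assumed to share one loader.
    return {"family": "pi0", "checkpoint": checkpoint}


def load_policy(policy_type: str, checkpoint: str) -> object:
    """Dispatch to the loader registered for policy_type."""
    try:
        loader = _LOADERS[policy_type]
    except KeyError:
        supported = sorted(_LOADERS)
        raise ValueError(
            f"unsupported policy type {policy_type!r}; expected one of {supported}"
        ) from None
    return loader(checkpoint)
```

A registry keeps the ACT path untouched while adding the pi0 variants, and an unknown policy type fails loudly rather than silently falling back to ACT.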
Acceptance criteria
- policy_runner.py accepts pi0 checkpoints and runs SIL rollouts
Dependencies
Estimate
3–4 person-days