TwoStageDiD methodology validation (PR-B): did2s parity + exact GMM variance #545
Overall Assessment

Executive Summary

Methodology
Finding 1
Severity: P1

Code Quality
No additional P1/P2 code-quality findings in the changed implementation. The new exact-residual code is localized and follows the existing sparse-factorization plus warned dense-fallback pattern.

Performance
No blocking findings. The exact Stage-1 re-solve reuses the existing sparse factorization, so the added work is a small extra RHS solve plus a Stage-2 solve.

Maintainability
Severity: P3

Tech Debt
No untracked deferrable tech-debt blocker found. The paper-permitted but unexposed Eq. (5) / fn. 8 variants are tracked in

Security
No security findings. The added benchmark CSV/JSON fixture appears synthetic and contains no secrets or PII.

Documentation/Tests
Severity: P2

Path to Approval
2acfbe0 to c5602bb
🔁 AI review rerun (requested by @igerber)
Head SHA:

Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.

Executive Summary

Methodology
Finding M1
Severity: P3 informational

Code Quality
No findings. The exact residual construction is now centralized, and the changed inference paths continue to use the existing NaN-safe inference conventions.

Performance
No findings. The added exact Stage-1 solve reuses the same Stage-1 sparse factorization already needed for

Maintainability
No findings. Sharing

Tech Debt
Finding T1
Severity: P3 informational

Security
No findings. The added benchmark CSV/JSON fixture appears synthetic and contains no secrets or PII.

Documentation/Tests
Finding D1
Severity: P3 informational

Verification: Not run. This environment lacks
…ariance

PR-B of the TwoStageDiD (Gardner 2022) methodology validation. Adds paper-grounded Verified Components and a did2s cross-language parity fixture, and corrects a ~1% inaccuracy in the GMM sandwich variance surfaced by the parity check.

Tests / docs:
- tests/test_methodology_two_stage.py: §3 procedure (eqs. 4/6), §3.3 GMM variance, fn.19 always-treated + Prop.5, the live covariate path, balance_e, vcov_type narrowing, a skip-guarded did2s R parity class, and a white-box test that the bootstrap GMM scores use the exact residual path.
- benchmarks/R/generate_did2s_golden.R + did2s_golden.json + did2s_test_panel.csv: did2s::did2s() golden (analytical corrected clustered SE, bootstrap=FALSE).
- REGISTRY / paper-review / METHODOLOGY_REVIEW / CHANGELOG / TODO updated; tracker row flipped to Complete.

Variance fix (diff_diff/two_stage.py, diff_diff/two_stage_bootstrap.py):
- The GMM sandwich derived its residuals from the iterative alternating-projection first-stage FE (_iterative_fe, ~1e-7 on unbalanced untreated panels) while computing gamma_hat exactly, leaving the SE ~1% off the analytical sandwich. Both the analytical variance (_compute_gmm_variance) and the multiplier bootstrap (_compute_cluster_S_scores) now re-solve the Stage-1 FE EXACTLY via a shared _exact_gmm_residuals helper (reusing the gamma_hat factorization), so the GMM influence function is single-sourced. This matters for the bootstrap because its SE overrides the analytical SE when n_bootstrap > 0.
- _build_fe_design gains an intercept column so its column space spans the grand mean (the prior intercept-free design omitted it; the exact residual is first-order sensitive). Standard full-rank two-way FE, matching fixest / did2s.
- Point estimate unchanged (iterative FE; ImputationDiD equivalence preserved at 1e-10); only the variance uses exact residuals. Unidentified-FE obs (rank-deficient / Prop.5) fall back to the iterative residual. SE now matches did2s to ~1e-7.

Mirrors ImputationDiD's exact-sparse variance fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
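The two effects described in the variance fix can be reproduced on a toy unbalanced panel. The sketch below is illustrative only and does not use diff_diff code: `iterative_residuals`, `exact_residuals`, and the panel itself are hypothetical, and the intercept-free design is assumed to drop one unit dummy and one time dummy. It compares residuals from an early-stopped alternating-projection fit with residuals from an exact least-squares solve, and shows that a design without an intercept column leaves the grand mean in the residuals.

```python
# Toy illustration only; not the diff_diff implementation.

def iterative_residuals(obs, tol, max_iter=10_000):
    """Alternating projections: update unit means, then time means, until stable."""
    alpha = {u: 0.0 for u, _, _ in obs}
    lam = {t: 0.0 for _, t, _ in obs}
    for _ in range(max_iter):
        change = 0.0
        for u in alpha:
            vals = [y - lam[t] for uu, t, y in obs if uu == u]
            new = sum(vals) / len(vals)
            change = max(change, abs(new - alpha[u]))
            alpha[u] = new
        for t in lam:
            vals = [y - alpha[u] for u, tt, y in obs if tt == t]
            new = sum(vals) / len(vals)
            change = max(change, abs(new - lam[t]))
            lam[t] = new
        if change < tol:
            break
    return [y - alpha[u] - lam[t] for u, t, y in obs]


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def exact_residuals(obs, intercept):
    """Least-squares residuals from unit/time dummies (first of each dropped)."""
    units = sorted({u for u, _, _ in obs})
    times = sorted({t for _, t, _ in obs})
    rows = [
        ([1.0] if intercept else [])
        + [float(u == v) for v in units[1:]]
        + [float(t == s) for s in times[1:]]
        for u, t, _ in obs
    ]
    ys = [y for _, _, y in obs]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    return [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(rows, ys)]


# Hypothetical unbalanced panel: 4 units x 4 periods with three cells missing.
missing = {(0, 3), (1, 0), (3, 1)}
obs = [
    (u, t, 10.0 + 1.5 * u + 0.7 * t + 0.1 * ((7 * u + 3 * t) % 5))
    for u in range(4) for t in range(4) if (u, t) not in missing
]

exact = exact_residuals(obs, intercept=True)
no_intercept = exact_residuals(obs, intercept=False)

def max_gap(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

gap_loose = max_gap(iterative_residuals(obs, tol=1e-3), exact)
gap_tight = max_gap(iterative_residuals(obs, tol=1e-12), exact)
print(f"iterative vs exact: loose tol {gap_loose:.2e}, tight tol {gap_tight:.2e}")
print(f"residual sums: with intercept {sum(exact):.2e}, without {sum(no_intercept):.2e}")
```

The iterative residuals agree with the exact ones only up to the stopping tolerance, which is the kind of discrepancy the commit attributes to `_iterative_fe`. Without the intercept column, the constant vector is outside the column space, so the residuals no longer sum to zero.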
c5602bb to 278e541
🔁 AI review rerun (requested by @igerber)
Head SHA:

Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.

Executive Summary

Methodology
Finding M1
Severity: P3 informational

Code Quality
No findings. The exact residual construction is centralized and reused by analytical variance and bootstrap paths, reducing the prior divergence.

Performance
No findings. The exact Stage-1 residual solve reuses the same sparse factorization needed for

Maintainability
No findings. The changed signatures propagated through all visible callers in the diff, and the previous optional event-study bootstrap coverage gap is now explicitly covered.

Tech Debt
Finding T1
Severity: P3 informational

Security
No findings. The added CSV/JSON benchmark fixture appears synthetic and contains no secrets or PII.

Documentation/Tests
Finding D1
Severity: P3 informational

Verification
Summary
did2s review pair. Adds paper-grounded Verified Components and a did2s cross-language parity fixture, and flips the METHODOLOGY_REVIEW.md TwoStageDiD row to Complete.
did2s parity. The variance computed gamma_hat exactly (sparse) but derived its residuals from the iterative alternating-projection first-stage FE (_iterative_fe, ~1e-7 convergence on unbalanced untreated panels). It now re-solves the Stage-1 FE exactly (reusing the gamma_hat factorization), and _build_fe_design gains an intercept column so its column space spans the grand mean (the prior intercept-free design omitted it; the exact residual is first-order sensitive). SE now matches did2s to ~1e-7; the point estimate is unchanged (iterative FE; ImputationDiD equivalence preserved at 1e-10). Mirrors the same-class fix in ImputationDiD's exact-sparse variance.

Methodology references
did2s::did2s() (Butts & Gardner)
did2s; documented as Deviation from R in REGISTRY.md). The multiplier bootstrap and the vcov_type narrowing are library extensions (Gardner prescribes analytical GMM SEs only; did2s defaults bootstrap=FALSE). The Eq. (5) P̄-average estimand and the fn. 8 full-sample first-stage variant are paper-permitted but not exposed (tracked in TODO.md).

Validation
tests/test_methodology_two_stage.py (new; five Gardner-section Verified Component classes: §3 procedure/eqs. 4&6, §3.3 GMM variance, fn. 19 + Proposition 5 identification, library deviations, plus TestTwoStageDiDParityR); benchmarks/R/generate_did2s_golden.R + benchmarks/data/did2s_golden.json + did2s_test_panel.csv (new R parity fixture, committed so CI needs no R).
did2s parity: overall + event-study ATT (abs=1e-6) and SE (abs=1e-7).
Regression: full tests/test_two_stage.py (120) plus a 2315-test sweep across all TwoStageDiD/survey test files, green. black/ruff clean.

Security / privacy
🤖 Generated with Claude Code
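
For readers unfamiliar with the multiplier bootstrap listed above as a library extension, here is a generic sketch of the idea, assuming Rademacher weights applied to per-cluster scores of a scalar estimate. The function names, the score values, and the scaling are hypothetical and are not taken from diff_diff. Because the analytical and bootstrap SEs are both built from the same cluster scores, any inexactness in the residuals behind those scores reaches both.

```python
import math
import random


def analytical_se(scores):
    """Cluster-robust SE of a scalar: square root of the sum of squared cluster scores."""
    return math.sqrt(sum(s * s for s in scores))


def multiplier_bootstrap_se(scores, n_bootstrap, seed=0):
    """SE from Rademacher-perturbed sums of the same cluster scores."""
    if n_bootstrap < 2:
        raise ValueError("need at least two bootstrap draws")
    rng = random.Random(seed)
    draws = [
        sum(rng.choice((-1.0, 1.0)) * s for s in scores)
        for _ in range(n_bootstrap)
    ]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    return math.sqrt(var)


def reported_se(scores, n_bootstrap):
    """Bootstrap SE takes precedence whenever n_bootstrap > 0."""
    if n_bootstrap > 0:
        return multiplier_bootstrap_se(scores, n_bootstrap)
    return analytical_se(scores)


# Hypothetical per-cluster scores for 40 clusters.
scores = [0.03 * math.sin(g) for g in range(1, 41)]
se_analytical = reported_se(scores, n_bootstrap=0)
se_bootstrap = reported_se(scores, n_bootstrap=4000)
print(f"analytical {se_analytical:.4f}, bootstrap {se_bootstrap:.4f}")
```

With enough draws the two numbers agree closely, since the variance of a Rademacher-weighted sum equals the sum of squared scores.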