Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 12 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Fixed
- **`LinearRegression.get_se()` / `get_inference()` no longer return a `NaN` standard error from a tiny-negative variance artifact.** A high-leverage / degenerate coefficient (e.g. an absorbed-FE dummy near-collinear with the treatment, whose Bell-McCaffrey Satterthwaite DOF already hits the noise-floor guard) can have a CR2/HC variance of ~0 (≈1e-32) whose vcov diagonal lands just-below-zero under BLAS-dependent float rounding; `np.sqrt` of the negative then produced a `NaN` SE **nondeterministically** — passing single-threaded but failing under the parallel pure-Python full-suite run (`tests/test_methodology_wls_cr2.py::TestLinearRegressionFENanGuardEndToEnd::test_did_absorbed_fe_lr_inference_nan_for_guarded_coefs`). Both SE sites now clamp the vcov diagonal at 0, so the SE is finite (0 for a genuinely-zero variance), deterministic, and BLAS-independent. **No change for any positive variance** (the clamp is a no-op there); only the previously-`NaN` degenerate case is affected.
- **`TripleDifference` power analysis now honors `n_periods > 2`.** `simulate_power`,
`simulate_mde`, and `simulate_sample_size` previously routed DDD to the
cross-sectional 2×2×2 `generate_ddd_data` regardless of `n_periods` (emitting an
"n_periods ignored" warning). They now route to the panel DGP
`generate_ddd_panel_data` when `n_periods > 2`, honoring `n_periods`/`treatment_period`
and sizing the panel by `n_units` directly (the sample-size search switches from the
multiple-of-8 grid to a continuous step-1 search). Because `simulate_power` defaults
to `n_periods=4`, the default DDD power call now uses the panel DGP. The panel DGP has
within-unit serial correlation, so construct the estimator as
`TripleDifference(cluster="unit")` for valid power — a `UserWarning` fires otherwise.
`treatment_fraction` remains inert (balanced 2×2×2); pass `group_frac`/`partition_frac`
via `data_generator_kwargs`. See `docs/methodology/REGISTRY.md` §PowerAnalysis.

## [3.5.2] - 2026-06-08

Expand Down
1 change: 0 additions & 1 deletion TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,7 +93,6 @@ Deferred items from PR reviews that were not addressed before merge.
| Survey sandwich SE is not exactly invariant to zero-weight (subpopulation / padded) rows: the shared `_compute_stratified_psu_meat` finite-sample correction counts zero-weight units as PSUs (an `n_psu/(n_psu-1)`-style factor), so adding zero-weight rows shifts the SE by a second-order amount (~2e-4 relative in the EfficientDiD e2e). The point estimate is exactly invariant and the weighted scores of zero-weight rows are already zero — only the DOF correction's PSU count includes them. Cross-cutting across all survey-enabled estimators; fix by counting only positive-weight PSUs in the correction. | `survey.py` (`_compute_stratified_psu_meat`) | PR-B follow-up | Low |
| ImputationDiD: leave-one-out (LOO) conservative-variance refinement (BJS 2024 Supplementary Appendix A.9) not implemented — a finite-sample improvement to the auxiliary-model residuals that reduces overfitting of `tau_tilde_g` to `epsilon`. The asymptotic Theorem-3 variance is implemented and matches R `didimputation` (which also omits LOO by default). | `imputation.py` | imputation-validation follow-up | Low |
| TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low |
| TripleDifference power auto-routing: `power.simulate_power` ignores `n_periods` for DDD because `_ddd_dgp_kwargs` is hard-coded to the cross-sectional `generate_ddd_data`. Now that `generate_ddd_panel_data` exists (Wave 4), add a new `_EstimatorProfile` registry entry (or extend the existing one) to route to the panel DGP when `n_periods > 2`. | `power.py`, `prep_dgp.py` | follow-up | Low |
| StaggeredTripleDifference R cross-validation: CSV fixtures not committed (gitignored); tests skip without local R + triplediff. Commit fixtures or generate deterministically. | `tests/test_methodology_staggered_triple_diff.py` | #245 | Medium |
| StaggeredTripleDifference R parity: benchmark only tests no-covariate path (xformla=~1). Add covariate-adjusted scenarios and aggregation SE parity assertions. | `benchmarks/R/benchmark_staggered_triplediff.R` | #245 | Medium |
| StaggeredTripleDifference: per-cohort group-effect SEs include WIF (conservative vs R's wif=NULL). Documented in REGISTRY. Could override mixin for exact R match. | `staggered_triple_diff.py` | #245 | Low |
Expand Down
223 changes: 214 additions & 9 deletions diff_diff/power.py
Original file line number Diff line number Diff line change
Expand Up @@ -267,6 +267,29 @@ def _ddd_dgp_kwargs(
)


def _ddd_panel_dgp_kwargs(
n_units: int,
n_periods: int,
treatment_effect: float,
treatment_fraction: float,
treatment_period: int,
sigma: float,
) -> Dict[str, Any]:
# Panel DDD DGP (n_periods > 2). `n_units` maps directly (no 8-cell //8
# rounding). `treatment_fraction` is intentionally NOT mapped — DDD is a
# balanced factorial, so group_frac/partition_frac are left at the DGP
# default 0.5. Omitting group_frac/partition_frac here also keeps them out
# of the _PROTECTED_DGP_KEYS collision check, so a user can still override
# the group/partition split via data_generator_kwargs.
return dict(
n_units=n_units,
n_periods=n_periods,
treatment_period=treatment_period,
treatment_effect=treatment_effect,
noise_sd=sigma,
)


# -- Fit kwargs builders ------------------------------------------------------


Expand Down Expand Up @@ -320,6 +343,19 @@ def _ddd_fit_kwargs(
return dict(outcome="outcome", group="group", partition="partition", time="time")


def _ddd_panel_fit_kwargs(
data: pd.DataFrame,
n_units: int,
n_periods: int,
treatment_period: int,
) -> Dict[str, Any]:
# Panel DDD: time="post" is generate_ddd_panel_data's derived binary
# pre/post indicator (vs the cross-sectional "time"). Clustering is NOT a
# fit kwarg — it resolves from the estimator's cluster="unit" attribute
# against the DGP's "unit" column.
return dict(outcome="outcome", group="group", partition="partition", time="post")


def _trop_fit_kwargs(
data: pd.DataFrame,
n_units: int,
Expand Down Expand Up @@ -636,6 +672,47 @@ def _ddd_effective_n(
return eff if eff != n_units else None


def _ddd_panel_cells_populated(n: int, group_frac: float, partition_frac: float) -> bool:
"""Whether ``generate_ddd_panel_data`` would populate all 4 (group,partition)
cells at ``n`` units. Mirrors the rounded stratified allocation + non-empty
validation in ``prep_dgp.generate_ddd_panel_data`` (kept in lockstep)."""
if n < 4:
return False
n_g1 = int(round(n * group_frac))
n_g0 = n - n_g1
n_p1_g0 = int(round(n_g0 * partition_frac))
n_p1_g1 = int(round(n_g1 * partition_frac))
cells = (n_g0 - n_p1_g0, n_p1_g0, n_g1 - n_p1_g1, n_p1_g1)
return min(cells) >= 1


def _ddd_panel_viable_min_n(
group_frac: float, partition_frac: float, floor: int = 16, search_max: int = 100000
) -> int:
"""Smallest ``n_units`` for which ``generate_ddd_panel_data`` populates all
four (group,partition) cells under the given split, floored at ``floor``.

For the balanced default (0.5/0.5) this is 4, so the result is ``floor``;
skewed splits (e.g. 0.1/0.1) need more units before every cell is non-empty,
so the sample-size search must bracket above this value (the registry
documents group_frac/partition_frac as data_generator_kwargs overrides)."""
# Validate up front (matching generate_ddd_panel_data) so an out-of-range
# split raises the same clear message here — before the bracketing logic can
# surface a misleading "n_range below the minimum" error downstream.
if not (0.0 < group_frac < 1.0):
raise ValueError(f"group_frac must be in (0, 1); got {group_frac}.")
if not (0.0 < partition_frac < 1.0):
raise ValueError(f"partition_frac must be in (0, 1); got {partition_frac}.")
for n in range(4, search_max + 1):
if _ddd_panel_cells_populated(n, group_frac, partition_frac):
return max(floor, n)
raise ValueError(
f"No panel-DDD sample size <= {search_max} populates all four "
f"(group, partition) cells for group_frac={group_frac}, "
f"partition_frac={partition_frac}; move the split closer to 0.5."
)


def _check_ddd_dgp_compat(
n_units: int,
n_periods: int,
Expand Down Expand Up @@ -686,6 +763,55 @@ def _check_ddd_dgp_compat(
)


def _check_ddd_panel_dgp_compat(
estimator: Any,
treatment_fraction: float,
data_generator_kwargs: Optional[Dict[str, Any]],
) -> None:
"""Compat checks for the panel DDD power path (``n_periods > 2``).

Unlike the cross-sectional ``_check_ddd_dgp_compat``, ``n_periods`` and
``treatment_period`` are honored here (no warning). ``treatment_fraction``
is still inert (the panel DGP is a balanced 2×2×2). The key addition is the
clustering caveat: ``generate_ddd_panel_data`` has within-unit serial
correlation, so unclustered SEs overstate power.
"""
overrides = data_generator_kwargs or {}
if "n_per_cell" in overrides:
raise ValueError(
"data_generator_kwargs contains 'n_per_cell', a cross-sectional "
"generate_ddd_data parameter. The panel DDD power path "
"(n_periods > 2) uses generate_ddd_panel_data, which sizes the panel "
"by n_units directly. Control the design via n_units and the "
"group_frac / partition_frac data_generator_kwargs instead."
)

if treatment_fraction != 0.5:
warnings.warn(
f"treatment_fraction={treatment_fraction} is ignored for "
f"TripleDifference power: generate_ddd_panel_data uses a balanced "
f"2×2×2 design (group_frac=partition_frac=0.5). Pass group_frac / "
f"partition_frac via data_generator_kwargs to vary the split.",
UserWarning,
stacklevel=2,
)

# The panel DGP hard-names its unit column "unit", and TripleDifference
# resolves clustering from `self.cluster` against a same-named data column,
# so on this auto-DGP path the only correct value is the literal "unit" —
# this is NOT a general clustering check.
if getattr(estimator, "cluster", None) != "unit":
warnings.warn(
"TripleDifference power on the panel DGP (n_periods > 2) has "
"within-unit serial correlation, so unclustered standard errors are "
"anti-conservative and overstate power. Construct the estimator as "
'TripleDifference(cluster="unit") so the reported power reflects '
"cluster-robust (Liang-Zeger CR1) inference.",
UserWarning,
stacklevel=2,
)


def _check_sdid_placebo_data(
data: pd.DataFrame,
estimator: Any,
Expand Down Expand Up @@ -833,6 +959,28 @@ def _get_registry() -> Dict[str, _EstimatorProfile]:
return _ESTIMATOR_REGISTRY


def _ddd_panel_profile() -> "_EstimatorProfile":
"""Profile for panel DDD power simulations (n_periods > 2).

Routes TripleDifference power analysis to ``generate_ddd_panel_data``
(which honors ``n_periods``/``treatment_period``) instead of the
cross-sectional 2x2x2 ``generate_ddd_data``. See ``simulate_power``.
"""
from diff_diff.prep import generate_ddd_panel_data

# min_n=16 is intentionally lower than the cross-sectional DDD min_n=64
# (which counts 8 cells x n_per_cell); the panel DGP maps n_units directly
# and only requires n_units >= 4, so 16 gives >=1 unit per (group,partition)
# cell with margin.
return _EstimatorProfile(
default_dgp=generate_ddd_panel_data,
dgp_kwargs_builder=_ddd_panel_dgp_kwargs,
fit_kwargs_builder=_ddd_panel_fit_kwargs,
result_extractor=_extract_simple,
min_n=16,
)


@dataclass
class PowerResults:
"""
Expand Down Expand Up @@ -2007,6 +2155,21 @@ def simulate_power(
use_custom_dgp = data_generator is not None
use_survey_dgp = survey_config is not None

# Route DDD power to the panel DGP when n_periods > 2. The cross-sectional
# 2x2x2 generate_ddd_data ignores n_periods; generate_ddd_panel_data honors
# it. Swapping the profile here (before the collision check and the DDD
# compat warnings) makes every downstream consumer (dgp_kwargs_builder,
# default_dgp, fit_kwargs_builder, result_extractor, min_n) use the panel
# variant automatically.
use_ddd_panel = (
estimator_name == "TripleDifference"
and n_periods > 2
and not use_custom_dgp
and not use_survey_dgp
)
if use_ddd_panel:
profile = _ddd_panel_profile()

# --- Survey config validation ---
if use_survey_dgp:
assert survey_config is not None # for type narrowing
Expand Down Expand Up @@ -2161,7 +2324,13 @@ def simulate_power(
)

# Warn if DDD design inputs are silently ignored
if estimator_name == "TripleDifference" and not use_custom_dgp:
if use_ddd_panel:
# Panel path honors n_periods/treatment_period; n_units maps directly
# (no //8 rounding). Different compat surface than the cross-sectional
# 2x2x2 design.
_check_ddd_panel_dgp_compat(estimator, treatment_fraction, data_generator_kwargs)
effective_n_units = None
elif estimator_name == "TripleDifference" and not use_custom_dgp:
_check_ddd_dgp_compat(
n_units,
n_periods,
Expand Down Expand Up @@ -2707,8 +2876,10 @@ def simulate_mde(
estimator_name = type(estimator).__name__
search_path: List[Dict[str, float]] = []

# Compute effective N for DDD (N is fixed throughout MDE search)
if estimator_name == "TripleDifference" and data_generator is None:
# Compute effective N for DDD (N is fixed throughout MDE search). Only the
# cross-sectional 2x2x2 DGP (n_periods <= 2) rounds n_units to a multiple of
# 8; the panel DGP (n_periods > 2) maps n_units directly, so report None.
if estimator_name == "TripleDifference" and data_generator is None and n_periods <= 2:
effective_n_units = _ddd_effective_n(n_units, data_generator_kwargs)
else:
effective_n_units = None
Expand Down Expand Up @@ -2925,15 +3096,33 @@ def simulate_sample_size(
estimator_name = type(estimator).__name__
search_path: List[Dict[str, float]] = []

# Determine min_n from registry
# Determine min_n from registry. DDD splits cross-sectional (n_periods <= 2,
# 2x2x2 factorial) from panel (n_periods > 2, generate_ddd_panel_data).
registry = _get_registry()
profile = registry.get(estimator_name)
min_n = profile.min_n if profile is not None else 20
is_ddd = estimator_name == "TripleDifference" and data_generator is None
is_ddd_panel = is_ddd and n_periods > 2
if is_ddd_panel:
# The panel DGP requires every (group,partition) cell non-empty, which
# for a skewed group_frac/partition_frac override needs more than the
# default 16-unit floor. Bracket above the viable minimum so the search
# never probes an infeasible n (which would raise in the DGP).
_ddd_overrides = data_generator_kwargs or {}
min_n = _ddd_panel_viable_min_n(
_ddd_overrides.get("group_frac", 0.5),
_ddd_overrides.get("partition_frac", 0.5),
floor=_ddd_panel_profile().min_n,
)
else:
min_n = profile.min_n if profile is not None else 20

# DDD grid snapping: bisection candidates must be multiples of 8
is_ddd_grid = estimator_name == "TripleDifference" and data_generator is None
# Grid snapping: the cross-sectional 2x2x2 DDD DGP rounds n_units to a
# multiple of 8, so bisection candidates must snap to that grid. The panel
# DGP maps n_units directly → continuous (step-1) search like every other
# estimator.
is_ddd_grid = is_ddd and not is_ddd_panel
grid_step = 8 if is_ddd_grid else 1
convergence_threshold = grid_step + 1 # 9 for DDD, 2 for others
convergence_threshold = grid_step + 1 # 9 for cross-sectional DDD, 2 otherwise

if is_ddd_grid and data_generator_kwargs and "n_per_cell" in data_generator_kwargs:
raise ValueError(
Expand Down Expand Up @@ -2991,9 +3180,25 @@ def _power_at_n(n: int) -> float:
)

# --- Bracket ---
abs_min = 16 if is_ddd_grid else 4
# Cross-sectional DDD wants a >=16 floor so the 8 G×P×T cells are populated;
# the panel path uses its split-aware viable floor (min_n above, which is
# >=16 and higher for skewed group_frac/partition_frac); everything else
# floors at 4.
if is_ddd_panel:
abs_min = min_n
elif is_ddd:
abs_min = 16
else:
abs_min = 4
if survey_config is not None:
abs_min = max(abs_min, survey_config.min_viable_n)
if is_ddd_panel and n_range is not None and n_range[1] < abs_min:
raise ValueError(
f"n_range upper bound ({n_range[1]}) is below the minimum panel-DDD "
f"sample size ({abs_min}) needed to populate all (group, partition) "
f"cells for group_frac/partition_frac. Raise the upper bound or move "
f"the split closer to 0.5."
)
if n_range is not None:
lo, hi = _snap_n(n_range[0], "up", floor=abs_min), _snap_n(
n_range[1], "down", floor=abs_min
Expand Down
Loading
Loading