generate_event_study_data

causalpy.data.simulate_data.generate_event_study_data(n_units=20, n_time=20, treatment_time=10, treated_fraction=0.5, event_window=(-5, 5), treatment_effects=None, unit_fe_sigma=1.0, time_fe_sigma=0.5, noise_sigma=0.2, predictor_effects=None, ar_phi=0.9, ar_scale=1.0, seed=None)

Generate synthetic panel data for event study / dynamic DiD analysis.

Creates panel data with unit and time fixed effects, where a fraction of units receive treatment at a common treatment time. Treatment effects can vary by event time (time relative to treatment). Optionally includes time-varying predictor variables generated via AR(1) processes.
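The description names the components of the simulated outcome but not how they combine. Assuming they are additive, with normally distributed fixed effects and noise (these are assumptions, not confirmed by this page), the outcome for unit i at time t would be:

```latex
\begin{aligned}
y_{it} &= \alpha_i + \lambda_t + \sum_{k=K_{\min}}^{K_{\max}} \beta_k D_{it}^{k}
          + \sum_{j} \gamma_j x_{jt} + \varepsilon_{it}, \\
D_{it}^{k} &= \mathbf{1}\{\text{unit } i \text{ is treated and } t - t_0 = k\}, \\
\alpha_i &\sim \mathcal{N}(0, \sigma_{\mathrm{unit}}^2), \quad
\lambda_t \sim \mathcal{N}(0, \sigma_{\mathrm{time}}^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_{\mathrm{noise}}^2), \\
x_{jt} &= \phi\, x_{j,t-1} + \eta_{jt}, \quad \eta_{jt} \sim \mathcal{N}(0, s^2).
\end{aligned}
```

Here t_0 is treatment_time, (K_min, K_max) is event_window, β_k comes from treatment_effects, γ_j from predictor_effects, σ_unit, σ_time and σ_noise are unit_fe_sigma, time_fe_sigma and noise_sigma, φ is ar_phi and s is ar_scale. Because x_{jt} carries no unit index, every unit sees the same predictor value at a given time, matching the predictor_effects description.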

Parameters:

n_units (int) – Total number of units (treated + control). Default 20.
n_time (int) – Number of time periods. Default 20.
treatment_time (int) – Time period when treatment occurs (0-indexed). Default 10.
treated_fraction (float) – Fraction of units that are treated. Default 0.5.
event_window (tuple[int, int]) – Range of event times (K_min, K_max) for which treatment effects are defined. Default (-5, 5).
treatment_effects (dict[int, float], optional) – Dictionary mapping event time k to treatment effect beta_k. By default, effects are 0 for k < 0 (pre-treatment) and increase gradually after treatment.
unit_fe_sigma (float) – Standard deviation for unit fixed effects. Default 1.0.
time_fe_sigma (float) – Standard deviation for time fixed effects. Default 0.5.
noise_sigma (float) – Standard deviation for observation noise. Default 0.2.
predictor_effects (dict[str, float], optional) – Dictionary mapping predictor names to their true coefficients. Each predictor is generated as an AR(1) time series that varies over time but is the same for all units at a given time. For example, {'temperature': 0.3, 'humidity': -0.1} creates two predictors. Default None (no predictors).
ar_phi (float) – AR(1) autoregressive coefficient controlling persistence of predictors. Values closer to 1 produce smoother, more persistent series. Default 0.9.
ar_scale (float) – Standard deviation of the AR(1) innovation noise for predictors. Default 1.0.
seed (int, optional) – Random seed for reproducibility. Default None.
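To make the parameters concrete, here is a minimal NumPy/pandas sketch of a generator with the same parameters. It is a hypothetical re-implementation (simulate_event_study_sketch is not a CausalPy function), and several details are assumptions: the default linear effect ramp, treating the first round(n_units * treated_fraction) units, normal draws for all random terms, effects applied only inside event_window, and the AR(1) starting value.

```python
import numpy as np
import pandas as pd


def simulate_event_study_sketch(
    n_units=20, n_time=20, treatment_time=10, treated_fraction=0.5,
    event_window=(-5, 5), treatment_effects=None, unit_fe_sigma=1.0,
    time_fe_sigma=0.5, noise_sigma=0.2, predictor_effects=None,
    ar_phi=0.9, ar_scale=1.0, seed=None,
):
    """Hypothetical sketch of the data-generating process (not CausalPy's code)."""
    rng = np.random.default_rng(seed)
    k_min, k_max = event_window
    if treatment_effects is None:
        # Assumed default: zero before treatment, linear ramp from k = 0 onwards.
        treatment_effects = {
            k: (0.0 if k < 0 else 0.5 * (k + 1)) for k in range(k_min, k_max + 1)
        }
    predictor_effects = predictor_effects or {}

    # Assumption: the first round(n_units * treated_fraction) units are treated.
    n_treated = int(round(n_units * treated_fraction))
    unit_fe = rng.normal(0.0, unit_fe_sigma, size=n_units)
    time_fe = rng.normal(0.0, time_fe_sigma, size=n_time)

    # One AR(1) series per predictor, shared by all units at a given time.
    predictors = {}
    for name in predictor_effects:
        x = np.zeros(n_time)
        x[0] = rng.normal(0.0, ar_scale)  # assumed starting value
        for t in range(1, n_time):
            x[t] = ar_phi * x[t - 1] + rng.normal(0.0, ar_scale)
        predictors[name] = x

    rows = []
    for i in range(n_units):
        treated = int(i < n_treated)
        for t in range(n_time):
            y = unit_fe[i] + time_fe[t] + rng.normal(0.0, noise_sigma)
            k = t - treatment_time  # event time
            if treated and k_min <= k <= k_max:
                y += treatment_effects.get(k, 0.0)
            for name, coef in predictor_effects.items():
                y += coef * predictors[name][t]
            row = {
                "unit": i,
                "time": t,
                "y": y,
                "treat_time": float(treatment_time) if treated else np.nan,
                "treated": treated,
            }
            row.update({name: predictors[name][t] for name in predictor_effects})
            rows.append(row)
    return pd.DataFrame(rows)


df = simulate_event_study_sketch(
    n_units=10, n_time=10, treatment_time=5, seed=42,
    predictor_effects={"temperature": 0.3, "humidity": -0.1},
)
print(df.shape)  # (100, 7)
```

The sketch reproduces the documented output layout, but its values will not match the real generator for the same seed.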

Returns:

Panel data with columns:

- unit: Unit identifier
- time: Time period
- y: Outcome variable
- treat_time: Treatment time for unit (NaN if never treated)
- treated: Whether unit is in treated group (0 or 1)
- <predictor_name>: One column per predictor (if predictor_effects provided)

Return type:

pd.DataFrame
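The returned frame stores each unit's treatment time rather than the event time itself. A minimal pandas sketch of deriving event time k = time - treat_time and one 0/1 indicator per k in the event window, omitting k = -1 as the reference period; the helper add_event_time_dummies and the event_time and ev_{k} column names are hypothetical, not part of CausalPy.

```python
import numpy as np
import pandas as pd


def add_event_time_dummies(df: pd.DataFrame, event_window=(-5, 5), reference=-1) -> pd.DataFrame:
    """Hypothetical helper: add event time and per-k treatment indicators."""
    out = df.copy()
    # treat_time is NaN for never-treated units, so their event_time is NaN too.
    out["event_time"] = out["time"] - out["treat_time"]
    k_min, k_max = event_window
    for k in range(k_min, k_max + 1):
        if k == reference:
            continue  # omitted reference category
        # NaN == k is False, so never-treated units get 0 in every indicator.
        out[f"ev_{k}"] = (out["event_time"] == k).astype(int)
    return out


# Toy frame with the documented columns: unit 0 treated at t = 1, unit 1 never treated.
toy = pd.DataFrame({
    "unit": [0, 0, 0, 1, 1, 1],
    "time": [0, 1, 2, 0, 1, 2],
    "y": [0.0] * 6,
    "treat_time": [1.0, 1.0, 1.0, np.nan, np.nan, np.nan],
    "treated": [1, 1, 1, 0, 0, 0],
})
res = add_event_time_dummies(toy, event_window=(-1, 1))
print(res[["unit", "time", "event_time", "ev_0", "ev_1"]])
```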

Example

>>> from causalpy.data.simulate_data import generate_event_study_data
>>> df = generate_event_study_data(
...     n_units=20, n_time=20, treatment_time=10, seed=42
... )
>>> df.shape
(400, 5)
>>> df.columns.tolist()
['unit', 'time', 'y', 'treat_time', 'treated']

With predictors:

>>> df = generate_event_study_data(
...     n_units=10,
...     n_time=10,
...     treatment_time=5,
...     seed=42,
...     predictor_effects={"temperature": 0.3, "humidity": -0.1},
... )
>>> df.shape
(100, 7)
>>> "temperature" in df.columns and "humidity" in df.columns
True
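As a rough check on data with this layout, the treated-minus-control gap in mean y at each event time, re-centred so the gap at k = -1 is zero, estimates beta_k - beta_{-1} if the outcome is additive in unit effects, time effects, shared predictors and treatment effects (an assumption). The helper event_time_gap below is a hypothetical illustration of that simple difference-in-means estimator, not CausalPy's event-study model; the toy panel is noise-free so the true effects come back up to floating-point error.

```python
import numpy as np
import pandas as pd


def event_time_gap(df: pd.DataFrame, reference: int = -1) -> pd.Series:
    """Hypothetical helper: treated-minus-control mean of y per event time,
    re-centred so the gap at `reference` is zero (common treatment timing)."""
    # All treated units share one treatment time, so read it off any of them.
    t0 = int(df.loc[df["treated"] == 1, "treat_time"].iloc[0])
    means = df.groupby(["time", "treated"])["y"].mean().unstack("treated")
    gap = means[1] - means[0]  # treated minus control, per calendar period
    gap.index = gap.index - t0  # relabel calendar time t as event time k = t - t0
    gap.index.name = "event_time"
    return gap - gap.loc[reference]


# Noise-free toy panel: 4 units (first 2 treated), 6 periods, treatment at t = 3.
treat_at = 3
beta = {0: 1.0, 1: 2.0, 2: 3.0}  # true effects at event times 0, 1, 2
rows = []
for i in range(4):
    treated = int(i < 2)
    for t in range(6):
        effect = beta.get(t - treat_at, 0.0) if treated else 0.0
        rows.append({
            "unit": i,
            "time": t,
            "y": 0.5 * i + 0.1 * t + effect,  # unit effect + time trend + effect
            "treat_time": float(treat_at) if treated else np.nan,
            "treated": treated,
        })
toy = pd.DataFrame(rows)
estimates = event_time_gap(toy)
print(estimates.round(6))
```

Applied to the output of generate_event_study_data, the same function should give noisy estimates near the configured treatment_effects inside event_window, since the unit-effect difference is removed by re-centring and the time effects and predictors are shared by both groups.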