generate_event_study_data
- causalpy.data.simulate_data.generate_event_study_data(n_units=20, n_time=20, treatment_time=10, treated_fraction=0.5, event_window=(-5, 5), treatment_effects=None, unit_fe_sigma=1.0, time_fe_sigma=0.5, noise_sigma=0.2, predictor_effects=None, ar_phi=0.9, ar_scale=1.0, seed=None)
Generate synthetic panel data for event study / dynamic DiD analysis.
Creates panel data with unit and time fixed effects, where a fraction of units receive treatment at a common treatment time. Treatment effects can vary by event time (time relative to treatment). Optionally includes time-varying predictor variables generated via AR(1) processes.
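One way to read the description above as a formula is sketched below. This is an assumed data-generating process, not taken from the CausalPy source: the additive form, Gaussian fixed effects and a zero effect outside the event window are assumptions. Here $D_i$ is the treated indicator, $T_0$ is `treatment_time`, the three $\sigma$ values correspond to `unit_fe_sigma`, `time_fe_sigma` and `noise_sigma`, $\gamma_j$ are the values in `predictor_effects`, $\phi$ is `ar_phi` and $s$ is `ar_scale`. The predictors $x_{jt}$ carry no unit index because they are the same for all units at a given time.

```latex
\begin{aligned}
y_{it} &= \alpha_i + \lambda_t
  + D_i \sum_{k=K_{\min}}^{K_{\max}} \beta_k \, \mathbf{1}\{t - T_0 = k\}
  + \sum_{j} \gamma_j \, x_{jt} + \varepsilon_{it}, \\
\alpha_i &\sim \mathcal{N}(0, \sigma_{\text{unit}}^2), \quad
\lambda_t \sim \mathcal{N}(0, \sigma_{\text{time}}^2), \quad
\varepsilon_{it} \sim \mathcal{N}(0, \sigma_{\text{noise}}^2), \\
x_{jt} &= \phi \, x_{j,t-1} + \eta_{jt}, \quad
\eta_{jt} \sim \mathcal{N}(0, s^2).
\end{aligned}
```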
- Parameters:
  - n_units (int) – Total number of units (treated + control). Default 20.
  - n_time (int) – Number of time periods. Default 20.
  - treatment_time (int) – Time period when treatment occurs (0-indexed). Default 10.
  - treated_fraction (float) – Fraction of units that are treated. Default 0.5.
  - event_window (tuple[int, int]) – Range of event times (K_min, K_max) for which treatment effects are defined. Default (-5, 5).
  - treatment_effects (dict[int, float] | None) – Dictionary mapping event time k to treatment effect beta_k. If None (default), effects are 0 for k < 0 (pre-treatment) and increase gradually after treatment.
  - unit_fe_sigma (float) – Standard deviation of the unit fixed effects. Default 1.0.
  - time_fe_sigma (float) – Standard deviation of the time fixed effects. Default 0.5.
  - noise_sigma (float) – Standard deviation of the observation noise. Default 0.2.
  - predictor_effects (dict[str, float] | None) – Dictionary mapping predictor names to their true coefficients. Each predictor is generated as an AR(1) time series that varies over time but is the same for all units at a given time. For example, {'temperature': 0.3, 'humidity': -0.1} creates two predictors. Default None (no predictors).
  - ar_phi (float) – AR(1) autoregressive coefficient controlling the persistence of the predictors. Values closer to 1 produce smoother, more persistent series. Default 0.9.
  - ar_scale (float) – Standard deviation of the AR(1) innovation noise for the predictors. Default 1.0.
  - seed (int | None) – Random seed for reproducible output. Default None.
- Returns:
  Panel data with columns:
  - unit: Unit identifier
  - time: Time period
  - y: Outcome variable
  - treat_time: Treatment time for the unit (NaN if never treated)
  - treated: Whether the unit is in the treated group (0 or 1)
  - <predictor_name>: One column per predictor (only if predictor_effects is provided)
- Return type:
pd.DataFrame
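To make `ar_phi` and `ar_scale` concrete, here is a minimal NumPy sketch of an AR(1) series x_t = phi * x_{t-1} + eta_t with eta_t ~ N(0, ar_scale^2). The helpers `ar1_series` and `lag1_autocorr` are hypothetical, and the start-up rule (x_0 = eta_0) is an assumption; CausalPy's own generator may initialise or scale the series differently. Under the documented behaviour, one such series per predictor would be shared by all units.

```python
from typing import Optional

import numpy as np


def ar1_series(n_time: int, phi: float = 0.9, scale: float = 1.0,
               seed: Optional[int] = None) -> np.ndarray:
    """Hypothetical AR(1) generator: x_t = phi * x_{t-1} + eta_t, eta_t ~ N(0, scale**2).

    The series starts at x_0 = eta_0 (an assumed start-up rule).
    """
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, scale, size=n_time)  # innovation noise, sd plays the role of ar_scale
    x = np.empty(n_time)
    x[0] = eta[0]
    for t in range(1, n_time):
        x[t] = phi * x[t - 1] + eta[t]  # persistence controlled by phi (ar_phi)
    return x


def lag1_autocorr(x: np.ndarray) -> float:
    """Sample correlation between x_t and x_{t-1}, a simple persistence measure."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


smooth = ar1_series(5000, phi=0.9, seed=0)
rough = ar1_series(5000, phi=0.1, seed=0)
print(f"phi=0.9: lag-1 autocorr {lag1_autocorr(smooth):.2f}, sd {smooth.std():.2f}")
print(f"phi=0.1: lag-1 autocorr {lag1_autocorr(rough):.2f}, sd {rough.std():.2f}")
```

For |phi| < 1 the series is stationary with variance ar_scale^2 / (1 - phi^2), so ar_phi=0.9 with ar_scale=1.0 gives a standard deviation of roughly 2.3, noticeably larger than the innovation scale.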
Example
>>> from causalpy.data.simulate_data import generate_event_study_data
>>> df = generate_event_study_data(
...     n_units=20, n_time=20, treatment_time=10, seed=42
... )
>>> df.shape
(400, 5)
>>> df.columns.tolist()
['unit', 'time', 'y', 'treat_time', 'treated']
With predictors:
>>> df = generate_event_study_data(
...     n_units=10,
...     n_time=10,
...     treatment_time=5,
...     seed=42,
...     predictor_effects={"temperature": 0.3, "humidity": -0.1},
... )
>>> df.shape
(100, 7)
>>> "temperature" in df.columns and "humidity" in df.columns
True
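The doctests above only check the shape of the output. As an illustration of the dynamic DiD analysis this data is intended for, here is a minimal two-way fixed-effects event-study regression written with NumPy and pandas. It is a sketch, not CausalPy's estimator: `simulate_toy_panel` and `event_study_ols` are hypothetical helpers, the toy panel only mimics the documented columns (it does not call `generate_event_study_data`, so it runs without CausalPy), and the effect ramp, noise levels, the k = -1 reference period and the dropping of treated rows outside `event_window` are all assumptions.

```python
import numpy as np
import pandas as pd


def simulate_toy_panel(n_units=40, n_time=11, treatment_time=5, seed=0):
    """Hypothetical toy panel with the documented columns: unit, time, y, treat_time, treated.

    Assumed model: y = unit FE + time FE + beta_k * treated + noise, where
    k = time - treatment_time and beta_k = 0 for k < 0, 0.5 * (k + 1) for k >= 0.
    """
    rng = np.random.default_rng(seed)
    unit = np.repeat(np.arange(n_units), n_time)
    time = np.tile(np.arange(n_time), n_units)
    treated = (unit < n_units // 2).astype(int)  # first half of the units are treated
    k = time - treatment_time
    effect = np.where(k >= 0, 0.5 * (k + 1), 0.0) * treated
    unit_fe = rng.normal(0.0, 1.0, n_units)
    time_fe = rng.normal(0.0, 0.5, n_time)
    y = unit_fe[unit] + time_fe[time] + effect + rng.normal(0.0, 0.2, unit.size)
    treat_time = np.where(treated == 1, float(treatment_time), np.nan)  # NaN if never treated
    return pd.DataFrame(
        {"unit": unit, "time": time, "y": y, "treat_time": treat_time, "treated": treated}
    )


def event_study_ols(df, event_window=(-5, 5), reference=-1):
    """Estimate beta_k by least squares with unit, time and event-time dummies.

    Event time k = time - treat_time. Treated rows outside event_window are
    dropped (an assumption; binning the endpoints is a common alternative).
    """
    k_min, k_max = event_window
    event_time = df["time"] - df["treat_time"]  # NaN for never-treated units
    keep = (df["treated"] == 0) | event_time.between(k_min, k_max)
    d = df[keep]
    k_d = event_time[keep]
    ks = [k for k in range(k_min, k_max + 1) if k != reference]  # omit the reference period
    unit_dummies = pd.get_dummies(d["unit"], dtype=float)  # full set, acts as the intercept
    time_dummies = pd.get_dummies(d["time"], drop_first=True, dtype=float)
    event_dummies = pd.DataFrame({f"k={k}": (k_d == k).astype(float) for k in ks})
    X = np.hstack([unit_dummies.to_numpy(), time_dummies.to_numpy(), event_dummies.to_numpy()])
    coef, *_ = np.linalg.lstsq(X, d["y"].to_numpy(), rcond=None)
    return dict(zip(ks, coef[-len(ks):].tolist()))  # event-time coefficients are the last columns


est = event_study_ols(simulate_toy_panel())
for k, b in est.items():
    print(f"k={k:+d}: {b:.2f}")
```

Applying `event_study_ols` to the real output assumes its columns behave as documented; when `predictor_effects` is set, the predictor columns would also need to be added to the design matrix.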