Uses a natural experiment to separate data into treatment and control groups
A policy or naturally occurring event that creates exogenous variation in a variable
Ex: Increasing the minimum wage in one state but not another
Ex: Mandated fitness policies in some schools but not others
Usually data in the form of pooled cross-sections
Individuals in 2 (or more) time periods
Could be same or different people in each time period
Can also have data in 2 groups in same time period
Individuals in 2 different states, countries
People in 2 different schools
DiD is a specific way to compare treatment to control group
Imagine the following scenario
We have data on 2 groups of people (A, and B)
We have data during 2 time periods (1 and 2)
Group A was exposed to some treatment in period 2
Ex: Minimum Wages (Card and Krueger (1994))
Group A (New Jersey) increases their minimum wage in period 2
Group B (Pennsylvania) does not change its minimum wage
We observe both groups before and after the wage change
We want to measure the causal effect of the policy
We will couch this estimator in the potential outcomes framework
Recall that observed outcomes are \[y = (y_{1} - y_{0} )w + y_{0}\]
\(y_{0}\) is outcome without treatment
\(y_{1}\) is outcome with treatment
Define the following dummy variables
\(dA = 1\) if in group A, 0 if in group B
\(d2 = 1\) of in time period 2, 0 if in period 1
\(w\) is the binary treatment indicator
Equals 1 for group A in period 2, and 0 otherwise
It is equivalent to \(dA \times d2\)
\[E[y|dA = 1, d2 = 1] - E[y|dA = 0, d2 = 1]\] \[ = E[y_1|dA = 1, d2 = 1] - E[y_0|dA = 0, d2 = 1]\]
\[ = E[y_1|dA = 1, d2 = 1] - E[y_0|dA = 1, d2 = 1] \] \[+ E[y_0|dA = 1, d2 = 1]- E[y_0|dA = 0, d2 = 1]\]
\[E[y|dA = 1, d2 = 1] - E[y|dA = 0, d2 = 1]\] \[ -[E[y|dA = 1, d2 = 0] - E[y|dA = 0, d2 = 0]]\] \[ = E[y_1|dA = 1, d2 = 1] - E[y_0|dA = 1, d2 = 1] \] \[+ E[y_0|dA = 1, d2 = 1]- E[y_0|dA = 0, d2 = 1]\] \[ -[E[y_0|dA = 1, d2 = 0] - E[y_0|dA = 0, d2 = 0]]\]
Selection bias is zero if we make the constant bias assumption
From above
\[+ E[y_0|dA = 1, d2 = 1]- E[y_0|dA = 0, d2 = 1]\] \[ -[E[y_0|dA = 1, d2 = 0] - E[y_0|dA = 0, d2 = 0]]\]
Constant bias means group difference in \(y_0\) is the same in both time periods
An equivalent assumption is common trends assumption
Rearrange the selection bias term to be
\[+ E[y_0|dA = 1, d2 = 1]- E[y_0|dA = 1, d2 = 0]\] \[ -[E[y_0|dA = 0, d2 = 1] - E[y_0|dA = 0, d2 = 0]]\]
Says trend in \(y_0\) for group A is the same as group B
Both assumptions are equivalent and assume no selection bias
\[y = \beta_0 + \beta_1 dA + \beta_2 d2 + \beta_3 (dA \times d2) + u\]
\[\beta_3 = E[y|dA = 1, d2 = 1] - E[y|dA = 0, d2 = 1] \] \[- E[y|dA = 1, d2 = 0]- E[y|dA = 0, d2 = 0 ]\]
Recall that \(\beta_3\) is the ATT if we assume common trends or constant bias
Under the assumption of constant treatment effects, \(\beta_{3} = ATE = ATT\)
If treatment effect can vary across groups, \(\beta_{3} = ATT\)
Estimation typically done by OLS \[y = \hat{\beta}_{0} + \hat{\beta}_{1}dA +\hat{\beta}_{2}d2 + \hat{\beta}_{3}(dA\times d2)+ u\]
The coefficient on \((dA\times d2)\) is the DiD estimator
It is given simply by \[\hat{\beta}_{3} = (\bar{y}_{A2} - \bar{y}_{B2}) - (\bar{y}_{A1} -\bar{y}_{B1})\]
You can also add control variables \[y= \beta_{0} + \beta_{1}dA +\beta_{2}d2 + \beta_{3}(dA\times d2)+\mathbf{x}\boldsymbol{\alpha}+ u\]
The interpretation remains similar, but estimator no longer collapses to the easy formula
Imagine the Card study where NJ raises the minimum wage
Suppose that NY also raises it \[y = \beta_{0} + \beta_{1}d(NJ,NY) +\beta_{2}d2 +\beta_{3}(d(NJ,NY)\times d2)+ u\]
\(d(NJ,NY)\) equals 1 if in NJ or NY
This restricts the effect to be the same in NJ and NY
Allowing for different impacts in NY and NJ
\[y = \beta_{0} + \beta_{1}dNJ+\beta_{2}dNY +\beta_{3}d2 + \beta_{4}(dNJ\times d2)+ \beta_{5}(dNY\times d2)+ u\]
Here, PA is used as the control group for both treatments
Imagine again the Card study with only NJ and PA
Suppose we observe 4 time periods: 2 before, and 2 after policy change
The specification could change to \[y = \beta_{0} + \beta_{1}dNJ +\beta_{2}d2 +\beta_{3}d3 +\beta_{4}d4 + \beta_{5}(dNJ\times d(3,4))+u\]
where \(d(3,4)\) is a dummy equal to 1 if time period is 3 or 4
This would impose that the treatment effect is the same in all years
A more general way to model this would be \[y = \beta_{0} + \beta_{1}dNJ +\beta_{2}d2 +\beta_{3}d3 +\beta_{4}d4 + \beta_{5}(dNJ\times d3)+ \beta_{6}(dNJ\times d4)+ u\]
\(\beta_{5}\) is the treatment effect in time period 3
\(\beta_{6}\) is the treatment effect in time period 4
In this most general case, the specification is \[y = \beta_{0} + \mathbf{dG}\boldsymbol{\beta_{1}} +\mathbf{dT}\boldsymbol{\beta_{2}}+\mathbf{z}\boldsymbol{\beta_{3}}+ u\]
Where
\(\mathbf{dG}\) is a set of group dummies
\(\mathbf{dT}\) is a set of time dummies
\(\mathbf{z}\) are interaction between the time and group dummies post treatment
\(\mathbf{z}\) could also be a scalar equal to 1 for all groups and time periods post treatment
This is sometimes called the Two-Way Fixed Effects (TWFE) model
Recall that a required assumption is common trends
You can check that empirically by looking at trends for treatment and control groups prior to treatment
To do this, you can
Estimate regression with sample before treatment
Include time dummy variables, group variables, interactions
Do F-test on interactions
You must use “cluster-robust” standard errors
Error is not iid across all individuals
Common group component
Must correct for this or SEs will be biased downward severely
Donald and Lang (2001)
Clustered standard errors may not be enough
Asymptotic results bad when number of groups is small
Use bootstrap to solve this problem
Bertrand, Duflo, Mullainathan (2004)
Errors may be serially correlated
Only a problem when there are many time periods
To fix, you can aggregate to 2 time periods
Use a “block” bootstrap
You can have multiple groups who are treated at different times
DiD estimator is complicated weighted average of treatment and control pairs
Can be even more complicated if treatment effect varies over time
See Roth et. al. (2023) for an overview
Sometimes even the DiD assumptions are not valid
Example: Health care policy aimed at elderly (Wooldridge (2007))
Suppose we have two groups
Elderly in state with policy
Elderly in state without policy
Both are observed in 2 time periods
Could do DiD using difference in non-policy state as control
Now, imagine in each state we have
Elderly and non-elderly
Both are observed in 2 time periods
Could do DiD in treatment state using difference in non-elderly as control group
With 2 potential DiDs, which one do we choose?
The DDD estimator is an extension of the DiD estimator
Uses both DiDs in the same specification
Uses controls from within the treatment state (non-elderly) and from the other state (non-policy state) simultaneously
Suppose we separate the sample into these groups
Group A: People in treatment state
Group B: People in control state
Group E: Elderly
Group N: Non Elderly
The regression would be \[y = \beta_{0} + \beta_{1}dA + \beta_{2}dE +\beta_{3}d2 + \beta_{3}(dA\times d2)\] \[+ \beta_{4}(dE\times d2) + \beta_{5}(dE\times dA) + \beta_{6}(dE\times dA \times d2) + u\]
The DDD estimator is \[\hat{\beta}_{6} = [(\bar{y}_{AE2} - \bar{y}_{AN2}) - (\bar{y}_{AE1} - \bar{y}_{AN1})]\] \[- [(\bar{y}_{BE2} - \bar{y}_{BN2}) - (\bar{y}_{BE1} - \bar{y}_{BN1})]\]
Intuition
Do a separate DiD in treatment and control state
Subtract the 2 DiD estimates
There are actually several ways to interpret this estimator
The same problems with DiD apply to DDD estimators
You must be careful with the standard errors
You must be careful with the interpretation