Causality and Regression


EC655 - Econometrics

Justin Smith

Wilfrid Laurier University

Fall 2023

Introduction

Introduction

  • Empirical economists are often interested in a causal effect

    • The independent effect of a particular variable on the outcome
  • It is important for policy

    • E.g. a school district looking to implement a pre-kindergarten program

    • Need to know if pre-k has independent effects on current and future outcomes

      • Do not want estimate confounded with parent background
  • Can we interpret regression slopes as causal?

  • First, we attempt to understand the underlying concept of causality

Rubin Causal Model

Model Setup

  • Economists often study causality within the Rubin Causal Model

  • Imagine an individual either gets a “treatment” or “no treatment”

    • Getting a drug, or placebo

    • Going to university, or stopping at high school

    • Being in a small class, or large class

  • Define the following potential outcomes

    • \(y_{1}\) is the outcome with treatment

    • \(y_{0}\) is the outcome without treatment

    • \(w\) is a binary variable with 1 denoting treatment, and 0 no treatment

Important

We only observe one potential outcome, and the other is hypothetical. For a treated person we see \(y_{1}\), and for an untreated person we see \(y_{0}\)

Treatment Effects

  • We would like to know the treatment effect \(y_{1} - y_{0}\) for an individual

    • This is the causal effect of the treatment

    • Effect differs from person to person in the population

  • Fundamental problem of causal inference: we never observe both \(y_{1}\) and \(y_{0}\)

  • We only observe \((y, w)\), where

    \[y = y_{0} + (y_{1} -y_{0})w\]

    • We observe treatment status, potential outcome given that treatment status
  • The counterfactual outcome with opposite treatment is never observed

Regression Slopes and Potential Outcomes

  • Suppose we regress \(y\) on \(w\)

  • The slope in this regression is \(\beta = E(y|w=1) - E(y|w=0)\)

  • Express in terms of potential outcomes

    \[E(y|w=1) - E(y|w=0)\] \[= E(y_{1}|w=1) - E(y_{0}|w=0)\] \[= \left [ E(y_{1}|w=1) - E(y_{0}|w=1) \right ] + E(y_{0}|w=1) - E(y_{0}|w=0)\]

  • The first term is called the Average Treatment Effect on the Treated (ATT)

    • Average effect of the treatment for those in the treatment group

Note

Groups in the population can have different treatment effects, so there isn’t necessarily a single treatment or causal effect.

Selection bias

  • The second term is Selection Bias

    • Preexisting difference between treatment and control groups
  • Example: \(y\) is income, and \(w\) is going to university

    • Selection bias is potential income difference in absence of university

    • Would happen if people who end up in university are already smarter

  • If selection bias exists, regression slope is a combination of treatment effect and bias

  • There are some cases where there is no selection bias, and the regression slope is only the causal effect

  • We outline those below

Randomizing Treatment Status

Randomization and Independence of Treatment

  • A common way to isolate treatment effects is to randomize \(w\)

    • Blindly put people into treatment or control group

    • Ensures that on average the two groups are similar at baseline

  • When treatment is randomized, potential outcomes are independent from treatment \[(y_{0}, y_{1}) \perp w\]

  • Independence implies conditioning on \(w\) has no effect on expectation

    \[E(y_{0}|w=1) =E(y_{0}|w=0)\] \[E(y_{0}|w) = E(y_{0})\] \[E(y_{1}|w) = E(y_{1})\]

Randomization and Treatment Effects

  • With randomization, selection bias is zero

    \[E(y_{0}|w=1) - E(y_{0}|w=0) = E(y_{0}|w=1) - E(y_{0}|w=1) = 0\]

  • As a result the population regression slope is \[\beta = E(y|w=1) - E(y|w=0)\] \[= E(y_{1}|w=1) - E(y_{0}|w=0)\] \[= E(y_{1}) - E(y_{0})\]

  • This is the Average Treatment Effect (ATE)

Note

The ATE is the treatment effect averaged across everyone in the population, whereas the ATT is the treatment effect among only people in the treatment group (i.e. excludes people in the control group).

Recent Example in Economics Literature

  • Randomization is the standard way to measure the effects of medical treatments

  • It is becoming more popular in economics

  • Ex: Bangladesh mask study (Abaluck et al., 2021)

    • Randomized promoting mask use in rural Bangladesh

    • Compare COVID rates between treatment and control

    • Find some positive effect of masks, especially for age 50+

Causal Effects without Randomization

Mean Independence of Treatment

  • Most economic data do not come from randomized experiments

  • We can still uncover causal effects without experiments

  • One way is through Mean Independence

    \[E(y_{0}|w) = E(y_{0})\] \[E(y_{1}|w) = E(y_{1})\]

  • Says average potential outcomes do not depend on treatment status

    • Weaker assumption than full statistical independence

    • Full independence means the entire distribution of each potential outcome is unrelated to treatment, not just its mean

Mean Independence of Treatment

  • With mean independence, we get

    \[\beta = E(y|w=1) - E(y|w=0)\] \[= \left [ E(y_{1}|w=1) - E(y_{0}|w=1) \right ] + E(y_{0}|w=1) - E(y_{0}|w=0)\] \[= \left [ E(y_{1}|w=1) - E(y_{0}|w=1) \right ]\] \[= E(y_{1}) - E(y_{0})\]

  • Regression slope equals ATE (and ATT in this case)

  • Is this assumption realistic?

    • Means both potential outcomes unrelated to treatment

    • Whether this is realistic depends on context

Mean Independence of \(y_{0}\)

  • A variation of this assumption is mean independence of \(\mathbf{y_{0}}\) only

    \[E(y_{0}|w) = E(y_{0})\]

  • The regression slope in this case is
    \[\beta = E(y|w=1) - E(y|w=0)\] \[= \left [ E(y_{1}|w=1) - E(y_{0}|w=1) \right ] + E(y_{0}|w=1) - E(y_{0}|w=0)\] \[= \left [ E(y_{1}|w=1) - E(y_{0}|w=1) \right ]\]

  • With this assumption, we only measure the ATT (Not ATE)

  • Is this realistic?

    • Means there are no baseline differences between groups on average

    • Puts no restriction on differences in treated outcome
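
  • A rough illustration of this case is sketched below (a hypothetical setup, in the style of the simulations later in these notes): \(y_{0}\) is unrelated to \(w\), but people select into treatment based on their individual gain, so the regression slope recovers the ATT rather than the ATE

library(dplyr)

n   <- 100000
sim <- tibble(
  y0   = 2 + rnorm(n),                                # baseline outcome, unrelated to treatment
  gain = runif(n, 0, 10),                             # person-specific treatment effect (ATE = 5)
  w    = if_else(gain + runif(n, -2, 2) > 5, 1, 0),   # selection on the gain, not on y0
  y1   = y0 + gain,
  y    = y0 + (y1 - y0) * w                           # observed outcome
)

coef(lm(y ~ w, sim))["w"]     # regression slope: close to the ATT
mean(sim$gain[sim$w == 1])    # ATT: above 5 because of selection on the gain
mean(sim$gain)                # ATE: about 5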

Conditional Mean Independence

  • More commonly, we can use other variables to control for selection bias

  • Suppose we observe a set of pre-treatment characteristics \(\mathbf{x}\)

    • Ex: gender, parental education, school test scores, etc.

    • Key is they are determined before treatment

  • Conditional Mean Independence is when the mean of the potential outcomes is independent of treatment conditional on \(\mathbf{x}\)
    \[E(y_{0}|w=1, \mathbf{x}) =E(y_{0}|w=0, \mathbf{x})\] \[E(y_{0}|w, \mathbf{x}) = E(y_{0}| \mathbf{x})\] \[E(y_{1}|w, \mathbf{x}) = E(y_{1}|\mathbf{x})\]

Note

A stronger assumption is conditional independence \((y_{0}, y_{1}) \perp w |\mathbf{x}\), where each potential outcome is fully independent of treatment conditional on \(\mathbf{x}\)

Conditional Mean Independence

  • Imagine running a regression of \(y\) on \(w\) and \(\mathbf{x}\)

  • The population regression slope is

    \[\beta = E(y|w=1, \mathbf{x}) - E(y|w=0, \mathbf{x})\]

  • When this assumption is true we can get treatment effects at each \(\mathbf{x}\) \[E(y|w=1, \mathbf{x}) - E(y|w=0, \mathbf{x})\] \[= E(y_{1}|w=1, \mathbf{x}) - E(y_{0}|w=1, \mathbf{x})= E(y_{1} | \mathbf{x}) - E(y_{0}| \mathbf{x})\] \[= ATT( \mathbf{x}) =ATE( \mathbf{x})\]

  • These treatment effects hold \(\mathbf{x}\) constant

    • The overall regression slope averages these treatment effects across different values of \(\mathbf{x}\)

Continuous Treatment

  • We can also apply this model to a continuous treatment variable

  • All of the intuition we have discussed is the same

    • Just slightly more complex because treatment is continuous

Summary of Rubin Model

  • Regression will identify a causal effect if

    • Treatment comes from a randomized experiment

    • Potential outcomes are mean independent of treatment

    • Potential outcomes are mean independent of treatment conditional on \(\mathbf{x}\)

  • Without one of these, we could have selection bias

    • Regression does not provide a causal effect
  • Unfortunately, these assumptions are generally not met

    • We will cover other methods to uncover the causal effect in this case

Simulation Examples for Rubin Model

Introduction

  • This section will demonstrate the Rubin model with simulated data

  • We will show what happens in regression under different assumptions

  • Use large samples to simulate the population

    • Replace population values with consistent estimators
  • We will simulate both potential outcomes

    • Even though they would not both be observed in practice

Data Setup

  • Code to the right creates potential outcomes

  • For simplicity the treatment effect is set to 5 for everyone

  • Outcomes \(y_{0}\) and \(y_{1}\) created to have a Normal distribution

    • Normality is not important for the model
library(dplyr)   # for mutate() and the %>% pipe
library(vtable)  # for sumtable()

data <- data.frame(eta = rnorm(100000, 0, 1)) %>%
  mutate(y0 = 2 + eta,        # untreated potential outcome
         y1 = y0 + 5,         # treated potential outcome: effect is 5 for everyone
         treat_eff = y1 - y0)

sumtable(data, summ = c('notNA(x)','mean(x)','sd(x)'), 
         summ.names = c('N', 'Mean', 'SD')) 
Summary Statistics

Variable    N        Mean      SD
eta         100000   -0.0012   1
y0          100000    2        1
y1          100000    7        1
treat_eff   100000    5        0.00000000000000021

Random Assignment to Treatment

  • Next assign treatment \(w\) using randomization

  • In the code, \(w=1\) randomly with probability 0.5

  • Compute observed \(y\) based on treatment status

library(magrittr)  # for the compound assignment pipe %<>%

data %<>% mutate(w = if_else(runif(100000) > .5, 1, 0),  # randomize treatment with probability 0.5
                 y = y0 + (y1 - y0)*w) %>%               # observed outcome given treatment status
  group_by(w)

head(data)
## # A tibble: 6 × 6
## # Groups:   w [2]
##      eta    y0    y1 treat_eff     w     y
##    <dbl> <dbl> <dbl>     <dbl> <dbl> <dbl>
## 1  2.16  4.16   9.16         5     1 9.16 
## 2 -1.63  0.365  5.37         5     0 0.365
## 3 -0.794 1.21   6.21         5     1 6.21 
## 4  1.07  3.07   8.07         5     0 3.07 
## 5  0.133 2.13   7.13         5     1 7.13 
## 6  1.90  3.90   8.90         5     0 3.90

Random Assignment to Treatment

  • With random assignment we know

    • \(y_{0}\) is independent of \(w\)

    • \(y_{1}\) is independent of \(w\)

  • So the distributions of \(y_{0}\) and \(y_{1}\) are the same when \(w=0\) and when \(w=1\)

  • To the right we show the distribution of \(y_{0}\)

library(ggplot2)   # for ggplot()
library(ggthemes)  # for theme_pander()

ggplot(data, aes(x = y0, color = as.factor(w))) +
  geom_density(alpha = .4, size = 2) +
  theme_pander(nomargin = FALSE, boxes = TRUE) +
  labs(title = "Distribution of Y0", color = "w") 

Random Assignment to Treatment

  • Randomization ensures difference in average \(y\) between groups equals the ATE and ATT

  • On the right we show the difference in mean of \(y\) equals 5

summarize(data, my = mean(y)) %>%
  mutate(diff_y = my - lag(my))
## # A tibble: 2 × 3
##       w    my diff_y
##   <dbl> <dbl>  <dbl>
## 1     0  2.00  NA   
## 2     1  7.00   5.00

Random Assignment to Treatment

  • Can implement difference in means as a regression

  • Recall slope in OLS regression of \(y\) on dummy variable is difference in means of \(y\)

lm(y ~ w, data)
## 
## Call:
## lm(formula = y ~ w, data = data)
## 
## Coefficients:
## (Intercept)            w  
##       1.997        5.005
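
  • As a quick check with the simulated data above (a small sketch, not part of the original output), the coefficient on \(w\) coincides with the difference in group means

coef(lm(y ~ w, data = data))["w"]                 # OLS slope on the treatment dummy
with(data, mean(y[w == 1]) - mean(y[w == 0]))     # difference in mean y between groups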

Selection into Treatment

  • Now simulate selection into treatment based on \(y_{0}\)

    • Treatment now related to value of \(y_{0}\)
  • We know \(\eta\) determines value of \(y_{0}\)

    • If we make \(w=1\) with higher values of \(\eta\) then \(w\) is related to \(y_{0}\)
data2 <- data %>% 
  ungroup() %>% 
  select(eta, y0,y1) %>%
  mutate(w = if_else(eta + runif(100000,-1,1) > 0,1,0), 
         y = y0 + (y1-y0)*w) %>%
  group_by(w)

sumtable(data2, 
         summ=c('notNA(x)','mean(x)','sd(x)'),
         summ.names = c('N', 'Mean', 'SD' ),
         group="w",
         group.long = TRUE)

Selection into Treatment

  • The means of \(y_{0}\) and \(y_{1}\) are now different by group

  • Because of selection bias

    • Treated group has better non-treated outcomes
Summary Statistics

Variable   N       Mean    SD
w: 0
eta        49920   -0.68   0.73
y0         49920    1.3    0.73
y1         49920    6.3    0.73
y          49920    1.3    0.73
w: 1
eta        50080    0.68   0.73
y0         50080    2.7    0.73
y1         50080    7.7    0.73
y          50080    7.7    0.73

Selection into Treatment

  • The distribution of \(y_{0}\) differs by \(w\)

    • Treated group has better baseline outcomes
  • This creates selection bias

ggplot(data2, aes(x=y0, color=as.factor(w))) +
  geom_density(alpha = .4, size=2) +
  theme_pander(nomargin=FALSE, boxes=TRUE) +
  labs(title = "Distribution of Y0", color = "w")

Selection into Treatment

  • Selection bias shows up when you take the difference in mean \(y\)

    • We know the true treatment effect is 5

    • But difference in \(y\) is larger

    • There is positive selection bias

    • Bias is about 1.4

summarize(data2, my = mean(y)) %>%
  mutate(diff_y = my - lag(my))
## # A tibble: 2 × 3
##       w    my diff_y
##   <dbl> <dbl>  <dbl>
## 1     0  1.32  NA   
## 2     1  7.68   6.36

Selection into Treatment

  • You can implement this as a regression

  • OLS estimates biased treatment effect

  • Remember the intercept is mean of \(y\) when \(w=0\)

lm(y ~ w, data2)
## 
## Call:
## lm(formula = y ~ w, data = data2)
## 
## Coefficients:
## (Intercept)            w  
##       1.319        6.358

Conditional Mean Independence

  • Finally consider conditional mean independence

  • Treatment is related to \(y_{0}\), but only through \(x\)

    • For people with the same \(x\), \(y_{0}\) is unrelated to \(w\)
  • Ex: Education and wages

    • Smart people \((x = 1)\) earn higher wages regardless of schooling \((y_0)\)

    • Smart people are more likely to go to university \((w = 1)\)

    • People at university will have higher \(y_0\)

data3 <- data %>% 
  ungroup() %>% 
  select(eta) %>%
  mutate(x = if_else(runif(100000) > .5,1,0),
         w = if_else(x + runif(100000, -1,1) > .5,1,0),
         y0 = 2 + 3*x + eta,
         y1 = y0 + 5,
         y = y0 + (y1-y0)*w) %>%
  group_by(w)

sumtable(data3, 
         summ=c('notNA(x)','mean(x)','sd(x)'),
         summ.names = c('N', 'Mean', 'SD' ),
         group="w",
         group.long = TRUE)

Conditional Mean Independence

  • Comparing treatment and control, \(y_{0}\) is bigger when \(w=1\)

  • This is because

    • \(y_{0}\) is bigger when \(x=1\)

    • \(w\) more likely to be \(1\) when \(x=1\)

Summary Statistics

Variable   N       Mean       SD
w: 0
eta        49913   -0.0015    1
x          49913    0.25      0.43
y0         49913    2.8       1.6
y1         49913    7.8       1.6
y          49913    2.8       1.6
w: 1
eta        50087   -0.00096   1
x          50087    0.75      0.43
y0         50087    4.2       1.6
y1         50087    9.2       1.6
y          50087    9.2       1.6

Conditional Mean Independence

  • What if we focus only on people with \(x=1\)?

  • No difference in \(y_{0}\) between treated and control

    • Because \(x\) is the only reason why they differed

    • This is holding \(x\) fixed

sumtable(filter(data3, x==1), 
         summ=c('notNA(x)','mean(x)','sd(x)'),
         summ.names = c('N', 'Mean', 'SD' ),
         group="w")
Summary Statistics

                   w = 0                     w = 1
Variable   N       Mean      SD      N       Mean     SD
eta        12528   -0.0032   1       37517   0.0032   1
x          12528    1        0       37517   1        0
y0         12528    5        1       37517   5        1
y1         12528    10       1       37517   10       1
y          12528    5        1       37517   10       1

Conditional Mean Independence

  • We get the same result if we hold \(x=0\)

    • Again, because \(x\) is the only reason why they differed
sumtable(filter(data3, x==0), 
         summ=c('notNA(x)','mean(x)','sd(x)'),
         summ.names = c('N', 'Mean', 'SD' ),
         group="w")
Summary Statistics

                   w = 0                      w = 1
Variable   N       Mean      SD       N       Mean     SD
eta        37385   -0.0009   0.99     12570   -0.013   1
x          37385    0        0        12570    0       0
y0         37385    2        0.99     12570    2       1
y1         37385    7        0.99     12570    7       1
y          37385    2        0.99     12570    7       1

Conditional Mean Independence

  • Regression of \(y\) on \(w\) is biased

    • Because \(w\) is correlated with the error
  • But a regression of \(y\) on \(w\) and \(x\) recovers the actual treatment effect

  • This is conditional mean independence

    • Holding \(x\) fixed, potential outcomes no longer related to treatment
lm(y ~ w, data3)
## 
## Call:
## lm(formula = y ~ w, data = data3)
## 
## Coefficients:
## (Intercept)            w  
##       2.752        6.495
lm(y ~ w + x, data3)
## 
## Call:
## lm(formula = y ~ w + x, data = data3)
## 
## Coefficients:
## (Intercept)            w            x  
##       1.997        4.997        3.007

Directed Acyclic Graphs (DAG)

Introduction

  • The Rubin framework is one way to understand causality

  • Another popular method is a Directed Acyclic Graph (DAG)

  • A DAG is a graphical tool to show relationships between variables

  • It can make a complex model easier to understand

  • For econometrics, it can highlight bias in our estimates

    • And how to overcome the bias
  • In this section we will introduce DAGs, and expand on them in later sections

Basic DAG

  • Suppose we are interested in the direct effect of \(w\) on \(y\)

    • \(w\) is the treatment

    • \(y\) is the outcome

  • The DAG to the right shows how \(w\) and \(y\) could be related

  • \(x\) is another factor that is related to both \(w\) and \(y\)

  • The dots (with variable names) are nodes

  • The lines connecting nodes are edges

  • Direction of the arrows is the direction of the relationship

Basic DAG

  • We want to find all the paths that can lead from \(w\) to \(y\)

  • In this DAG there are two

    • A direct path \(w \rightarrow y\)

    • An indirect path \(w \leftarrow x \rightarrow y\)

  • A correlation between \(y\) and \(w\) would reflect both paths

    • \(w\) can directly cause \(y\)

    • \(x\) could change \(w\) and \(y\) simultaneously creating a spurious correlation between them
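
  • A small sketch using the dagitty R package (an assumption; these notes do not rely on it elsewhere) can enumerate the same two paths

library(dagitty)  # assumed available; encodes and analyzes DAGs

g <- dagitty("dag { x -> w ; x -> y ; w -> y }")
paths(g, from = "w", to = "y")   # lists the direct path w -> y and the backdoor path w <- x -> y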

Types of Paths

  • In a DAG you may find two types of paths

    1. Front Door: arrows point away from treatment and toward the outcome
    2. Back Door: arrows point toward treatment
  • We are usually interested in front door paths

    • Show how \(w\) causes changes in \(y\)
  • Back door paths are usually bad

    • Show ways that a correlation between treatment and outcome can be biased

Caution

There can be multiple front door paths in a DAG, and we might not be interested in all of them.

Open and Closed Paths

  • Paths between treatment and outcome can be open or closed

    1. Open: all variables along the path are allowed to vary
    2. Closed: at least one variable along the path cannot vary, or there is a collider on the path
  • If you regress \(y\) on \(w\) both paths are open

    • \(w\) and \(y\) can vary, so direct path is open

    • We are not holding \(x\) fixed, so indirect path is also open

  • You can close the backdoor path by controlling for \(x\)

    • Regress \(y\) on \(w\) and \(x\)

Open and Closed Paths

  • Suppose we are interested in the causal effect of \(w\) on \(y\)

  • In a DAG, this means we only want the path \(w \rightarrow y\)

  • We need to leave this path open, but close all others

  • The regression of \(y\) on \(w\) and \(x\) does this

  • It works in this case because \(x\) is a confounder

  • Controlling for other variables does not always identify a causal effect

Different Associations in DAG

  • Confounder

  • Mediator

  • Collider

Confounder

  • The DAG we have been working with has \(x\) as a confounder

  • Confounder: The treatment and outcome have a common cause

  • In this example

    • \(x\) changes \(w\) and \(y\) simultaneously

    • Makes it look like \(w\) and \(y\) are related, but it is spurious

  • Example: Returns to Schooling

    • \(y\) is wages, \(w\) is years of schooling

    • A confounder \(x\) would be intelligence

  • Can close the backdoor path by controlling for \(x\)

Mediator

  • In this DAG we have two paths

\[w \rightarrow y\] \[w \rightarrow x \rightarrow y\]

  • Both are front door paths

  • In this example \(x\) acts as a mediator

  • Mediator: A variable within a causal pathway between treatment and outcome

  • In this example

    • \(w\) causes \(y\) directly

    • But also indirectly through \(x\)

Mediator

  • This is not the same as omitted variables bias

  • A regression of \(y\) on \(w\) measures the total causal effect

    • Two causal paths between \(w\) and \(y\)
  • We can still close the indirect path by controlling for \(x\)

  • This would leave only the direct path

  • Whether we want to do this depends on the research question

    • If you want only direct effect, control for \(x\)

    • For total effect, do not control

Mediator

  • Example: Effect of drinking on lifespan

  • \(w\) is drinking, \(y\) is lifespan, \(x\) is drug use

  • If drug use is a mediator then

\[drinking \rightarrow lifespan\] \[drinking \rightarrow drugs \rightarrow lifespan\]

  • Drinking has direct effect on lifespan, and also through drug use

  • Regression of \(lifespan\) on \(drinking\) measures total effect

  • Regression of \(lifespan\) on \(drinking\) and \(drugs\) measures only the direct effect
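
  • A minimal simulation of this mediator case (hypothetical coefficients, in the style of the other simulated examples) shows how controlling for the mediator switches the estimate from the total effect to the direct effect

library(dplyr)

n   <- 100000
med <- tibble(
  drinking = rnorm(n),
  drugs    = 0.5 * drinking + rnorm(n),              # mediator caused by the treatment
  lifespan = -2 * drinking - 3 * drugs + rnorm(n)    # direct effect -2; total effect -2 + 0.5*(-3) = -3.5
)

coef(lm(lifespan ~ drinking, med))["drinking"]           # total effect, about -3.5
coef(lm(lifespan ~ drinking + drugs, med))["drinking"]   # direct effect, about -2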

Collider

  • Final association is trickier

  • Collider: a variable that is influenced by two or more variables

  • In this DAG, \(x\) is a collider

    • Both \(y\) and \(w\) point to it
  • Paths that contain a collider are closed

  • In this case, you could regress \(y\) on \(w\) and get causal effect

Collider

  • Controlling for a collider opens the path

  • Regression of \(y\) on \(w\) and \(x\) does not measure causal effect

  • Collider Bias: bias induced by controlling for collider

  • Collider bias can

    • Create an association between \(w\) and \(y\) when there is no direct one

    • Hide a direct association

  • Easiest way to understand is through example

Collider

  • Example: talent and effort

  • \(w\) is talent, \(y\) is effort, \(x\) is being in elite school

  • Assume no direct link between talent and effort

  • Both talented and hard working students make it to elite schools

  • Controlling for school, talent and effort look negatively related

    • You need to be smart or work hard to get into Harvard

    • Low-effort students must be highly talented, and less talented students must be hard-working

Collider

  • Example: gender pay gap

  • \(w\) is gender, \(y\) is ability, \(x\) is occupation

  • Assume men and women have equal ability

  • High ability people make it into certain occupations

  • If discrimination against women, they must be high ability to get in

  • Controlling for occupation, women appear to have higher ability

  • This could lead to the gender pay gap being larger within jobs

A More Complicated DAG


  • DAGs are often much more complicated

  • Here is a DAG with 4 variables

    • \(y\) is outcome
    • \(w\) is treatment
    • \(x\) is confounder
    • \(u\) is unobserved confounder
  • There are several paths connecting \(w\) and \(y\)

    • \(w \rightarrow y\)
    • \(w \leftarrow x \rightarrow y\)
    • \(w \leftarrow x \rightarrow u \rightarrow y\)

A More Complicated DAG

  • To close a backdoor path, you need to control for at least one variable along the path

  • We cannot control for \(u\) because it is unobserved

  • Controlling for \(x\) closes both backdoor paths
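
  • A sketch with the dagitty package (assumed available, as in the earlier DAG sketch) confirms this: declaring \(u\) as unobserved, the admissible adjustment set is \(x\)

library(dagitty)

g <- dagitty("dag { w -> y ; x -> w ; x -> y ; x -> u ; u -> y }")
latents(g) <- "u"                                 # u is unobserved, so it cannot be controlled for
adjustmentSets(g, exposure = "w", outcome = "y")  # should return { x }, matching the argument above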

Simulation Examples for DAGs

Introduction

  • Like we did with the Rubin model, we can simulate DAGs

  • We will illustrate

    • Conditional independence
    • Collider bias
  • Again, this is meant to approximate the population

Conditional Independence

  • Consider the following scenario

    • Higher ability people get more schooling
    • Higher ability people earn more
    • Higher education increases earnings
    • Higher ability is related to unobserved determinants of earnings
  • Without controlling for ability we will overestimate the effect of education

dagdata <- tibble(
    ability = rnorm(10000),
    educ = ifelse(ability + rnorm(10000, mean = 0, sd = 0.5) > 1,1,0),
    u = ability + rnorm(10000, mean = 0, sd = 2),
    inc = educ + ability + u 
)

lm(inc ~ educ, data = dagdata)
## 
## Call:
## lm(formula = inc ~ educ, data = dagdata)
## 
## Coefficients:
## (Intercept)         educ  
##     -0.6213       4.2127

Conditional Independence

  • In the DAG, there are two backdoor paths from education to earnings

    • \(educ \leftarrow ability \rightarrow inc\)
    • \(educ \leftarrow ability \rightarrow u \rightarrow inc\)
  • Controlling for ability closes both paths

    • We do not need to separately control for \(u\)
dagdata <- tibble(
    ability = rnorm(10000),
    educ = ifelse(ability + rnorm(10000, mean = 0, sd = 0.5) > 1,1,0),
    u = ability + rnorm(10000, mean = 0, sd = 2),
    inc = educ + ability + u 
)

lm(inc ~ educ + ability, data = dagdata)
## 
## Call:
## lm(formula = inc ~ educ + ability, data = dagdata)
## 
## Coefficients:
## (Intercept)         educ      ability  
##    0.003091     1.030121     2.006510

Collider Bias

  • Consider the following scenario

    • Talent is unrelated to effort
    • Both talent and effort are positively related to being in an elite school
    • Elite school admission is based on having high talent or effort
  • We will see that

  • Without controlling for elite school, talent and effort are unrelated

  • Controlling for elite school, talent and effort are negatively related

dagdata2 <- tibble(
    talent = rnorm(10000),
    effort = rnorm(10000),
    elite = ifelse(talent + effort > 1,1,0)
)

ggplot(dagdata2, aes(x = effort, y = talent)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE)

Collider Bias

  • Now color the dots according to elite school status

  • Then run separate regressions by elite school

  • Within school, talent and effort are negatively related

ggplot(dagdata2,aes(x = effort, y = talent, color = as.factor(elite)) ) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_pander() +
  scale_color_grey()

Collider Bias

  • You can see this in regression too

  • First regress talent on effort

    • No relationship
  • Negative relationship when controlling for elite school status

lm(talent ~ effort, data = dagdata2)
## 
## Call:
## lm(formula = talent ~ effort, data = dagdata2)
## 
## Coefficients:
## (Intercept)       effort  
##   -0.003301    -0.018345
lm(talent ~ effort + elite, data = dagdata2) 
## 
## Call:
## lm(formula = talent ~ effort + elite, data = dagdata2)
## 
## Coefficients:
## (Intercept)       effort        elite  
##     -0.3948      -0.3740       1.6407

Omitted Variables Bias in Regression

  • Partition the \(\mathbf{x}\) vector into two pieces

\[\mathbf{x}= \begin{bmatrix} \mathbf{x_{1}}& \mathbf{x_{2}} \end{bmatrix}\]

  • Now imagine running a regression of \(y\) on only \(\mathbf{x_{1}}\)

  • The slope of that regression is

    \[ \boldsymbol{\beta^{*}} = (\textbf{E}[\mathbf{x_{1}'x_{1}}])^{-1} \textbf{E}[\mathbf{x_{1}}'y] \]

  • Sub in for \(\mathbf{y}\) using the full \(\mathbf{x}\) vector

    \[\boldsymbol{\beta^{*}} = (\textbf{E}[\mathbf{x_{1}'x_{1}}])^{-1} \textbf{E}[\mathbf{x_{1}}'(\mathbf{x_{1}}\boldsymbol{\beta_{1}} + \mathbf{x_{2}}\boldsymbol{\beta_{2}} + u)]\]

    \[\boldsymbol{\beta^{*}} = \boldsymbol{\beta_{1}} + (\textbf{E}[\mathbf{x_{1}'x_{1}}])^{-1} \textbf{E}[\mathbf{x_{1}}'\mathbf{x_{2}}]\boldsymbol{\beta_{2}} \]

Omitted Variables Bias in Regression

Important

Omitted Variables Bias in Regression is

\[(\textbf{E}[\mathbf{x_{1}'x_{1}}])^{-1} \textbf{E}[\mathbf{x_{1}}'\mathbf{x_{2}}]\boldsymbol{\beta_{2}} \]

  • OVB is a function of two things

    • The relationship between \(\mathbf{x_{2}}\) and \(\mathbf{x_{1}}\)

    • The relationship between \(\mathbf{x_{2}}\) and \(y\)

  • If either of them is zero, there is no OVB

  • The sign of the bias depends on the sign of their product

    • If they have the same sign, bias is positive

    • If they have opposite signs, bias is negative
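
  • A short simulation (hypothetical coefficients) verifies the formula: the short-regression slope equals \(\boldsymbol{\beta_{1}}\) plus the auxiliary regression coefficient of \(\mathbf{x_{2}}\) on \(\mathbf{x_{1}}\) times \(\boldsymbol{\beta_{2}}\)

n  <- 100000
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)               # x2 related to x1
y  <- 1 + 2 * x1 + 3 * x2 + rnorm(n)    # beta1 = 2, beta2 = 3

short <- lm(y ~ x1)                     # omits x2
long  <- lm(y ~ x1 + x2)
aux   <- lm(x2 ~ x1)                    # relationship between x2 and x1

coef(short)["x1"]                                        # biased slope, about 2 + 0.6*3 = 3.8
coef(long)["x1"] + coef(aux)["x1"] * coef(long)["x2"]    # identical, by the OVB formula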