# The Linear Regression Model

EC655

Wilfrid Laurier University

Fall 2022

# Introduction

## Introduction

• Questions in economics often involve explaining a variable in terms of others

• How does age of school entry affect test scores?

• Does childhood health insurance affect adult health?

• Does foreign competition affect domestic innovation?

• Often we are interested in the causal relationship

• The independent effect of one variable on another

• Causal relationships are important for policy

• Econometrics provides a framework for examining these relationships

• Strong focus on causality

• We discussed this and will revisit it

# What Are We Trying To Model?

## Conditional Expectation Function

• As noted above, we want to relate dependent variable $y$ to independent variables $\mathbf{x}$

• Specifically want to know systematically what happens to $y$ when $\mathbf{x}$ changes

• Difficult because $y$ and $\mathbf{x}$ are random variables

• $y$ can take many different values for each $\mathbf{x}$

• This randomness makes it difficult to see relationships

• One way to see a systematic pattern is to focus on average $y$ at each $\mathbf{x}$

• Does $y$ change on average as we increase $\mathbf{x}$?

• Ex: Does academic achievement fall on average as we increase class size?

• Mathematically, this is the conditional expectation $\mathbf{E}[y|\mathbf{x}]$
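• A minimal numerical sketch of this idea (a hypothetical data generating process, assuming Python with numpy is available):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: y is noisy around a non-linear function of x
x = rng.integers(0, 21, size=100_000)   # e.g. years of schooling
y = 1.5 + 0.08 * x + 0.004 * x**2 + rng.normal(0, 1, size=x.shape)

# The CEF evaluated at each x: the average y among observations with that x
for x0 in np.unique(x):
    print(f"E[y|x={x0:2d}] ~ {y[x == x0].mean():.3f}")
```

• The printed conditional means trace out a clear increasing pattern even though individual $y$ values are very noisy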

## Conditional Expectation Function

• The idea is illustrated in the figure

• Log earnings on vertical axis, years of schooling on horizontal

• Grey shaded areas are distribution of log earnings at each level of schooling

• Big spread of incomes at each level of schooling

• Hard to see relationship

• Black line is the conditional mean of earnings at each level of schooling

• Increasing pattern between school and earnings is much easier to see

• Note how it is not linear

## Conditional Expectation Function

• The Conditional Expectation Function (CEF) highlights the pattern through the randomness

• It is therefore appealing as a way to measure systematic relationships

• It is also the optimal predictor of $y$ given $\mathbf{x}$

• It minimizes the mean squared error in predicting $y$

• We would therefore like to use the CEF to measure the relationship between $y$ and $\mathbf{x}$

• Problem: as a population value, it is not known

• It is not observable because we do not see the population

• Therefore cannot say anything about its value or functional form

## Conditional Expectation Function

• We can use linear regression to approximate CEF

• This approximation is justified in several ways

• If the CEF is linear, it is equivalent to the population regression function

• The population regression function is the best linear predictor of $y$ given $\mathbf{x}$

• The population regression function is the best linear approximation to the CEF

• This is partly why linear regression is so popular in economics

• Next section examines the population regression function

• Derive the population slope before thinking about samples

• This derivation will probably be new to you

# The Population Regression Model

## Model

• A linear model relating $y$ to one or more explanatory variables $\mathbf{x}$ is

$y = \mathbf{x}\boldsymbol{\beta} + u$

• Where

• $y$ is a scalar observable random outcome variable

• $\mathbf{x}$ is a $1\times (k + 1)$ vector of random explanatory factors

• $\boldsymbol{\beta}$ is a $(k + 1) \times 1$ vector of slope parameters (non-random)

• $u$ is a scalar population residual term

• This is our model for the (unobserved) population

• Sometimes called the Data Generating Process (DGP)

• $\mathbf{x}\boldsymbol{\beta}$ is called the Population Regression Function (PRF)

• The part of $y$ that is predictable by $\mathbf{x}$

## Model

• Recall we are using it to approximate the CEF

• Goal is not necessarily to get approximation exactly right

• But to capture essential features of relationship

• In undergrad courses it is typical to just assume the CEF is linear

• This is not necessarily true

• But avoids complications of non-linear CEF

• In some cases the CEF is inherently linear

• In the last section of the course, we saw the CEF for a binary treatment

• This type of CEF is linear, so it equals the PRF

## Population Regression Slope Vector

• In undergrad it is typical to next estimate $\boldsymbol{\beta}$ with a sample

• You can also derive a population least squares vector

• It is the slope that minimizes the mean squared prediction error (MSPE)

$\min_{\boldsymbol{\beta}} \textbf{E}[(y-\mathbf{x}\boldsymbol{\beta})^2]$

• Taking the derivative with respect to $\boldsymbol{\beta}$ and setting it to zero gives the first-order condition

$\textbf{E}[\mathbf{x}'(y-\mathbf{x}\boldsymbol{\beta})]= \textbf{E}[\mathbf{x}'u]=\mathbf{0}$

## Population Regression Slope Vector

• Solving for $\boldsymbol{\beta}$, we get

$\textbf{E}[\mathbf{x}'(y-\mathbf{x}\boldsymbol{\beta})]= \mathbf{0}$

$\textbf{E}[\mathbf{x}'y]= \textbf{E}[\mathbf{x'x}\boldsymbol{\beta}]$

$\textbf{E}[\mathbf{x}'y]= \textbf{E}[\mathbf{x'x}]\boldsymbol{\beta}$

$(\textbf{E}[\mathbf{x'x}])^{-1} \textbf{E}[\mathbf{x}'y]= \boldsymbol{\beta}$

• This is the formula for the slope in the PRF

• With a single $x$, it is the population covariance between $y$ and $x$ divided by the population variance of $x$

• This is the same least squares process you use to get the OLS estimator

• Except we are doing it with the population instead of a sample
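• A minimal numerical check of this formula (a sketch with a hypothetical DGP, assuming numpy; a very large simulated sample stands in for the unobservable population):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000                        # large n stands in for the population

# Hypothetical DGP: y = 2 + 3*x1 - 1*x2 + u
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = X @ np.array([2.0, 3.0, -1.0]) + rng.normal(size=n)

# beta = (E[x'x])^{-1} E[x'y], with expectations replaced by sample moments
Exx = X.T @ X / n
Exy = X.T @ y / n
print(np.linalg.solve(Exx, Exy))     # ~ [2, 3, -1]

# With a single regressor, the slope is cov(y, x1)/var(x1)
cov_yx1 = ((y - y.mean()) * (x1 - x1.mean())).mean()
print(cov_yx1 / x1.var())            # ~ 3 here, since x1 and x2 are independent
```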

## Population Regression Slope Vector

• Minimizing the MSPE implies that

$\textbf{E}[\mathbf{x}'u]=\mathbf{0}$

• Expanding that equation, we get

$\begin{bmatrix} \textbf{E}(u)\\ \textbf{E}(x_{1}u)\\ \vdots\\ \textbf{E}(x_{k}u) \end{bmatrix} =\mathbf{0}$

• This says the following important things

• The average value of the population residual $u$ is zero

• The covariance between each $x$ and $u$ is zero
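• These moment conditions can be verified numerically (same simulated-population device as above; the DGP is hypothetical and numpy is assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Hypothetical DGP with one regressor: y = 2 + 3*x1 + noise
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = 2.0 + 3.0 * x1 + rng.normal(size=n)

# Residual u = y - x*beta at the least squares beta
beta = np.linalg.solve(X.T @ X / n, X.T @ y / n)
u = y - X @ beta

# Both moment conditions hold up to simulation error
print(u.mean())          # E(u) ~ 0
print((x1 * u).mean())   # E(x1*u) ~ 0
```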

## Population Regression Slope Vector

• Note that

$\text{cov}(x_{1},u) = \mathbf{E}[(x_{1} - \mathbf{E}(x_{1}))(u - \mathbf{E}(u))]$

• From above, we know that $\mathbf{E}(u) =0$, so

$\text{cov}(x_{1},u) = \mathbf{E}[x_{1}u - \mathbf{E}(x_{1})u]$

• Bringing the expectation through the brackets

$\text{cov}(x_{1},u) = \mathbf{E}(x_{1}u) - \mathbf{E}(x_{1})\mathbf{E}(u) = \mathbf{E}(x_{1}u)$

• Says that $u$ is mean zero and uncorrelated with $\mathbf{x}$

## Population Regression Slope Vector

• If we observed the population we could compute $\boldsymbol{\beta}$

• Problem again is we do not observe the population

• So we cannot compute $\textbf{E}[\mathbf{x}'y]$ or $(\textbf{E}[\mathbf{x'x}])^{-1}$

• Instead, we collect a sample of data and estimate $\boldsymbol{\beta}$

• Before we do that, we briefly discuss causality in regression models

# Regression and Causality

## Why is Causality Important?

• Empirical economists are often interested in a causal effect

• For policy, it is often key to have an estimate of the causal effect

• E.g. a school district looking to implement pre-kindergarten program

• This is generally funded with public money

• Need to know if pre-k has independent effects on current and future outcomes

• Do not want this estimate confounded with parent background

• When can we interpret a regression slope as causal?

• Answer: when the model is structural

• Structural model is one where the coefficients have a causal interpretation

## Model with One Binary Regressor

• In the last section we defined the underlying potential outcomes as

$y_{0} = \alpha + \eta$

$y_{1} = y_{0} + \rho$

• With the observed outcome

$y = \alpha + \rho w + \eta$

• This regression model is structural because $\rho$ is the causal effect

• We derived that the difference in conditional expectations is

$E(y|w=1) - E(y|w=0) = \rho + E(\eta |w=1) - E(\eta |w=0)$

## Model with One Binary Regressor

• The population regression function with a binary regressor is

$y = \beta_{0} + \beta_{1}w + u$

• The population least squares slope $\beta_{1}$ from minimizing the MSPE is

$\beta_{1} = E(y|w=1) - E(y|w=0)$

• Combining this equation with the structural model

$\beta_{1} = \rho + E(\eta |w=1) - E(\eta |w=0)$
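• A quick numerical sketch of these results (a hypothetical randomized DGP, assuming numpy): the least squares slope equals the difference in conditional means, and under randomization both equal $\rho$

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000

# Hypothetical randomized treatment: w independent of eta, with rho = 2
w = rng.integers(0, 2, size=n)
eta = rng.normal(size=n)
y = 1.0 + 2.0 * w + eta

# Least squares slope on the binary regressor ...
X = np.column_stack([np.ones(n), w])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

# ... equals the difference in conditional means
diff = y[w == 1].mean() - y[w == 0].mean()
print(beta[1], diff)     # both ~ 2 = rho, since eta is independent of w
```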

## Model with One Binary Regressor

• The regression slope $\beta_{1}$ equals the treatment effect $\rho$ when

$E(\eta |w=1) - E(\eta |w=0) = 0$

• We saw cases when this is true

• Randomization

• Mean independence of $\eta$

• If neither of these holds, then $\beta_{1} \neq \rho$ and $\beta_{1}$ is not a causal effect

## Model with Continuous Regressor

• With a continuous independent variable $s$, suppose the structural model is

$y = \alpha + \rho s + \eta$

• Where the definition of $\rho$ is

$\rho = E(y_{s_{0}}|s=s_{0}) - E(y_{s_{0}-1}|s = s_{0} - 1)$

• Where $y_{s_{0}}$ and $y_{s_{0}-1}$ are potential outcomes with two different levels of $s$

• $\rho$ is the causal effect of a one-unit increase in $s$

• If we set the population regression function as

$y = \beta_{0} + \beta_{1}s + u$

## Model with Continuous Regressor

• The regression slope is

$\beta_{1} = \frac{cov(y,s)}{var(s)}$

• To relate $\beta_{1}$ to $\rho$, sub in the structural model for $y$

$\beta_{1} = \frac{cov(\alpha + \rho s + \eta ,s)}{var(s)}$

• Simplifying we get

$\beta_{1} = \rho + \frac{cov(\eta ,s)}{var(s)}$
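• A sketch of this decomposition (a hypothetical DGP with an error correlated with $s$, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Hypothetical structural model y = alpha + rho*s + eta, with eta related to s
s = rng.normal(size=n)
eta = 0.5 * s + rng.normal(size=n)   # cov(eta, s) = 0.5 * var(s), not zero
y = 1.0 + 2.0 * s + eta              # alpha = 1, rho = 2

cov_ys = ((y - y.mean()) * (s - s.mean())).mean()
cov_es = ((eta - eta.mean()) * (s - s.mean())).mean()
print(cov_ys / s.var())              # regression slope, ~ 2.5, not rho = 2
print(2.0 + cov_es / s.var())        # rho + cov(eta, s)/var(s), the same value
```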

## Model with Continuous Regressor

• $\beta_{1}$ equals $\rho$ when $\eta$ and $s$ are uncorrelated

• Randomization and mean independence both imply this is true

• So if we assume

$E(\eta | s) = 0$

• Then the second term in equation above is zero and the population slope is the causal effect

## Model with Continuous Regressor

• Now imagine that the structural model is

$y = \alpha + \rho s + \gamma x + \eta$

• The definition of $\rho$ is

$\rho = E(y_{s_{0}}|x, s=s_{0}) - E(y_{s_{0}-1}|x, s = s_{0} - 1)$

• If we set the population regression function as

$y = \beta_{0} + \beta_{1}s + \beta_{2} x+ u$

## Model with Continuous Regressor

• Then $\beta_{1}$ equals $\rho$ if we assume either of the following

• Conditional independence of $\eta$

• Conditional mean independence of $\eta$

• Conditional mean independence means

$E(\eta | s, x) = E(\eta | x)$

• In words, this means $s$ is related to potential outcomes only through $x$

• So holding $x$ constant breaks this relationship

• Even though $\beta_{1}$ equals $\rho$, it is important to note that $\beta_{0} \neq \alpha$ and $\beta_{2} \neq \gamma$

• With regression we do not measure the structural intercept or effect of $x$

## Model with Continuous Regressor

• To see this, take expectation of $y$ in structural model

$E[y|s,x] = \alpha + \rho s + \gamma x + E[\eta|s,x]$

• If we impose conditional mean independence, then $E(\eta | s, x) = E(\eta | x)$

$E[y|s,x] = \alpha + \rho s + \gamma x + E[\eta|x]$

• The error is not a function of $s$ anymore, but it is a function of $x$

• For example, suppose

$\eta = \theta_{0} + \theta_{1} x + \epsilon$

## Model with Continuous Regressor

• Assume that $\epsilon$ is just a random error unrelated to $x$ and $s$

• Sub into structural model

$y = \alpha + \rho s + \gamma x + \theta_{0} + \theta_{1} x + \epsilon$

$y = (\alpha +\theta_{0})+ \rho s + (\gamma + \theta_{1})x + \epsilon$

$y = \lambda + \rho s + \pi x + \epsilon$

• The intercept and slope on $x$ are now redefined

• They are no longer causal effects

• Slope on $s$ is still the causal effect $\rho$

## Model with Continuous Regressor

• If the regression function is

$y = \beta_{0} + \beta_{1}s + \beta_{2} x+ u$

• Then if $E[\epsilon|s,x] = 0$

$\beta_{0} = \lambda$

$\beta_{1} = \rho$

$\beta_{2} = \pi$
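• A sketch of this result (hypothetical parameter values, assuming numpy): the regression recovers $\lambda$, $\rho$, and $\pi$, not the structural $\alpha$ and $\gamma$

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1_000_000

# Hypothetical structural values, with eta = theta0 + theta1*x + eps
alpha, rho, gamma = 1.0, 2.0, 0.5
theta0, theta1 = 0.3, 0.8

x = rng.normal(size=n)
s = 0.6 * x + rng.normal(size=n)     # s is related to eta only through x
eps = rng.normal(size=n)             # eps unrelated to s and x
y = alpha + rho * s + gamma * x + (theta0 + theta1 * x + eps)

# Regressing y on (1, s, x): the slope on s recovers rho, but the intercept
# and the slope on x are lambda = alpha + theta0 and pi = gamma + theta1
X = np.column_stack([np.ones(n), s, x])
print(np.linalg.lstsq(X, y, rcond=None)[0])   # ~ [1.3, 2.0, 1.3]
```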

## Omitted Variables Bias

• In the regression model above, what happens if we leave out $x$?

• Continue to assume conditional mean independence

$y = \beta_{0} + \beta_{1}s + u$

• Remember the regression slope is

$\beta_{1} = \frac{cov(y ,s)}{var(s)}$

• Sub in the structural model

$\beta_{1} = \frac{cov(\lambda + \rho s + \pi x + \epsilon ,s)}{var(s)}$

## Omitted Variables Bias

$\beta_{1} = \rho + \pi \cdot \frac{cov(x,s)}{var(s)} + \frac{cov(\epsilon,s)}{var(s)}$

• The last term is zero because we assume $\epsilon$ is unrelated to $x$ and $s$

$\beta_{1} = \rho + \pi \cdot \frac{cov(x,s)}{var(s)}$

• The regression slope does not measure the causal effect in this case

• The bias is

$\pi \cdot \frac{cov(x,s)}{var(s)}$
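• A numerical sketch of this bias formula (a hypothetical DGP continuing the redefined model above, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

lam, rho, pi = 1.3, 2.0, 1.3         # hypothetical values of lambda, rho, pi
x = rng.normal(size=n)
s = 0.6 * x + rng.normal(size=n)     # s and x are correlated
eps = rng.normal(size=n)
y = lam + rho * s + pi * x + eps

# Short regression of y on s alone, omitting x
cov_ys = ((y - y.mean()) * (s - s.mean())).mean()
print(cov_ys / s.var())              # biased slope, ~ 2.57, not rho = 2

# Omitted variables bias formula: rho + pi * cov(x, s)/var(s)
cov_xs = ((x - x.mean()) * (s - s.mean())).mean()
print(rho + pi * cov_xs / s.var())   # matches the slope above
```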

## Omitted Variables Bias

• Bias has two parts

• $\pi \rightarrow$ the effect of $x$ on $y$

• $\frac{cov(x,s)}{var(s)} \rightarrow$ the slope from a population regression of $x$ on $s$

• If $x$ is related to $y$ and $x$ is related to $s$, we have bias

• Direction of bias depends on signs of each term

• If both positive or both negative $\rightarrow$ positive bias

• If one positive and one negative $\rightarrow$ negative bias

• If either $y$ or $s$ is unrelated to $x$, there is no bias

• In vector notation, restate the structural model as

$y = \mathbf{x_{1}}\boldsymbol{\alpha_{1}} + \mathbf{x_{2}}\boldsymbol{\alpha_{2}} + \eta$

## Omitted Variables Bias

• If we try to approximate it with the population regression function

$y = \mathbf{x_{1}}\boldsymbol{\beta_{1}} + u$

• We get the population regression slope as

$\boldsymbol{\beta_{1}}=(E[\mathbf{x_{1}'x_{1}}])^{-1} E[\mathbf{x_{1}'}y]$

• Sub the structural model into the population slope function

$\boldsymbol{\beta_{1}}=(E[\mathbf{x_{1}'x_{1}}])^{-1} E[\mathbf{x_{1}'}( \mathbf{x_{1}}\boldsymbol{\alpha_{1}} + \mathbf{x_{2}}\boldsymbol{\alpha_{2}} + \eta )]$

$=(E[\mathbf{x_{1}'x_{1}}])^{-1} E[\mathbf{x_{1}'x_{1}}\boldsymbol{\alpha_{1}} + \mathbf{x_{1}'x_{2}}\boldsymbol{\alpha_{2}} + \mathbf{x_{1}'}\eta ]$

$=(E[\mathbf{x_{1}'x_{1}}])^{-1} E[\mathbf{x_{1}'x_{1}}]\boldsymbol{\alpha_{1}} + (E[\mathbf{x_{1}'x_{1}}])^{-1}E[\mathbf{x_{1}'x_{2}}]\boldsymbol{\alpha_{2}} + (E[\mathbf{x_{1}'x_{1}}])^{-1}E[\mathbf{x_{1}'}\eta ]$

## Omitted Variables Bias

$=\boldsymbol{\alpha_{1}} + (E[\mathbf{x_{1}'x_{1}}])^{-1}E[\mathbf{x_{1}'x_{2}}]\boldsymbol{\alpha_{2}}$

• The last term drops out because the structural error is assumed unrelated to the regressors, so $E[\mathbf{x_{1}'}\eta ] = \mathbf{0}$

• The population slope vector on $\mathbf{x_{1}}$ equals the sum of

• The causal slope vector $\boldsymbol{\alpha_{1}}$

• A bias term containing

• the regression of $\mathbf{x_{2}}$ on $\mathbf{x_{1}}$

• the slope on $\mathbf{x_{2}}$ in the structural model for $y$

• A key lesson here is that a single omitted variable will bias all population slopes $\boldsymbol{\beta_{1}}$

• Unless it is unrelated to $y$

• Or it is uncorrelated with all but one included regressor, and that regressor is uncorrelated with the others
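• A numerical check of the vector version (a hypothetical DGP, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

# Hypothetical structural model y = x1*a1 + x2*a2 + eta, eta unrelated to x1, x2
x1 = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
x2 = (0.5 * x1[:, 1] - 0.2 * x1[:, 2] + rng.normal(size=n)).reshape(-1, 1)
a1 = np.array([1.0, 2.0, -1.0])
a2 = np.array([0.7])
y = x1 @ a1 + x2 @ a2 + rng.normal(size=n)

# Slope vector from the short regression of y on x1 alone
b1 = np.linalg.solve(x1.T @ x1 / n, x1.T @ y / n)

# Vector OVB formula: a1 + (E[x1'x1])^{-1} E[x1'x2] a2
bias = np.linalg.solve(x1.T @ x1 / n, x1.T @ x2 / n) @ a2
print(b1)            # both slope entries shift away from a1
print(a1 + bias)     # matches b1, since x2 is correlated with both regressors
```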