Nonlinear Regression Functions

EC295

Justin Smith

Wilfrid Laurier University

Fall 2022

Introduction

Introduction

  • In all models we have studied, the regression slope is constant

    • When \(X\) changes by 1, \(Y\) changes by same amount regardless of value of \(X\)

    • Effect of changing \(X\) on \(Y\) also does not depend on other variables

  • This is not always the case

    • Ex: Effect of experience on wages may depend on level of experience

      • Additional experience matters less for the very experienced
    • Ex: Effect of extra bedroom on house price may depend on square footage

      • Might be bigger for larger houses
  • In this section we extend the model to allow nonlinearities

Introduction

  • All models we have examined are linear in the \(X\) variables

  • Sometimes this produces a bad fit to the data

  • Ex: Effect of district income on test scores

    • Next slide shows test scores plotted against income

    • A linear regression is estimated and plotted on top of data

    • Notice line does not fit data pattern well

      • Data takes more of a curved relationship

      • Extra income has bigger effect on test scores when income is low

      • Slope is steep at low income, flatter at high income

  • We can extend the regression to allow curvature in the line

Introduction

[Figure: scatter of test scores against district income, with the fitted linear regression line]

Nonlinear Functions of a Single Independent Variable

Quadratic Functions

  • Recall graph on previous slide

  • That line would fit better if curved

    • Effect of income on scores depends on income

      • Effect of income is large when income is low

      • Slope falls as income gets bigger

      • Called diminishing marginal returns

  • The easiest way to allow curvature is a quadratic function

  • A quadratic relationship between test scores and income is \[TestScore_{i} = \beta_{0} + \beta_{1}Income_{i} + \beta_{2}Income_{i}^2 + u_{i}\]

Quadratic Functions

  • The population regression function in this model is \[E[TestScore_{i}|Income_{i}] = \beta_{0} + \beta_{1}Income_{i} + \beta_{2}Income_{i}^2\]

  • This specification allows effect of income on test scores to depend on income

  • To see this, compute total change in the population regression \[\Delta E[TestScore_{i}|Income_{i}] = \beta_{1}\Delta Income_{i} + \beta_{2}\Delta Income_{i}^2\]

  • Divide both sides by \(\Delta Income_{i}\) \[\frac{ \Delta E[TestScore_{i}|Income_{i}]}{\Delta Income_{i}} = \beta_{1}+ \beta_{2}\frac{\Delta Income_{i}^2}{\Delta Income_{i}}\]

Quadratic Functions

  • When \(\Delta Income_{i}\) is small, we can approximate \(\frac{\Delta Income_{i}^2}{\Delta Income_{i}}\): since \(\Delta Income_{i}^2 = (Income_{i} + \Delta Income_{i})^2 - Income_{i}^2 = 2Income_{i}\Delta Income_{i} + (\Delta Income_{i})^2\), dividing by \(\Delta Income_{i}\) gives \[\frac{\Delta Income_{i}^2}{\Delta Income_{i}} = 2Income_{i} + \Delta Income_{i} \approx 2Income_{i}\]

  • Plugging into the main equation (for readers who know calculus, this is just the derivative of the quadratic) \[\frac{\Delta E[TestScore_{i}|Income_{i}]}{\Delta Income_{i}} = \beta_{1}+ 2\beta_{2}Income_{i}\]

  • The effect of income on test scores has two components

    • \(\beta_{1}\), the effect of income on test scores when \(Income_{i} = 0\)

    • \(2\beta_{2}Income_{i}\), how the effect depends on income

Quadratic Functions

  • The signs of \(\beta_{1}\) and \(\beta_{2}\) determine shape of function

    • If \(\beta_{1} >0\) and \(\beta_{2} <0\), it has an inverted-U shape

      • Effect of income on test scores is positive when income is low, and decreases as income gets larger

      • After a certain point, the effect becomes negative

    • If \(\beta_{1} <0\) and \(\beta_{2} >0\), it has a U shape

      • Effect of income on test scores is negative when income is low, and increases as income gets larger

      • After a certain point, the effect becomes positive

  • Estimate this equation by OLS the same way as before

  • Estimated effect of income on test scores is \[\frac{\Delta \hat{TestScore}_{i}}{\Delta Income_{i}} = \hat{\beta}_{1}+ 2\hat{\beta}_{2}Income_{i}\]

Quadratic Functions

  • Ex: estimated quadratic regression function, with income measured in thousands of dollars \[\hat{TestScore}_{i} = 607.3 + 3.85 Income_{i} - 0.0423 Income_{i}^2\]

  • Estimated effect of income on test scores is \[\frac{\Delta\hat{TestScore}_{i}}{\Delta Income_{i}} = 3.85- 2(0.0423)Income_{i}\]

Quadratic Functions

  • Note how effect depends on level of income

    • If income = 20,000 \[\frac{\Delta \hat{TestScore}_{i}}{\Delta Income_{i}} = 3.85- 2(0.0423)(20) = 2.158\]

    • If income = 40,000 \[\frac{\Delta \hat{TestScore}_{i}}{\Delta Income_{i}} = 3.85- 2(0.0423)(40) = 0.466\]

    • If income = 60,000 \[\frac{\Delta \hat{TestScore}_{i}}{\Delta Income_{i}} = 3.85- 2(0.0423)(60) = -1.226\]
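
  • These calculations can be automated; a minimal Stata sketch, assuming the California test score data used in the example later in this section (testscr, avginc in $1000s)

    *Quadratic fit with factor-variable notation (no need to generate the squared term by hand);
    regress testscr c.avginc##c.avginc;

    *Slope of the fitted function at income = $20,000, $40,000, $60,000;
    margins, dydx(avginc) at(avginc=(20 40 60));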

Quadratic Functions

  • Slope is high at low values of income, then falls as income grows larger

    • At very high income, effect is negative
  • Function illustrated graphically on next slide

  • Several notable features

    • Since \(\hat{\beta}_{1} >0\) and \(\hat{\beta}_{2} <0\), it has an inverted-U shape

      • Effect initially steep and positive

      • Flattens as income goes higher

      • Becomes negative after a certain point

    • Fits pattern of data much better

      • Especially at the ends

      • Likely \(R^2\) would be higher in this regression
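
  • Following up on the “certain point”: setting the estimated slope to zero gives the income level where predicted scores peak \[\hat{\beta}_{1}+ 2\hat{\beta}_{2}Income_{i} = 0 \quad \Rightarrow \quad Income_{i} = -\frac{\hat{\beta}_{1}}{2\hat{\beta}_{2}} = \frac{3.85}{2(0.0423)} \approx 45.5\]

    • Predicted test scores peak at a district income of about $45,500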

Quadratic Functions

[Figure: scatter of test scores against district income, with the fitted quadratic regression function]

Polynomial Functions

  • We can allow for more flexibility in function with higher powers of \(X\)

    • Ex: cubic, quartic, etc...

    • Call these functions polynomials

  • A polynomial relationship between \(Y\) and \(X\) of degree r is \[Y_{i} = \beta_{0} + \beta_{1}X_{i} + \beta_{2}X_{i}^2 + ... + \beta_{r}X_{i}^r +u_{i}\]

    • The “r” determines highest power in function

    • When \(r=2\) it is quadratic

    • When \(r=3\) it is cubic

  • Higher powers allow more bends in the function

    • Can help fit data better
  • Effect of \(X\) on \(Y\) is more complicated with higher polynomials

Polynomial Functions

  • Computing effect can be more difficult with higher powers

  • Ex: estimated quadratic function \[\hat{Y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1}X_{i} + \hat{\beta}_{2}X_{i}^2\] \[\frac{\Delta \hat{Y}_{i}}{\Delta X_{i}} = \hat{\beta}_{1}+ 2\hat{\beta}_{2}X_{i}\]

  • Ex: estimated cubic function \[\hat{Y}_{i} = \hat{\beta}_{0} + \hat{\beta}_{1}X_{i} + \hat{\beta}_{2}X_{i}^2 + \hat{\beta}_{3}X_{i}^3\] \[\frac{\Delta \hat{Y}_{i}}{\Delta X_{i}} = \hat{\beta}_{1}+ 2\hat{\beta}_{2}X_{i} + 3\hat{\beta}_{3}X_{i}^2\]

  • and so on

Testing Null that the Regression is Linear

  • We can test between polynomial and linear functions

  • Procedure is to use an \(F\)-test as discussed last chapter

  • First, estimate polynomial \[Y_{i} = \beta_{0} + \beta_{1}X_{i} + \beta_{2}X_{i}^2 + ... + \beta_{r}X_{i}^r +u_{i}\]

  • Then test between these hypotheses

    • \(H_{0}: \beta_{2} = 0, \beta_{3} = 0, ..., \beta_{r} = 0\)

    • \(H_{1}: \text{ at least one of }\beta_{2},\beta_{3}, ..., \beta_{r} \text{ is not zero}\)

  • Accepting null means function is linear

  • Rejecting means it is nonlinear in some way
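
  • A hedged Stata sketch of this test, again using the test score data from the example later in this section (powers of avginc generated here for self-containment)

    *Cubic polynomial in district income;
    gen avginc2 = avginc^2;
    gen avginc3 = avginc^3;
    regress testscr avginc avginc2 avginc3;

    *F-test of the null that the quadratic and cubic terms are jointly zero;
    test avginc2 avginc3;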

Choosing the Degree of Polynomial

  • Higher-order polynomials increase the flexibility of the regression

  • But if the added terms are irrelevant, they can increase standard errors

    • Multicollinearity among the powers of \(X\) reduces the precision of the other estimates
  • Thus, it is important to properly specify the degree of the polynomial

  • Textbook suggests following procedure

    1. Pick maximum value for \(r\) and estimate regression

    2. Test null that \(\beta_{r}\) equals zero. If reject, keep in regression

    3. If accept null in 2, reestimate with polynomial of degree \(r-1\)

    4. Repeat 2-3 until you reject null hypothesis in 2

  • This is sequentially pruning model to get right specification
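
  • A minimal sketch of the procedure in Stata, continuing from the previous sketch with a starting degree of \(r = 4\)

    *Step 1: estimate with the maximum degree r = 4;
    gen avginc4 = avginc^4;
    regress testscr avginc avginc2 avginc3 avginc4;

    *Step 2: check the t-test on avginc4 in the output;
    *Step 3: if we cannot reject that its coefficient is zero, re-estimate the cubic;
    regress testscr avginc avginc2 avginc3;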

Logarithms

  • Common way to include nonlinearities is with logarithms

  • What is a logarithm?

    • Logarithm of \(x\) to base \(b\) is the \(y\) that solves \(b^{y} = x\) \[log_{b}(x) = y\]

    • Ex: If \(x\) is 1000, and the base \(b\) is 10, the logarithm of \(x\) is the exponent that turns 10 into 1000 \[log_{10}(1000) = 3\]

    • Increasing \(log_{10}(x)\) by one unit is like multiplying \(x\) by 10

    • 10 is a common base, but it can be any number

    • Ex: Richter scale (for earthquakes) is a base-10 log scale

Logarithms

Base 10 Logarithm

  \(x\)              1    10    100    1000    10000    100000
  \(log_{10}(x)\)    0     1      2       3        4         5

  • Economists often use the natural logarithm

    • Logarithm with base \(b\) equal to \(e \approx 2.71828\)

    • Often written as \(ln(x)\)

    • Increasing \(ln(x)\) by one unit is like multiplying \(x\) by \(e \approx 2.71828\)

  • Natural logarithms have one key useful property

    • Small changes in \(ln(x)\) approximate a percentage change \[100 \times (ln(x +\Delta x) - ln(x)) =100 \times \Delta ln(x) \approx 100\times \left ( \frac{\Delta x}{x} \right )\]

Logarithms

  • Ex: let \(x = 100\) and \(\Delta x = 1\)

    • Actual percentage change is: \(100\times \left (\frac{101 - 100}{100} \right ) = 1\)

    • Log approximation is \(100 \times (ln(101) - ln(100)) = 100 \times (4.615 -4.605) =1\)

  • Useful for expressing \(X\), \(Y\), or both in percentage terms

    • If \(X\) changes by 1 unit, what is percentage effect on \(Y\)?

    • If \(X\) changes by 1% , what is effect on \(Y\)?

    • If \(X\) changes by 1%, what is percentage effect on \(Y\)?

  • The inverse of a natural logarithm is the exponential function \[x = e^{ln(x)}\]

    • This will become useful in some situations
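
  • The worked example above can be checked directly in Stata

    *Log approximation to the 1% change above: returns roughly 0.995;
    display 100*(ln(101) - ln(100));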

Log-Linear Model

  • A log-linear model is when \(Y\) is in logs, but \(X\) is not \[ln(Y_{i}) = \beta_{0}+\beta_{1}X_{i} + u_{i}\]

  • Often used when measuring the return to education

  • A common population regression model in this context is \[ln(wage) = \beta_{0} + \beta_{1}educ + u\]

  • The interpretation of \(\beta_{1}\) is \[\beta_{1} = \frac{\Delta ln(wage)}{\Delta educ}\]

  • Remember that \(100\times \Delta ln(wage)\) is roughly equal to % change in wage

Log-Linear Model

  • So if we multiply \(\beta_{1}\) by 100 then \[100\times \beta_{1} \approx \frac{\% \Delta wage}{\Delta educ}\]

    • \(100\times \beta_{1}\) measures the % change in wage due to a 1-year increase in education
  • Appropriate if each additional year of education changes wage by the same proportion

    • Each extra year of education raises wage by the same percentage
  • In the general model \[100\times \beta_{1} \approx \frac{\% \Delta Y_{i}}{\Delta X_{i}}\]
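
  • A sketch of estimating this model in Stata, assuming the SLID variables used later in these slides (cmphrw28 is the hourly wage, yrschl18 is years of schooling)

    *Log-linear wage regression: 100 x b1 = % change in wage per year of schooling;
    gen lnwage = ln(cmphrw28);
    regress lnwage yrschl18, robust;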

Linear-Log Model

  • A linear-log model is when \(X\) is in logs, but \(Y\) is not \[Y_{i} = \beta_{0}+\beta_{1}ln(X_{i}) + u_{i}\]

  • One application is the effect of GDP on life expectancy \[expectancy= \beta_{0} + \beta_{1}ln(GDP) + u\]

    • Appropriate if proportional changes in GDP have same effect on expectancy

    • Larger increases in GDP are needed to get same effect on expectancy as countries get richer

    \[\beta_{1} = \frac{\Delta expectancy}{\Delta ln(GDP)}\]

Linear-Log Model

  • If we divide \(\beta_{1}\) by 100 \[\frac{\beta_{1}}{100} = \frac{\Delta expectancy}{100 \times \Delta ln(GDP)} = \frac{\Delta expectancy}{\% \Delta GDP}\]

  • \(\frac{\beta_{1}}{100}\) measures effect of 1% change in GDP on expectancy

  • In the general model \[\frac{\beta_{1}}{100} = \frac{\Delta Y_{i}}{\% \Delta X_{i}}\]

  • It is relatively rare to see this model in practice

Log-Log Model

  • A log-log model is when both \(Y\) and \(X\) are in logs \[ln(Y_{i}) = \beta_{0}+\beta_{1}ln(X_{i}) + u_{i}\]

  • This is used to estimate the elasticity between \(X\) and \(Y\)

  • Suppose you want to know price elasticity of demand for ticket sales \[ln(sales)= \beta_{0} + \beta_{1}ln(price) + u\]

  • In this case \[\beta_{1} = \frac{\Delta ln(sales)}{\Delta ln(price)}\]

Log-Log Model

  • If we multiply and divide \(\beta_{1}\) by 100 \[\beta_{1} = \frac{100\times \Delta ln(sales)}{100 \times \Delta ln(price)} = \frac{\% \Delta sales}{\% \Delta price}\]

  • \(\beta_{1}\) measures percent effect on ticket sales from a 1% increase in price

    • This is the price elasticity of demand for ticket sales
  • In the general model \[\beta_{1} = \frac{\% \Delta Y_{i}}{\% \Delta X_{i}}\]

  • Note that you do not need to alter \(\beta_{1}\) before interpreting

Visualizing Nonlinear Models

[Figure slides: graphs of the nonlinear regression functions discussed above]
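
  • One way to produce such plots, sketched here with the test score data from the example below: fit the model, predict, and overlay the fitted line on the scatter

    *Fit the quadratic, then plot fitted values over the raw data;
    quietly regress testscr c.avginc##c.avginc;
    predict yhat_quad;
    sort avginc;
    twoway (scatter testscr avginc) (line yhat_quad avginc);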

Example with Stata

  • We continue with the relationship between test scores and district income

  • Below we estimate various nonlinear functions of income

    • Linear: \(TestScore_{i}= \beta_{0} + \beta_{1}Income_{i} + u_{i}\)

    • Quadratic: \(TestScore_{i}= \beta_{0} + \beta_{1}Income_{i} + \beta_{2}Income_{i}^2+ u_{i}\)

    • Log Linear: \(ln(TestScore_{i})= \beta_{0} + \beta_{1}Income_{i} + u_{i}\)

    • Linear Log: \(TestScore_{i}= \beta_{0} + \beta_{1}ln(Income_{i}) + u_{i}\)

    • Log Log: \(ln(TestScore_{i})= \beta_{0} + \beta_{1}ln(Income_{i}) + u_{i}\)

  • We will interpret coefficients in each

Example with Stata

  • First, create necessary variables

  • Then estimate regressions

**Create variables we need for the regression;

    gen avginc2 = avginc^2;

    gen lntestscr = ln(testscr);
    gen lnavginc = ln(avginc);

**Estimate regressions;
    
    regress testscr avginc;
        estimates store linear;
        
    regress testscr avginc avginc2;
        estimates store quadratic;
        
    regress lntestscr avginc;
        estimates store loglinear;
        
    regress testscr lnavginc;
        estimates store linearlog;
        
    regress lntestscr lnavginc;
        estimates store loglog;

Example with Stata

  • Export results in easy to read table
.  estout linear quadratic linearlog loglinear loglog, cells(b(star fmt(3)) 
    se(par fmt(3))) starlevels(* 0.05 ** 0.01 ) stats(r2 r2_a rmse N,fmt(3 3 3 0 ));

---------------------------------------------------------------------------------------
                   linear      quadratic      linearlog      loglinear         loglog  
                     b/se           b/se           b/se           b/se           b/se  
---------------------------------------------------------------------------------------
avginc              1.879**        3.851**                       0.003**               
                  (0.091)        (0.304)                       (0.000)                 
avginc2                           -0.042**                                             
                                 (0.006)                                               
lnavginc                                         36.420**                       0.055**
                                                (1.571)                       (0.002)  
_cons             625.384**      607.302**      557.832**        6.439**        6.336**
                  (1.532)        (3.046)        (4.200)        (0.002)        (0.006)  
---------------------------------------------------------------------------------------
r2                  0.508          0.556          0.563          0.498          0.558  
r2_a                0.506          0.554          0.561          0.497          0.557  
rmse               13.387         12.724         12.618          0.021          0.019  
N                     420            420            420            420            420  
---------------------------------------------------------------------------------------

Example with Stata

  • Linear model

    • Income increases by $1000, test scores increase by 1.879 points
  • Quadratic model

    • \(\frac{\Delta TestScore}{\Delta Income} = 3.851-2(0.042)Income\)

    • When Income = 0, if income increases by $1000, test scores increase by 3.851 points

    • \(\frac{\Delta TestScore}{\Delta Income}\) is smaller by 2(0.042) with each $1000 in income

    • When Income = 50,000, if income increases by $1000, test scores change by \(3.851-2(0.042)(50) = -0.349\) points, i.e., they fall by 0.349 points

  • Linear Log model

    • \(\frac{\beta_{1}}{100} = \frac{\Delta TestScore}{\% \Delta Income}\)

    • \(\frac{\beta_{1}}{100} = \frac{36.420}{100} = 0.364\)

    • Income increases by 1%, test scores rise by 0.364 points

Example with Stata

  • Log Linear model

    • \(100\beta_{1} = \frac{\% \Delta TestScore}{\Delta Income}\)

    • \(100\beta_{1} = 100(0.003) = 0.3\)

    • Income increases by $1000, test scores rise by 0.3%

  • Log Log model

    • \(\beta_{1} = \frac{\% \Delta TestScore}{\% \Delta Income}\)

    • \(\beta_{1} = 0.055\)

    • Income increases by 1%, test scores rise by 0.055%

    • This is the elasticity of test scores with respect to income

Interactions Between Variables

Introduction

  • Quadratics allowed effect of \(X\) on \(Y\) to depend on \(X\)

    • Ex: Effect of experience on wages depends on current experience
  • We can also allow effect of \(X\) on \(Y\) to depend on third variable \(Z\)

    • Ex: Effect of experience on wages may depend on education

    • Ex: Effect of experience on wages may depend on gender

  • We allow for this dependence with interaction terms

    • The product of two independent variables
  • In this section we study variable interactions

    • Between dummies and continuous variables

    • Also between continuous variables

Model with Multiple Dummy Variables

  • First, we examine a model with two dummy variables but no interaction

  • Suppose we are interested in the effect of gender and marriage on wages \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} + \beta_{2}marr_{i} + u_{i}\]

  • In this model,

    • \(fem_{i} = 1\) for women, and zero otherwise

    • \(marr_{i}=1\) if married, and zero otherwise

  • How do we interpret the parameters of this model?

Model with Multiple Dummy Variables

  • As before, look at population regression function

    • When \(fem_{i} =1\) and \(marr_{i} = 1\) \[E[wage_{i}|fem_{i}=1,marr_{i}=1] = \beta_{0} + \beta_{1}+ \beta_{2}\]

    • When \(fem_{i} =1\) and \(marr_{i} = 0\) \[E[wage_{i}|fem_{i}=1,marr_{i}=0] = \beta_{0} + \beta_{1}\]

    • When \(fem_{i} =0\) and \(marr_{i} = 1\) \[E[wage_{i}|fem_{i}=0,marr_{i}=1] = \beta_{0} + \beta_{2}\]

    • When \(fem_{i} =0\) and \(marr_{i} = 0\) \[E[wage_{i}|fem_{i}=0,marr_{i}=0] = \beta_{0}\]

Model with Multiple Dummy Variables

  • \(\beta_{0}\) is average wage when male, unmarried \[E[wage_{i}|fem_{i}=0,marr_{i}=0] = \beta_{0}\]

  • \(\beta_{1}\) is difference between unmarried woman and unmarried man \[E[wage_{i}|fem_{i}=1,marr_{i}=0] - E[wage_{i}|fem_{i}=0,marr_{i}=0]\] \[= \beta_{0} + \beta_{1} - \beta_{0}\] \[= \beta_{1}\]

  • \(\beta_{1}\) also difference between married woman and married man \[E[wage_{i}|fem_{i}=1,marr_{i}=1] - E[wage_{i}|fem_{i}=0,marr_{i}=1]\] \[= \beta_{0} + \beta_{1}+ \beta_{2} - (\beta_{0} + \beta_{2})\] \[= \beta_{1}\]

Model with Multiple Dummy Variables

  • Model imposes that female-male difference does not depend on marriage

    • It is equal to \(\beta_{1}\) regardless of marital status
  • \(\beta_{2}\) is married-unmarried difference

    • Like before, difference does not depend on gender

    \[E[wage_{i}|fem_{i}=0,marr_{i}=1] - E[wage_{i}|fem_{i}=0,marr_{i}=0]\] \[= \beta_{0} + \beta_{2} - \beta_{0}\] \[= \beta_{2}\] \[E[wage_{i}|fem_{i}=1,marr_{i}=1] - E[wage_{i}|fem_{i}=1,marr_{i}=0]\] \[= \beta_{0} + \beta_{1}+ \beta_{2} - (\beta_{0} + \beta_{1})\] \[= \beta_{2}\]

Model with Multiple Dummy Variables

  • In total model has four intercepts

    • married women: \(\beta_{0} + \beta_{1}+ \beta_{2}\)

    • unmarried women: \(\beta_{0} + \beta_{1}\)

    • married men: \(\beta_{0} + \beta_{2}\)

    • unmarried men: \(\beta_{0}\)

  • Key limitation is female/male differences do not depend on marriage

  • Also, married/unmarried differences do not depend on gender

  • Both of these restrictions are specific to this model

    • We can relax them with interaction terms

Example with Stata

  • As an example, we study the determinants of wages

  • We are using the 2011 Survey of Labour and Income Dynamics (SLID)

    • Canadian survey of workers

    • Information on earnings, work hours, demographics, jobs

    • Follows people across time

    • We use the 2011 survey only

  • There are about 47,000 observations total

    • We use a subset of about 16,000
  • Drop a few values due to missing information on education

  • The datafile and dofile are posted to mylearningspace

    • Datafile: slid.dta

    • Dofile: EC295 nonlinear regression part 2.do

Example with Stata

  • Suppose we regress wages on a female and married dummy \[wage_{i} = \beta_{0} + \beta_{1}fem_{i}+ \beta_{2}marr_{i} + u_{i}\]
Linear regression                               Number of obs     =     15,817
                                                F(2, 15814)       =     466.21
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0590
                                                Root MSE          =     13.432

------------------------------------------------------------------------------
             |               Robust
    cmphrw28 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -4.245531   .2110128   -20.12   0.000     -4.65914   -3.831921
     married |   5.093432   .2080484    24.48   0.000     4.685633    5.501231
       _cons |   25.74684   .1798452   143.16   0.000     25.39432    26.09935
------------------------------------------------------------------------------

Example with Stata

  • Based on these estimates

    • Women earn $4.25 less than men, on average

    • Married people earn $5.09 more than unmarried, on average

    • Different intercepts are

      • Unmarried men: \(\hat{\beta}_{0} = 25.75\)

      • Unmarried women: \(\hat{\beta}_{0} + \hat{\beta}_{1} = 25.75-4.25 = 21.50\)

      • Married men: \(\hat{\beta}_{0} + \hat{\beta}_{2} = 25.75+5.09 = 30.84\)

      • Married women: \(\hat{\beta}_{0}+ \hat{\beta}_{1} +\hat{\beta}_{2}= 25.75- 4.25+5.09 = 26.59\)

    • Notice female/male difference is \(-4.25\) regardless of marital status

    • Also married/unmarried difference is \(5.09\) regardless of gender

Interactions Among Dummy Variables

  • Recall the model \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} +\beta_{2}marr_{i} + u_{i}\]

    • The female-male difference is measured by \(\beta_{1}\)

    • The married-unmarried difference is measured by \(\beta_{2}\)

  • A key limitation of this model was that

    • The female-male difference did not depend on marital status

    • The married-unmarried difference did not depend on gender

  • Interacting the two variables changes that interpretation

Interactions Among Dummy Variables

  • Suppose we add an interaction between \(fem_{i}\) and \(marr_{i}\) \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} +\beta_{2}marr_{i} +\beta_{3}(fem_{i}\times marr_{i}) + u_{i}\]

  • Now suppose we measure the female-male difference

    \[E[wage_{i}|marr_{i} , fem_{i} = 1] - E[wage_{i}|marr_{i} , fem_{i} =0]\] \[= ( \beta_{0} + \beta_{1} + \beta_{2}marr_{i} +\beta_{3}marr_{i} )\] \[- (\beta_{0} + \beta_{2}marr_{i} )\] \[= \beta_{1} + \beta_{3}marr_{i}\]

  • Now, the female-male gap depends on the marital status

    • The gap is different depending on value of \(marr_{i}\)

Interactions Among Dummy Variables

  • To see this, compute the female-male gap for unmarried and married

    • For unmarried people \[E[wage_{i}|fem_{i}=1,marr_{i}=0] - E[wage_{i}|fem_{i}=0,marr_{i}=0]\] \[= \beta_{0} + \beta_{1} - \beta_{0}\] \[= \beta_{1}\]

    • For married people \[E[wage_{i}|fem_{i}=1,marr_{i}=1] - E[wage_{i}|fem_{i}=0,marr_{i}=1]\] \[= \beta_{0} + \beta_{1}+ \beta_{2} + \beta_{3} - (\beta_{0} + \beta_{2})\] \[= \beta_{1} + \beta_{3}\]

Interactions Among Dummy Variables

  • Likewise, married unmarried gap depends on gender \[E[wage_{i}|marr_{i} = 1 , fem_{i} ] - E[wage_{i}|marr_{i}=0 , fem_{i}]\] \[=( \beta_{0} + \beta_{1}fem_{i} + \beta_{2} +\beta_{3}fem_{i} ) - (\beta_{0} + \beta_{1}fem_{i} )\]

    \[= \beta_{2}+\beta_{3}fem_{i}\]

    • marriage gap for men is \[E[wage_{i}|marr_{i} = 1 , fem_{i} = 0 ] - E[wage_{i}|marr_{i}=0 , fem_{i} = 0]\] \[= \beta_{2}\]

    • Marriage gap for women is \[E[wage_{i}|marr_{i} = 1 , fem_{i} = 1 ] - E[wage_{i}|marr_{i}=0 , fem_{i} = 1]\] \[= \beta_{2} + \beta_{3}\]

Interactions Among Dummy Variables

  • To summarize, interacting dummies allows for

    • Different intercepts for all groups

    • Differences in intercepts depend on group status

  • The different intercepts of this model are

    • Unmarried men: \(\beta_{0}\)

    • Unmarried women: \(\beta_{0} + \beta_{1}\)

    • Married men: \(\beta_{0} + \beta_{2}\)

    • Married women: \(\beta_{0} + \beta_{1} + \beta_{2}+ \beta_{3}\)

Interactions Among Dummy Variables

  • The interpretations for each are as follows

    • \(\beta_{1}\): difference between unmarried women and unmarried men

    • \(\beta_{2}\): difference between married men and unmarried men

    • \(\beta_{3}\): more complicated

      • Gender gap for married people minus gender gap for unmarried people

      • Marriage gap for women minus marriage gap for men

      • Measures how gender gap depends on marriage, and how marriage gap depends on gender

      • It is sometimes called the difference in differences

Example with Stata

  • Suppose we interact female and married \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} +\beta_{2}marr_{i} +\beta_{3}(fem_{i}\times marr_{i}) + u_{i}\]
.  gen female_married = female*married;
.  regress cmphrw28 female married female_married, robust  ;


Linear regression                               Number of obs     =     15,817
                                                F(3, 15813)       =     313.66
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0645
                                                Root MSE          =     13.393

--------------------------------------------------------------------------------
               |               Robust
      cmphrw28 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        female |  -1.911665   .2772196    -6.90   0.000    -2.455047   -1.368283
       married |   7.098604   .3130463    22.68   0.000     6.484998    7.712211
female_married |  -4.143398    .412491   -10.04   0.000    -4.951927   -3.334868
         _cons |   24.57583   .2062195   119.17   0.000     24.17162    24.98005
--------------------------------------------------------------------------------
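
  • As an aside, the same regression can be estimated without generating the interaction by hand, using Stata’s factor-variable notation

    *i.female##i.married expands to both dummies plus their interaction;
    regress cmphrw28 i.female##i.married, robust;

  • The coefficient reported on female#married equals \(\hat{\beta}_{3}\) above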

Example with Stata

  • In this model,

    • female-male wage gap is \(\beta_{1}+ \beta_{3}marr_{i}\)

    • married-unmarried wage gap is \(\beta_{2}+ \beta_{3}fem_{i}\)

  • Based on these results

    • \(\beta_{1} = -1.91\): wage gap between unmarried women and unmarried men

    • \(\beta_{2} = 7.10\): wage gap between married and unmarried men

    • \(\beta_{3} = -4.14\):

      • difference in gender gap between married and unmarried

      • It is also difference in marriage gap between women and men

  • Note how \(\beta_{3}\) measures change in gender gap for married, and also change in marriage gap for women

Models with Dummy and Continuous Variables

  • Suppose we are interested in estimating

    \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} + \beta_{2}educ_{i} + u_{i}\]

    • \(wage_{i}\) is an individual’s wage in dollars

    • \(fem_{i}\) is a dummy variable equal to 1 if female, 0 if male

    • \(educ_{i}\) is total years of schooling

  • The model contains one dummy, one continuous independent variable

  • To understand interpretation of \(\beta_{1}\), take expected value \[E[wage_{i}|fem_{i},educ_{i}] = \beta_{0} + \beta_{1}fem_{i} + \beta_{2}educ_{i}\]

Models with Dummy and Continuous Variables

  • When \(fem_{i}\) = 1 \[E[wage_{i}|fem_{i} = 1,educ_{i}] = \beta_{0} + \beta_{1}+ \beta_{2}educ_{i}\]

  • When \(fem_{i}\)= 0 \[E[wage_{i}|fem_{i} = 0,educ_{i}] = \beta_{0} + \beta_{2}educ_{i}\]

  • Taking the difference \[E[wage_{i}|fem_{i} = 1,educ_{i}] - E[wage_{i}|fem_{i} = 0,educ_{i}] =\] \[(\beta_{0} + \beta_{1}+ \beta_{2}educ_{i}) - (\beta_{0} + \beta_{2}educ_{i} ) =\beta_{1}\]

  • \(\beta_{1}\) is difference in average wage between women and men \[\beta_{1} = E[wage_{i}|fem_{i} = 1,educ_{i}] - E[wage_{i}|fem_{i} = 0,educ_{i}]\]

Models with Dummy and Continuous Variables

  • Notice that in expectation, we are holding education constant

    • Level of education is same in both expectations

    • \(\beta_{1}\) measures gender difference at each level of education

  • Can view \(\beta_{1}\) as an intercept shift

    • Intercept \(\beta_{0}\) is \(wage\) when independent variables are zero

      \[E[wage_{i}|fem_{i} = 0,educ_{i}=0] = \beta_{0}\]

    • In this model, it is average wages of male with zero education

    • Average wage of woman with no education is \[E[wage_{i}|fem_{i} = 1,educ_{i}=0] = \beta_{0} + \beta_{1}\]

    • So, \(\beta_{1}\) measures how average wage with no education changes between women and men

      • This allows different intercepts between men and women

Models with Dummy and Continuous Variables

[Figure: regression lines with different intercepts but the same slope for the two groups]

Interactions Between Continuous and Dummy Variables

  • We just saw how to allow intercepts to differ between groups

    • By adding a dummy variable for the group
  • We can also allow the slopes to depend on groups

  • Suppose we interact female with educ \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} + \beta_{2}educ_{i} +\beta_{3}(fem_{i} \times educ_{i})+ u_{i}\]

  • The effect of \(educ_{i}\) on \(E[wage_{i}|fem_{i},educ_{i}]\) is measured by

    \[\frac{\Delta E[wage_{i}|fem_{i},educ_{i}] }{\Delta educ_{i}} = \beta_{2} + \beta_{3}fem_{i}\]

Interactions Between Continuous and Dummy Variables

  • In this setup

    • \(\beta_{2}\) measures effect of \(educ_{i}\) on \(wage_{i}\) when \(fem_{i} = 0\)

      • Slope between educ and wage for men
    • \(\beta_{2}+\beta_{3}\) measures effect of \(educ_{i}\) on \(wage_{i}\) when \(fem_{i} = 1\)

      • Slope between educ and wage for women
    • \(\beta_{3}\) is difference in slope between women and men

  • Interacting a dummy with continuous variable allows for different slopes between groups

  • With a dummy variable on its own, and interacted with continuous variable, we allow different intercept and slope

    • Dummy allows for different intercept

    • Interaction with dummy allows for different slope

Interactions Between Continuous and Dummy Variables

[Figure: regression lines with different intercepts and different slopes for the two groups]

Interactions Between Continuous and Dummy Variables

  • In rare cases, you can also estimate a model with the same intercept, but different slopes \[wage_{i} = \beta_{0} + \beta_{2}educ_{i} +\beta_{3}(fem_{i} \times educ_{i})+ u_{i}\]

    • Excludes the dummy variable on its own

    • This eliminates the intercept shift

  • In this setup

    • \(\beta_{0}\) is the intercept for men and women

    • \(\beta_{2}\) measures slope on \(educ_{i}\) for men

    • \(\beta_{2}+\beta_{3}\) measures slope on \(educ_{i}\) for women

    • \(\beta_{3}\) is difference in slope between women and men

Interactions Between Continuous and Dummy Variables

[Figure: regression lines with the same intercept but different slopes for the two groups]

Example with Stata

  • A model with intercept shift but same slopes would be \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} + \beta_{2}educ_{i} + u_{i}\]
.  regress cmphrw28 female yrschl18, robust  ;

Linear regression                               Number of obs     =     15,817
                                                F(2, 15814)       =    1342.07
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1918
                                                Root MSE          =     12.448

------------------------------------------------------------------------------
             |               Robust
    cmphrw28 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      female |  -5.310934   .1987372   -26.72   0.000    -5.700481   -4.921386
    yrschl18 |   2.078491   .0424539    48.96   0.000     1.995277    2.161706
       _cons |   .3170835   .5522942     0.57   0.566    -.7654761    1.399643
------------------------------------------------------------------------------

Example with Stata

  • A model with intercept shift and different slopes is \[wage_{i} = \beta_{0} + \beta_{1}fem_{i} + \beta_{2}educ_{i} +\beta_{3}(fem_{i} \times educ_{i})+ u_{i}\]
. gen female_yrschl18 = female*yrschl18;
.  regress cmphrw28 female yrschl18 female_yrschl18, robust  ;

Linear regression                               Number of obs     =     15,817
                                                F(3, 15813)       =    1030.03
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1918
                                                Root MSE          =     12.449

---------------------------------------------------------------------------------
                |               Robust
       cmphrw28 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
         female |   -5.43323   1.093145    -4.97   0.000     -7.57592   -3.290541
       yrschl18 |   2.074593   .0628544    33.01   0.000     1.951391    2.197794
female_yrschl18 |   .0088005   .0830051     0.11   0.916     -.153899    .1715001
          _cons |   .3703632    .813796     0.46   0.649     -1.22477    1.965496
---------------------------------------------------------------------------------

Example with Stata

  • A model with same intercepts but different slopes is \[wage_{i} = \beta_{0} + \beta_{2}educ_{i} +\beta_{3}(fem_{i} \times educ_{i})+ u_{i}\]
. regress cmphrw28 yrschl18 female_yrschl18, robust  ;

Linear regression                               Number of obs     =     15,817
                                                F(2, 15814)       =    1266.52
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1904
                                                Root MSE          =     12.459

---------------------------------------------------------------------------------
                |               Robust
       cmphrw28 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
       yrschl18 |   2.238699   .0447701    50.00   0.000     2.150944    2.326454
female_yrschl18 |  -.3678847   .0152029   -24.20   0.000    -.3976841   -.3380852
          _cons |  -1.966818   .5616236    -3.50   0.000    -3.067664   -.8659719
---------------------------------------------------------------------------------

Interactions Between Continuous Variables

  • The final type of interaction is between two continuous variables

  • Suppose we are interested in \[wage_{i} = \beta_{0} + \beta_{1}exper_{i} + \beta_{2}educ_{i}+ \beta_{3}(exper_{i} \times educ_{i}) + u_{i}\]

    • \(wage_{i}\) is an individual’s wage in dollars

    • \(exper_{i}\) is experience in years

    • \(educ_{i}\) is total years of schooling

  • This model allows effect of experience to depend on education

    • Also effect of education to depend on experience

Interactions Between Continuous Variables

  • To see this, compute change in wage with experience \[\frac{\Delta E[wage_{i}|exper_{i},educ_{i}] }{\Delta exper_{i}} = \beta_{1} + \beta_{3}educ_{i}\]

  • There are two components to the effect

    • \(\beta_{1}\): effect of experience on wages when education is zero

    • \(\beta_{3}\): change in the effect with each additional year of education

  • The effect of education on wages is \[\frac{\Delta E[wage_{i}|exper_{i},educ_{i}] }{\Delta educ_{i}} = \beta_{2} + \beta_{3}exper_{i}\]

    • \(\beta_{2}\): effect of education on wages when experience is zero

    • \(\beta_{3}\): change in the effect with each additional year of experience
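
  • After estimating the model (as on the next slide), lincom is one way to evaluate the effect at a chosen value of the other variable; a sketch assuming the interaction variable yrxfte11_yrschl18 created there

    *Effect of one more year of experience for someone with 12 years of schooling;
    *i.e., b1 + 12*b3, reported with its standard error;
    lincom yrxfte11 + 12*yrxfte11_yrschl18;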

Example with Stata

  • A model with an interaction between experience and education is \[wage_{i} = \beta_{0} + \beta_{1}exper_{i} + \beta_{2}educ_{i}+ \beta_{3}(exper_{i} \times educ_{i}) + u_{i}\]
.  gen yrxfte11_yrschl18 = yrxfte11*yrschl18;
.  regress cmphrw28 yrxfte11 yrschl18 yrxfte11_yrschl18, robust  ;

Linear regression                               Number of obs     =     15,817
                                                F(3, 15813)       =    1094.72
                                                Prob > F          =     0.0000
                                                R-squared         =     0.2265
                                                Root MSE          =     12.179

-----------------------------------------------------------------------------------
                  |               Robust
         cmphrw28 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
         yrxfte11 |  -.1525336   .0462564    -3.30   0.001    -.2432013   -.0618658
         yrschl18 |   1.531803   .0675118    22.69   0.000     1.399472    1.664134
yrxfte11_yrschl18 |    .033486   .0036474     9.18   0.000     .0263366    .0406355
            _cons |  -.5684791   .9001661    -0.63   0.528    -2.332907    1.195949
-----------------------------------------------------------------------------------

Example with Stata

  • Here are all the results from the regressions together
 estout dummy dummyinteract intershift interslopeshift slopeshift continteract, 
 cells(b(star fmt(3)) se(par fmt(3))) starlevels(* 0.05 ** 0.01 ) stats(r2 r2_a rmse N,fmt(3 3 3 0 ));

------------------------------------------------------------------------------------------------------
                    dummy   dummyinter~t     intershift   interslope~t     slopeshift   continteract  
                     b/se           b/se           b/se           b/se           b/se           b/se  
------------------------------------------------------------------------------------------------------
female             -4.246**       -1.912**       -5.311**       -5.433**                              
                  (0.211)        (0.277)        (0.199)        (1.093)                                
married             5.093**        7.099**                                                            
                  (0.208)        (0.313)                                                              
female_mar~d                      -4.143**                                                            
                                 (0.412)                                                              
yrschl18                                          2.078**        2.075**        2.239**        1.532**
                                                (0.042)        (0.063)        (0.045)        (0.068)  
female_yr~18                                                     0.009         -0.368**               
                                                               (0.083)        (0.015)                 
yrxfte11                                                                                      -0.153**
                                                                                             (0.046)  
yrxfte11_~18                                                                                   0.033**
                                                                                             (0.004)  
_cons              25.747**       24.576**        0.317          0.370         -1.967**       -0.568  
                  (0.180)        (0.206)        (0.552)        (0.814)        (0.562)        (0.900)  
------------------------------------------------------------------------------------------------------
r2                  0.059          0.064          0.192          0.192          0.190          0.226  
r2_a                0.059          0.064          0.192          0.192          0.190          0.226  
rmse               13.432         13.393         12.448         12.449         12.459         12.179  
N                   15817          15817          15817          15817          15817          15817  
------------------------------------------------------------------------------------------------------

Example with Stata

  • Model 1 allows for different intercepts for women and married people

    • Gender gap does not depend on marriage; marriage premium does not depend on gender

    • Women earn $4.25 less than men, married people earn $5.09 more than non-married people

  • Model 2 allows gender gap to depend on marriage; marriage gap to depend on gender

    • Gender gap for unmarried is -$1.91

    • Gender gap for married people is wider: \(-\$1.91 - \$4.14 \approx -\$6.06\)

  • Model 3 estimates returns to schooling, allows intercept shift for women

    • Wages for women are $5.31 lower regardless of schooling

    • Return to schooling is $2.08 for women and men

  • Model 4 allows intercept and slope to differ for women

    • Intercept for women is $5.43 lower; since the interaction is near zero, the gap barely varies with schooling

    • Return to schooling is $2.07 for men and $2.08 for women; the difference (0.009) is statistically insignificant

Example with Stata

  • Model 5 allows only slope to differ for women

    • Intercept is the same for men and women

    • Return to schooling is $2.24 for men, $1.87 for women

  • Model 6 allows interaction in effects of experience and education

    • Return to experience is increasing in education

    • Return to education is increasing in experience

Nonlinear Effects of Student Teacher Ratio on Test Scores

Introduction

  • We end the chapter by revisiting the effect of class size on test scores

  • Our models until now have been linear

    • One extra student has same effect on test scores regardless of class size
  • This section generalizes that linear model in two ways

    • Includes interaction terms with other variables

      • Lets class size effect depend on values of other variables

    • Allows for nonlinearities in the class size effect

      • Lets class size effect depend on class size

  • Example also uses hypothesis testing techniques learned earlier

Introduction

[Slides showing the regression specifications from Table 8.3 of the text]

Base Specification

  • Column 1 contains the base specification

  • Model is linear in student teacher ratio, with small set of controls

    • % ESL

    • % Free/Reduced Lunch

  • Slope on class size is what we have seen before

    • One more student per teacher lowers test scores by one point

    • Significant at the 1% level

Alternative Specification with Income Control

  • Column 2 adds log district income to the base specification

  • Slope on class size falls compared to base

    • One more student per teacher lowers test scores by 0.73 points

    • Significant at the 1% level

  • Important to be careful with interpretation of slope on income

    • Measured in logarithms

    • Effect of 1% increase in income on test scores is \(\frac{\hat{\beta}_{inc} }{100} = \frac{11.57}{100} = 0.1157\)

    • Statistically significant at 1% level

    • It is a relatively small coefficient

Interaction with High Fraction ESL

  • Columns 3-4 explore interaction terms with \(HiEL_{i}\)

    • \(HiEL_{i} = 1\) when fraction of English learners \(>10\%\)
  • Models include \(HiEL_{i}\) on its own and interacted with \(STR_{i}\)

    • Allows intercept shift for districts with high % ESL

    • Also allows different slopes on \(STR_{i}\) for high and low % ESL districts

  • Using column 3, interpretation on \(STR_{i}\) slope in first row changes

    • Adding one more student reduces scores by 0.97 for districts with low fraction ESL

      • This is slope on \(STR_{i}\) for districts where \(HiEL_{i} = 0\)

Interaction with High Fraction ESL

  • Coefficient on \(HiEL_{i} \times STR_{i}\) is difference in slope for high ESL districts

    • Slope on \(STR_{i}\) is lower by 1.28 when \(HiEL_{i} = 1\) (interaction coefficient \(-1.28\))

    • Total slope on \(STR_{i}\) when \(HiEL_{i} = 1\) is \(-0.97 - 1.28 = -2.25\)

    • Effect of additional student is stronger when high % ESL

    • But, coefficient on interaction is statistically insignificant

  • Coefficient on \(HiEL_{i}\) measures intercept shift

    • Difference in predicted test scores when other variables are zero

    • \(HiEL_{i} = 1\) has higher intercept by about 5.64 points

  • Column 4 is same idea, but adds free lunch and income

    • Slope on \(STR_{i}\) when \(HiEL_{i} = 0\) is -0.53

    • Slope on \(STR_{i}\) when \(HiEL_{i} = 1\) is \(-0.53 - 0.58 = -1.11\)

    • Upward intercept shift when \(HiEL_{i} = 1\) is about the same

Nonlinearities in Test Score Effect

  • Columns 5-7 explore nonlinearities in the class size effect

    • With polynomial functions
  • Column 5 adds a cubic polynomial in \(STR_{i}\)

    • Coefficients on all polynomial terms are statistically significant

      • So we would want to keep nonlinear terms in the regression
    • Interpretation is tricky

      • With a cubic, effect of \(STR_{i}\) on test scores is \(\frac{\Delta \hat{TestScore}_{i}}{\Delta STR_{i}} = \beta_{str} + 2\beta_{str^2}STR_{i} + 3\beta_{str^3}STR_{i}^2\)

      • Need to know \(STR_{i}\) to compute this effect

      • With 20 students, slope is \(\frac{\Delta \hat{TestScore}_{i}}{\Delta STR_{i}} = 64.33 + 2(-3.42)(20) + 3(0.059)(20)^2 = -1.67\)

    • Also note intercept shift downward for high ESL districts

Nonlinearities in Test Score Effect

  • Column 7 reports a very similar specification

    • Difference is \(HiEL\) dummy replaced with %ESL

    • Estimates on polynomial terms are very similar

  • Column 6 interacts nonlinear function in \(STR_{i}\) with \(HiEL\)

    • Allows nonlinear function in \(STR_{i}\) to differ between high and low ESL districts

    • Interaction terms are difference in polynomial terms when \(HiEL_{i} = 1\)

      • These are significant at 5% level, suggesting different relationship in high vs low ESL districts

      • Also, reject null that interaction terms are jointly zero at 5% level
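
      • A hypothetical sketch of that joint test in Stata; the interaction variable names below are placeholders, not variables from the posted data

    *Joint F-test that all HiEL x STR polynomial interactions are zero;
    test hiel_str hiel_str2 hiel_str3;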

Nonlinearities in Test Score Effect

[Figure slides: estimated nonlinear relationships between student-teacher ratio and test scores]

Summary

  • Table 8.3 in text presents many specifications

  • Here are the main takeaways

    • Omitted variables bias is an issue

      • Controlling for economic background is important
    • There are nonlinearities in the effect of class size on test scores

      • All of the polynomial terms in regressions are statistically significant

      • Means effect of class size depends on size of class

    • Relationship between class size and test scores is different in districts with high %ESL

      • In columns 3 and 4, accept null that difference in slope is zero

      • However, in column 6, differences in slope are significant when we allow nonlinearities