Linear Regression Model - Hypothesis Testing

EC295

Justin Smith

Wilfrid Laurier University

Fall 2022

Introduction

Introduction

  • We previously learned how to estimate regression parameters

  • In this section, we learn how to test claims about those parameters

  • Hypothesis testing follows the same steps we learned previously

    • Make a claim about the regression parameter (e.g. \(\beta_{1} = 0\))

    • Use estimate \(\hat{\beta}_{1}\) and its sampling distribution to evaluate probable truth of claim

    • If claim is unlikely to be true, reject it

  • We also explore assumptions about the regression error term

    • Affects standard error of \(\hat{\beta}_{1}\)

    • This in turn affects the test statistic in hypothesis tests

Testing Hypotheses About One Regression Coefficient

Testing Two-Sided Hypotheses about \(\beta_{1}\)

  • The population regression model was

\[Y_{i}= \beta_{0} + \beta_{1}X_{i} + u_{i}\]

  • Imagine that you are interested in \(\beta_{1}\)

  • This is an unknown parameter

    • A feature of the population

    • But we do not observe the population

  • We estimate \(\beta_{1}\) with the Ordinary Least Squares (OLS) estimator \(\hat{\beta}_{1}\)

    • Use this to test claims about \(\beta_{1}\)

Testing Two-Sided Hypotheses about \(\beta_{1}\)

  • Follows a set of steps

    1. Formulate opposing hypotheses about \(\beta_{1}\)

    2. Choose a test statistic

    3. Formulate a decision rule

    4. Use sample data and apply decision rule

  • Step 1: Opposing hypotheses in a two-sided test would be

    • \(H_{0}: \beta_{1} = \beta_{1,0}\)

    • \(H_{1}: \beta_{1} \neq \beta_{1,0}\)

      • \(\beta_{1,0}\) is the value of the claim

      • In regression, claims are about relationship between \(X\) and \(Y\)

      • e.g., if claim is that \(X\) and \(Y\) are unrelated, then \(\beta_{1,0} = 0\)

Testing Two-Sided Hypotheses about \(\beta_{1}\)

  • Step 2: choose a test statistic

    • t-statistic: measures the distance of the estimate from the claimed value \[t = \frac{\hat{\beta}_{1} -\beta_{1,0} }{SE(\hat{\beta}_{1})}\]

    • This is a random variable, because it varies across samples

    • t-statistic has a Standard Normal distribution in large samples

      • Result of the Central Limit Theorem
    • Main complication is computing \(SE(\hat{\beta}_{1})\)

    • Recall that the variance of \(\hat{\beta}_{1}\) is

\[\sigma^2_{\beta_{1}}=\frac{VAR\left( (X_{i} - \mu_{X})u_{i}\right) }{n(\sigma_{X}^2)^2}\]

Testing Two-Sided Hypotheses about \(\beta_{1}\)

  • Step 2 continued

    • \(\sigma^2_{\beta_{1}}\) depends on unknown population variances

      • \(VAR\left( (X_{i} - \mu_{X})u_{i}\right)\)

      • \(\sigma_{X}^2\)

    • Replace each with estimators

      \[\hat{VAR}\left( (X_{i} - \mu_{X})u_{i}\right) = \frac{1}{n-2} \sum_{i=1}^{n} (X_{i} - \bar{X})^2\hat{u}_{i}^2\] \[\hat{\sigma}_{X}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_{i} - \bar{X})^2\]

Testing Two-Sided Hypotheses about \(\beta_{1}\)

  • Step 2 continued

    • With this, we get the estimator of \(\sigma^2_{\beta_{1}}\) \[\hat{\sigma}^2_{\beta_{1}}=\frac{1}{n}\frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_{i} - \bar{X})^2\hat{u}_{i}^2}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_{i} - \bar{X})^2 \right ]^2}\]

    • \(SE(\hat{\beta}_{1})\) is the square root of \(\hat{\sigma}^2_{\beta_{1}}\)

      \[SE(\hat{\beta}_{1}) = \sqrt{\hat{\sigma}^2_{\beta_{1}}}\]

    • This standard error is routinely computed by programs like Stata
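The variance formula above can be checked by hand outside Stata. Below is a minimal Python sketch (not part of the course materials) that simulates data similar to the later Stata examples and plugs the two estimators into the formula; all variable names and the data-generating process are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 420
x = rng.normal(20, 2, n)          # e.g., a student-teacher ratio
u = rng.normal(0, 20, n)
y = 700 - 2 * x + u

# OLS estimates of beta1 and beta0
xbar, ybar = x.mean(), y.mean()
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar
uhat = y - b0 - b1 * x            # residuals

# Plug the two estimators into the variance formula from the slides
var_num = np.sum((x - xbar) ** 2 * uhat ** 2) / (n - 2)   # estimator of VAR((X - mu_X)u)
var_den = np.mean((x - xbar) ** 2) ** 2                   # (estimator of sigma_X^2) squared
se_b1 = np.sqrt(var_num / (n * var_den))

print(b1, se_b1)    # slope near -2, SE near 0.5 with this design
```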

  • Step 3: Form a decision rule

    • Typically, set \(\alpha = 0.05\), with corresponding critical value \(t^c = 1.96\)

Testing Two-Sided Hypotheses about \(\beta_{1}\)

  • Step 4: Compute t-statistic and apply decision rule

    • If \(\hat{\beta}_{1}\) is too far from \(\beta_{1,0}\) given decision rule, reject \(H_{0}\)
  • Alternatively, use p-value approach

    • \(p\)-value is likelihood of getting \(\hat{\beta}_{1}\) further away from \(\beta_{1,0}\) than observed value

    • for a two-tailed test

      \[\mbox{p-value} = 2 \times Pr[|t| > |t^{act}|] = 2 \times Pr \left[ \left | \frac{\hat{\beta}_{1} -\beta_{1,0} }{SE(\hat{\beta}_{1})} \right | > \left | \frac{\hat{\beta}_{1}^{act} -\beta_{1,0} }{SE(\hat{\beta}_{1})} \right | \right ]\]

    • Reject for any \(\alpha >\) \(p\)-value

  • As we have mentioned, the most popular two-sided test involves \(H_{0}: \beta_{1} = 0\)

    • The p-value for this test is automatically reported in Stata regression output

    • Standard error also reported for testing other hypotheses
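The t-statistic and two-sided p-value above can be computed by hand. Here is a short Python sketch (not course material), using the large-sample Normal approximation and illustrative numbers for the estimate and standard error:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

beta1_hat = -1.86      # illustrative OLS estimate
se_beta1 = 0.47        # illustrative standard error
beta1_0 = 0.0          # H0: beta1 = 0

t = (beta1_hat - beta1_0) / se_beta1
p_two_sided = 2 * (1 - norm_cdf(abs(t)))

# reject H0 at the 5% level if |t| > 1.96, equivalently if p < 0.05
print(t, p_two_sided)    # t is about -3.96; p is well below 0.001
```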

Testing One-Sided Hypotheses about \(\beta_{1}\)

  • One sided hypotheses involve inequality constraints

    • For upper-tailed tests, \(H_{0}: \beta_{1} \le \beta_{1,0}, H_{1}: \beta_{1} > \beta_{1,0}\)

    • For lower-tailed tests, \(H_{0}: \beta_{1} \ge \beta_{1,0}, H_{1}: \beta_{1} < \beta_{1,0}\)

  • t-statistic is computed in the same way

\[t = \frac{\hat{\beta}_{1} -\beta_{1,0} }{SE(\hat{\beta}_{1})}\]

  • Interpretation of t-statistic is different

    • For upper-tailed tests, reject only if t-statistic is large positive

    • For lower-tailed tests, reject only if t-statistic is large negative

    • As such, critical values are computed only in one tail

Testing One-Sided Hypotheses about \(\beta_{1}\)

  • \(p\)-value formula slightly different

    • For upper-tailed tests, \(p\)-value = \(Pr[t > t^{act}]\)

    • For lower-tailed tests, \(p\)-value = \(Pr[t < t^{act}]\)

  • One-sided hypothesis tests are rare

    • Use only when there is an obvious reason

    • In regression, generally we are interested in testing “significance” of a variable

      • Whether it has an effect at all, positive or negative
  • Note we can also test claims about \(\beta_{0}\)

    • Follows same procedure, with different standard error

    • We will not cover this in lecture

Errors in Hypothesis Testing

  • Two kinds of mistakes in hypothesis testing

    Error Table for Hypothesis Testing

                         | \(H_{0}\) True | \(H_{0}\) False
    Accept \(H_{0}\)     | Correct        | Type II Error
    Reject \(H_{0}\)     | Type I Error   | Correct
  • Type I Error: Rejecting a true null hypothesis

    • Recall: \(t\) assumes \(H_{0}\) is true

    • Reject claims that are unlikely under sampling randomness

    • Though unlikely, these values are possible when \(H_{0}\) is true

    • We could mistakenly reject \(H_{0}\) when it is true if we get an odd sample

Errors in Hypothesis Testing

  • Likelihood of Type I error equals \(\alpha\)

    • \(\alpha\) defines improbable values when \(H_{0}\) is true

    • Though unlikely, those values actually occur \(\alpha \%\) of the time when \(H_{0}\) is true

    • So, probability of Type I error is \(\alpha\)

  • Why not set \(\alpha\) really low to avoid Type I Error?

    • Makes it more difficult to reject any hypothesis

    • Increases likelihood of Type II Error

  • Type II Error: Accepting a false null hypothesis

    • Depends on true value of parameter

    • So we do not actually know likelihood of Type II Error

Errors in Hypothesis Testing

  • But, we do know that

    • As \(\alpha \downarrow\), Pr[Type II Error] \(\uparrow\)

      • A lower \(\alpha\) means accepting more null hypotheses

      • Including false ones

    • As \(\beta_{1}\) gets closer to \(\beta_{1,0}\), Pr[Type II Error] \(\uparrow\)

      • Hard to tell \(H_{0}\) from truth when they are very close

      • Many values will fall in acceptance region, even if \(H_{0}\) is false

  • Power: Probability of correctly rejecting \(H_{0}\)

    • Equals \(1 -\) Pr[Type II Error]
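Under the large-sample Normal approximation, power can be sketched analytically. The Python snippet below (illustrative, not from the course) computes the rejection probability of a two-sided 5% test when the true \(\beta_{1}\) sits a given number of standard errors from \(\beta_{1,0}\); at distance zero it reproduces the Type I error rate \(\alpha\).

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def power_two_sided(dist_in_se, crit=1.96):
    """Pr[|t| > crit] when t ~ N(dist_in_se, 1): the rejection probability."""
    return (1 - norm_cdf(crit - dist_in_se)) + norm_cdf(-crit - dist_in_se)

print(round(power_two_sided(0.0), 3))   # 0.05: Type I error rate when H0 is true
print(round(power_two_sided(3.0), 2))   # high power when the truth is far from H0
print(round(power_two_sided(0.5), 2))   # low power when the truth is close to H0
```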

P-values

  • Another way to do hypothesis testing avoids pre-defining a significance level

    • Instead find fraction of values more extreme than the estimate

    • Find all significance levels consistent with rejection/acceptance

  • P-value: The probability of drawing a test statistic at least as extreme as the one computed from the sample

    • Small p-value means few values are more extreme

      • Sample estimate is in the tails of the distribution

      • Unlikely caused by random sampling, so \(H_{0}\) may not be true

    • Large p-value means many values could be more extreme

      • Sample estimate is closer to the middle of the distribution

      • Possible from random sampling, so \(H_{0}\) may be true

P-values

  • More formally, p-value is defined as below

    • Lower-tailed test: p-value = \(Pr[t < t^{act}]\)

    • Upper-tailed test: p-value = \(Pr[t > t^{act}]\)

    • Two-tailed test: p-value = \(2 \times Pr[t > |t^{act}|]\)

  • P-values define significance levels consistent with acceptance/rejection

    • All \(\alpha >\) p-value lead to rejecting \(H_{0}\)

    • All \(\alpha <\) p-value lead to accepting \(H_{0}\)

Confidence Intervals for Regression

Confidence Interval for \(\beta_{1}\)

  • We discussed previously that point estimates lack information about sampling uncertainty

  • Confidence intervals directly incorporate that information into the estimator

  • We can construct confidence intervals for \(\beta_{1}\)

  • Remember that confidence intervals are constructed by adding and subtracting a “margin of error” from the point estimate

\[\mbox{CI } = \mbox{Point Estimate } \pm \mbox{ Margin of Error}\]

  • In the case of regression

    • The point estimate \(= \hat{\beta}_{1}\)

    • The margin of error \(= t^c \times SE(\hat{\beta}_{1})\)

Confidence Interval for \(\beta_{1}\)

  • So a confidence interval for \(\beta_{1}\) would be \[\hat{\beta}_{1} \pm t^c \times SE(\hat{\beta}_{1})\]

  • The interval depends on three things

    • \(\hat{\beta}_{1}\), which we know from our regression

    • \(SE(\hat{\beta}_{1})\), which we also know

    • A critical value \(t^c\)

      • Same critical value from a two-sided hypothesis test at significance level \(\alpha\)

      • \(t^c\) increases when \((1-\alpha)\%\) increases

      • We call \((1-\alpha)\%\) the confidence level

  • The width of the interval therefore depends on \(t^c\) and \(SE(\hat{\beta}_{1})\)

    • Larger standard error makes the interval wider

    • A higher confidence level, which increases \(t^c\), makes the interval wider
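A quick Python sketch of the interval formula, using illustrative values for the estimate and standard error (chosen to be close to the Stata example later in these notes):

```python
beta1_hat = -1.86   # illustrative OLS estimate
se_beta1 = 0.47     # illustrative standard error
t_crit = 1.96       # critical value for a 95% confidence level

lower = beta1_hat - t_crit * se_beta1
upper = beta1_hat + t_crit * se_beta1
print((round(lower, 2), round(upper, 2)))   # (-2.78, -0.94)
```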

Confidence Interval for \(\beta_{1}\)

  • Recall direct relationship between confidence intervals and hypothesis testing

    • Suppose confidence level is \((1-\alpha) \%\)

    • Any \(H_{0}\) in the interval is not rejected at \(\alpha \%\) level

    • Any \(H_{0}\) outside the interval is rejected at \(\alpha \%\) level

  • Intuition for 95% confidence interval

    • In hypothesis test, accept \(H_{0}\) if \(\hat{\beta}_{1}\) is less than 1.96 standard deviations away from claim

    • With confidence interval, we find values 1.96 standard deviations away from \(\hat{\beta}_{1}\)

    • If we set \(H_{0}\) in that range, then \(\hat{\beta}_{1}\) is less than 1.96 standard deviations away

    • If we set \(H_{0}\) outside the range, \(\hat{\beta}_{1}\) is more than 1.96 standard deviations away

Confidence Interval for \(\beta_{1}\)

  • Can compute confidence interval for any \(\alpha\)

    • If \(\alpha = 0.01\), this is a 99% confidence interval

    • If \(\alpha = 0.10\), this is a 90% confidence interval

    • \((1-\alpha) \%\) confidence interval is related to hypothesis test at \(\alpha \%\) level

  • Recall that \(\beta_{1}\) is effect of one-unit change in \(X\) on \(Y\)

  • Can also construct a confidence interval for the effect of a general change in \(X\)

  • If \(\beta_{1}\) is the effect of a one-unit change, then \(\Delta x \beta_{1}\) is the effect of a change of \(\Delta x\)

    • e.g., Effect of 2-unit change in \(X\) is \(2\beta_{1}\)

Confidence Interval for \(\beta_{1}\)

  • A confidence interval for \(\Delta x \beta_{1}\) is \[\Delta x\hat{\beta}_{1} \pm t^c \times SE(\hat{\beta}_{1}) \Delta x\]

  • or equivalently

    \[\{ (\hat{\beta}_{1} - t^c \times SE(\hat{\beta}_{1})) \Delta x, (\hat{\beta}_{1} + t^c \times SE(\hat{\beta}_{1})) \Delta x \}\]

  • Effectively, the interval is scaled by \(\Delta x\)

  • Notice that if you set \(\Delta x = 1\), you get the original formula
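A small Python sketch of the scaled interval, with an illustrative estimate and standard error and a hypothetical \(\Delta x = 2\):

```python
beta1_hat, se_beta1, t_crit = -1.86, 0.47, 1.96   # illustrative numbers
delta_x = 2.0                                     # hypothetical change in X

# each endpoint of the one-unit interval is scaled by delta_x
lower = (beta1_hat - t_crit * se_beta1) * delta_x
upper = (beta1_hat + t_crit * se_beta1) * delta_x
print((round(lower, 2), round(upper, 2)))   # twice as wide as the one-unit interval
```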

Example with Stata

  • Recall the research question: Are class size and student achievement related?

  • The underlying population regression function is

\[TestScore_{i}= \beta_{0} + \beta_{1}STR_{i} + u_{i}\]

  • \(\beta_{1}\) is ceteris paribus effect of one more student per teacher

  • We estimated \(\beta_{1}\) and \(\beta_{0}\) using OLS on simulated data

Example with Stata

  • Command for OLS estimates is “regress”

  • The output from that command is summarized below

clear
set obs 420
Number of observations (_N) was 0, now 420.
set seed 12345

gen str = rnormal(20,2)
gen u = rnormal(0,20)

gen testscr = 700 -2 * str + u

regress testscr str

      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =     15.45
       Model |  6383.10498         1  6383.10498   Prob > F        =    0.0001
    Residual |  172661.265       418  413.065226   R-squared       =    0.0357
-------------+----------------------------------   Adj R-squared   =    0.0333
       Total |  179044.369       419  427.313531   Root MSE        =    20.324

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -1.855817    .472094    -3.93   0.000    -2.783791   -.9278429
       _cons |   696.4934    9.55519    72.89   0.000     677.7112    715.2756
------------------------------------------------------------------------------

Example with Stata

  • The important results for hypothesis testing are in the bottom panel of the table

  • The “std. err.” column reports the standard errors for the estimates

    • \(SE(\hat{\beta}_{1}) = .472\)
  • The “t” column shows the t-statistic for \(H_{0}: \beta_{1,0} =0\)

    • \(t^{act} = -3.93\)

    • Observed \(\hat{\beta}_{1}\) is 3.93 standard deviations below zero

  • The “P \(> |t|\)” column shows the 2-sided p-value for \(H_{0}: \beta_{1,0} =0\)

    • p-value is 0.000

    • Essentially 0% of values are more extreme than the one we observe (the p-value rounds to 0.000)

    • We reject \(H_{0}\) at any conventional significance level

Example with Stata

  • The Stata output also reports a 95% confidence interval

    • Interval is \(\{ -2.78, -0.93\}\)

    • Any \(H_{0}\) between these numbers is accepted at 5% level

    • All others are rejected

  • Suppose you want to test other hypotheses

    • e.g., what if we want to test \(H_{0}: \beta_{1,0} = -2\)?
  • For this, we need to use additional commands

  • The easiest way to do two-sided tests is the “test” command

    • NOT the “ttest” command, which tests hypotheses about population means

Example with Stata

  • Now test whether the coefficient on \(str\) equals \(-2\)

  • Notice it reports F(1,418)

    • This is the F-statistic that we will learn later

    • Conclusion based on this test is the same as a t-test

    • In fact, \(F = t^2\)

    • So, can do test with p-value

test str = -2
 ( 1)  str = -2

       F(  1,   418) =    0.09
            Prob > F =    0.7602
  • You need to use the test command immediately after the regression
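The relation \(F = t^2\) can be verified from the output above: form the t-statistic for the claim \(\beta_{1,0} = -2\) from the reported coefficient and standard error, then square it. A quick Python check:

```python
# numbers taken from the Stata regression output above
b, se, claim = -1.855817, 0.472094, -2.0

t = (b - claim) / se
print(round(t ** 2, 2))   # 0.09, matching the reported F(1, 418)
```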

Regression when \(X\) is a Binary Variable

Introduction

  • Up to now we have examined only quantitative variables

    • Test scores

    • Income

    • Schooling

  • In many applications, we are interested in qualitative factors

    • Gender

    • Race

    • Location

  • In this section we discuss how to incorporate qualitative information into a regression

Categorical Variables

  • Qualitative factors are typically categorical in nature

    • Gender: {male, female}

    • Marital Status: {married, single}

    • City: {Toronto, Montreal, Waterloo, ...}

  • These variables separate data into distinct groups

  • In many cases, the values the variable can take are not numeric

  • For variables with 2 categories, we often code them numerically as dummy variables

    • Dummy Variable: A binary, 0-1 variable that describes the values of a qualitative variable with two categories

Dummy Variables

  • Examples

    • Code gender into a dummy variable called female

      • Gender: {male, female} \(\rightarrow\) female = {0,1}

      • The variable female = 0 if male, and 1 if female

    • Code marital status into a dummy variable called married

      • Marital Status: {married, single} \(\rightarrow\) married = {0,1}

      • The variable married = 0 if single, and 1 if married

    • In both cases, we could reverse the coding

      • Could instead define male = 0 if female, and 1 if male

      • and single = 0 if married, and 1 if single

    • Key is the variable name usually indicates event with value 1

Dummy Variables

  • Point of using 0,1 is that it leads to useful interpretations in data analysis and regression models

    • Sample mean of 0,1 variable is fraction of values that equal 1

      • If female = {0,1}, \(\frac{1}{N}\sum_{i=1}^{N} female_{i}\) = fraction female

      • If married = {0,1}, \(\frac{1}{N}\sum_{i=1}^{N} married_{i}\) = fraction married

    • In regression models, parameters on dummy variables also have useful interpretations

      • We will learn these details later in this section
  • We will focus only on variables with two categories

    • Later in the course we may discuss variables with more than two categories
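The mean-equals-fraction fact is easy to verify. A tiny Python sketch with a made-up dummy coding:

```python
# hypothetical 0/1 codings of a gender variable, for illustration only
female = [1, 0, 1, 1, 0, 1, 0, 1]

# the sample mean of a 0/1 variable is the fraction of ones
frac = sum(female) / len(female)
print(frac)   # 5 of the 8 values are 1, so the mean is 0.625
```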

Interpretation of Regression Coefficients

  • The mechanics of OLS are the same with binary \(X\) variables

    • Still minimize the sum of squared residuals
  • It is only the interpretation of \(\beta_{1}\) and \(\beta_{0}\) that changes

  • Imagine we want to measure the effect of class size on test scores

  • We only have access to a binary variable on class size

    \[D_{i} = 1\{str_{i} \ge 20 \}\]

    • \(D_{i}\) equals 0 if student teacher ratio is \(<20\)

    • \(D_{i}\) equals 1 if student teacher ratio is \(\ge 20\)

Interpretation of Regression Coefficients

  • The regression model in this context is \[Y_{i} = \beta_{0} + \beta_{1}D_{i} + u_{i}\]

  • The population regression function is \[E[Y_{i}|D_{i}] = \beta_{0} + \beta_{1}D_{i}\]

  • How do we interpret \(\beta_{1}\) in this regression?

  • To see this, compute the conditional expectation for each value of \(D_{i}\) \[E[Y_{i}|D_{i}=1] = \beta_{0} + \beta_{1}\] \[E[Y_{i}|D_{i}=0] = \beta_{0}\]

  • Then take the difference between the two \[E[Y_{i}|D_{i}=1] - E[Y_{i}|D_{i}=0] = \beta_{1}\]
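This algebra can be confirmed numerically. The Python sketch below (simulated data, illustrative names) regresses an outcome on a 0/1 dummy and checks that the slope equals the difference in group means and the intercept equals the mean of the \(D_{i} = 0\) group:

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.integers(0, 2, 200)                   # hypothetical 0/1 dummy (e.g., large class)
y = 660 - 4 * d + rng.normal(0, 20, 200)      # hypothetical outcome (e.g., test score)

# OLS of y on a constant and the dummy
X = np.column_stack([np.ones(len(d)), d])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(np.isclose(b1, diff_in_means))          # slope = difference in means
print(np.isclose(b0, y[d == 0].mean()))       # intercept = mean of the D = 0 group
```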

Interpretation of Regression Coefficients

  • From this, \(\beta_{1}\) is the difference in the average value of \(Y_{i}\) between the two groups

    • Average value of \(Y_{i}\) for large classes minus average value of \(Y_{i}\) for small classes
  • The same is true for any dummy variable

    • \(\beta_{1}\) is average value of \(Y_{i}\) when dummy variable equals 1 minus average value of \(Y_{i}\) when dummy variable equals zero
  • Thus, the interpretation is not a “slope”

    • Instead it is a difference in means
  • Also notice interpretation of \(\beta_{0}\)

    • Average value of \(Y_{i}\) when dummy variable equals 0

Interpretation of Regression Coefficients

  • The above interpretation is for the regression parameter \(\beta_{1}\)

  • The OLS estimator \(\hat{\beta}_{1}\) has an analogous interpretation

    • Sample average of \(Y_{i}\) when dummy variable equals 1, minus sample average of \(Y_{i}\) when dummy variable equals zero
  • \(\hat{\beta}_{0}\) is the sample mean when the dummy variable equals 0

  • You can test hypotheses about \(\beta_{1}\) using regular t tests

    • In this context, you are testing difference in sample means between groups
  • Confidence intervals are also constructed in the same way

Example with Stata

  • In example below, we create a dummy variable for class size

  • Then compare regression estimate to difference in means

gen d = 1 if str >=20
(206 missing values generated)
replace d = 0 if str <20
(206 real changes made)

regress testscr d

      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =      3.56
       Model |  1510.61982         1  1510.61982   Prob > F        =    0.0600
    Residual |   177533.75       418  424.721889   R-squared       =    0.0084
-------------+----------------------------------   Adj R-squared   =    0.0061
       Total |  179044.369       419  427.313531   Root MSE        =    20.609

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
           d |  -3.793689   2.011576    -1.89   0.060    -7.747755    .1603765
       _cons |   661.0675   1.435882   460.39   0.000      658.245    663.8899
------------------------------------------------------------------------------

Example with Stata

  • Compare regression coefficients to differences in means
sum testscr if d == 1
sum testscr if d == 0
    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     testscr |        214    657.2738    20.45359   597.7711   709.1503

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
     testscr |        206    661.0675     20.7688    593.118   713.0748

Heteroskedasticity vs Homoskedasticity

Definitions

  • For OLS estimators, we made assumptions about the average value of \(u_{i}\) conditional on \(X_{i}\)

    • More specifically, we assumed \(E[u_{i}|X_{i}] = 0\)
  • For hypothesis testing, we also need to make assumptions about the variance of \(u_{i}\) at each \(X_{i}\)

  • Homoskedasticity: the variance of \(u_{i}\) conditional on \(X_{i}\) is constant

    • Mathematically, \(VAR[u_{i}|X_{i}] = \sigma_{u}^2\)
  • Heteroskedasticity: the variance of \(u_{i}\) conditional on \(X_{i}\) varies across observations

    • Mathematically, \(VAR[u_{i}|X_{i}] = \sigma_{ui}^2\)

    • Difference is that \(\sigma_{ui}^2\) varies across individuals
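The two definitions can be seen numerically. In the Python sketch below (an illustrative design, not course material), one error has the same spread at every \(x\) while the other's spread grows with \(x\); comparing spreads at low and high \(x\) shows the difference:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(20, 2, 20000)
u_homo = rng.normal(0, 20, 20000)       # VAR[u|x] = 400 at every x
u_het = rng.normal(0, 1, 20000) * x     # SD[u|x] = x, so the variance varies with x

low, high = x < 18, x > 22
print(u_homo[low].std(), u_homo[high].std())   # similar spreads at low and high x
print(u_het[low].std(), u_het[high].std())     # high-x spread is clearly larger
```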

Graphical Representation

Intuition

  • Recall that errors \(u_{i}\) are difference between average \(E[Y_{i}|X_{i}]\) and actual \(Y_{i}\)

  • Therefore

    • Homoskedasticity means \(Y_{i}\) values have same spread around mean at each \(X_{i}\)

    • Heteroskedasticity means \(Y_{i}\) values may be spread differently around mean at each \(X_{i}\)

  • How the errors are spread out has several important implications that we discuss below

Implications

  • Why does the spread of the errors at each \(X_{i}\) matter?

  • There are two key implications

  1. Under homoskedasticity, OLS estimators are Best Linear Unbiased Estimators (BLUE)

    • Compared to all linear unbiased estimators, they have the lowest variance

    • Under heteroskedasticity, this is not true

  2. Variance of OLS estimators is different

    • Under heteroskedasticity, estimated variance of \(\hat{\beta}_{1}\) is the formula derived earlier

\[\hat{\sigma}^2_{\beta_{1}}=\frac{1}{n}\frac{\frac{1}{n-2} \sum_{i=1}^{n} (X_{i} - \bar{X})^2\hat{u}_{i}^2}{\left[ \frac{1}{n} \sum_{i=1}^{n} (X_{i} - \bar{X})^2 \right ]^2}\]

Implications

  2. Variance of OLS estimators continued...

    • Under homoskedasticity, estimated variance of \(\hat{\beta}_{1}\) simplifies to \[\tilde{\sigma}^2_{\beta_{1}}=\frac{\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_{i}^2}{ \sum_{i=1}^{n} (X_{i} - \bar{X})^2}\]

    • If you think errors are heteroskedastic you must use the first formula

      • Using the homoskedastic formula will lead to incorrect hypothesis testing

      • You will typically underestimate the actual standard error

      • The t-statistic is then too large in magnitude too often, so its true distribution has fatter tails than the Standard Normal implies

      • The critical values will effectively be too low

      • Leads to over-rejecting \(H_{0}\)

      • Confidence intervals based on the incorrect standard error are also wrong
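The gap between the two formulas can be seen on simulated heteroskedastic data. This Python sketch (mirroring the data-generating process of the later Stata example, with illustrative names) computes both standard errors for the same regression; in this design the homoskedasticity-only formula comes out smaller:

```python
import numpy as np

rng = np.random.default_rng(12345)
n = 420
x = rng.normal(20, 2, n)
u = rng.normal(0, 1, n) * 0.0005 * x ** 4     # error spread grows with x
y = 700 - 2 * x + u

# OLS fit and residuals
xbar = x.mean()
b1 = np.sum((x - xbar) * (y - y.mean())) / np.sum((x - xbar) ** 2)
b0 = y.mean() - b1 * xbar
uhat = y - b0 - b1 * x

# heteroskedasticity-robust formula (the general one)
se_robust = np.sqrt(
    (np.sum((x - xbar) ** 2 * uhat ** 2) / (n - 2))
    / (n * np.mean((x - xbar) ** 2) ** 2)
)
# homoskedasticity-only formula (the simplified one)
se_homo = np.sqrt((np.sum(uhat ** 2) / (n - 2)) / np.sum((x - xbar) ** 2))

print(se_homo, se_robust)   # homoskedastic SE is smaller for this design
```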

Implications

  • Note: OLS estimators are still unbiased with a Normal distribution

    • The distribution of the errors has no impact on expected value of \(\hat{\beta}_{1}\)

      • \(\hat{\beta}_{1}\) is unbiased under homoskedasticity or heteroskedasticity

      • As long as other three assumptions hold true

    • Distribution of \(\hat{\beta}_{1}\) also remains Normal in large samples

    • For this reason, heteroskedasticity is only a problem for inference

What to do in Practice?

  • How do we know if errors are heteroskedastic or not?

  • Answer is most of the time we do not

  • For this reason, it is best to use heteroskedasticity-robust standard errors as the default

    • Only use the homoskedastic ones in special cases
  • Note: by default, Stata produces homoskedastic standard errors for regressions

  • To get the ones consistent with heteroskedasticity, you must use the robust option in regress

Example with Stata

  • We will generate new data imposing heteroskedasticity on the error

    • Code below makes the spread of \(u\) increase with \(str^4\)

    • Higher values of \(str\) will have more spread in \(testscr\)

    • This is one type of heteroskedasticity

clear
set obs 420
set seed 12345
gen str = rnormal(20,2)
gen u = rnormal(0,.0005*str^4)

gen testscr = 700 -2 * str + u

Example with Stata

twoway scatter testscr str, title("Data with Heteroskedastic Errors") scheme(s1mono)

Example with Stata

  • Code and results below show OLS with assumption of homoskedastic errors
regress testscr str
      Source |       SS           df       MS      Number of obs   =       420
-------------+----------------------------------   F(1, 418)       =      4.32
       Model |  41914.6004         1  41914.6004   Prob > F        =    0.0383
    Residual |  4056457.49       418  9704.44375   R-squared       =    0.0102
-------------+----------------------------------   Adj R-squared   =    0.0079
       Total |  4098372.09       419  9781.31764   Root MSE        =    98.511

------------------------------------------------------------------------------
     testscr | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -4.755562   2.288255    -2.08   0.038    -9.253484   -.2576402
       _cons |   752.5441   46.31433    16.25   0.000     661.5061    843.5821
------------------------------------------------------------------------------

Example with Stata

  • Code and results below show OLS with assumption of heteroskedastic errors

  • The standard error has become larger

regress testscr str, robust
Linear regression                               Number of obs     =        420
                                                F(1, 418)         =       1.87
                                                Prob > F          =     0.1725
                                                R-squared         =     0.0102
                                                Root MSE          =     98.511

------------------------------------------------------------------------------
             |               Robust
     testscr | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         str |  -4.755562   3.480204    -1.37   0.173    -11.59644     2.08532
       _cons |   752.5441   67.19808    11.20   0.000     620.4559    884.6324
------------------------------------------------------------------------------

Example with Stata

  • Below we simulate a t-test with heteroskedastic and homoskedastic errors

  • Show what happens to t-distribution when null hypothesis is true

    • We use \(\beta_{1} = 0\) as the null hypothesis

Example with Stata

clear
set more off
set obs 999
set seed 12345

gen t_he = .
gen t_ho = .

foreach iter of numlist 1/999 {

preserve
clear
qui set obs 420

gen str = rnormal(20,2)
gen u = rnormal(0,.0005*str^4)

gen testscr = 700  + u

qui regress testscr str 
    local t_ho = _b[str]/_se[str] 
    
qui regress testscr str, robust 
    local t_he = _b[str]/_se[str]   

restore

replace t_ho = `t_ho' in `iter'
replace t_he = `t_he' in `iter'

}

Example with Stata

twoway (kdensity t_ho) (kdensity t_he) (function tden(420-2,x), range(-5 5)), xtitle(t) title(t-distribution when Null Hypothesis is True) legend(order(1 "Homoskedastic Errors" 2 "Heteroskedastic Errors" 3 "Actual t-dist")) scheme(s1color)

Example with Stata

  • Below is the fraction of t values we reject using the critical values from t-distribution

  • We chose critical value so that \(\alpha = 0.05\)

    • So we should reject 5% of the time
  • With homoskedastic standard errors we reject 10% of the time

    • We “over-reject” when the null hypothesis is true
  • With robust standard errors, it is closer to 5%

  • The robust errors generate better hypothesis testing

    • You should use them all the time
gen reject_ho = abs(t_ho) >=invttail(420-2,0.025)
gen reject_he = abs(t_he) >=invttail(420-2,0.025)

tab1 reject*
-> tabulation of reject_ho  

  reject_ho |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        899       89.99       89.99
          1 |        100       10.01      100.00
------------+-----------------------------------
      Total |        999      100.00

-> tabulation of reject_he  

  reject_he |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        951       95.20       95.20
          1 |         48        4.80      100.00
------------+-----------------------------------
      Total |        999      100.00

The Gauss-Markov Theorem

  • OLS estimators are the most widely used estimators of regression parameters

  • The reason is based on the Gauss-Markov theorem

  • Gauss-Markov Theorem: Under the Gauss-Markov conditions, OLS is the Best Linear Unbiased Estimator for \(\beta_{1}\)

    • The Gauss-Markov conditions are the assumptions we discussed before

      • Zero conditional mean of the errors

      • \(X_{i}, Y_{i}\) are iid

      • Large outliers are unlikely

      • Homoskedastic errors

    • If all these hold, OLS estimators have smallest variance among all linear unbiased estimators

    • This means they have the lowest amount of sampling variation