Matrix Review

EC655

Justin Smith

Wilfrid Laurier University

Fall 2022

Introduction

Introduction

  • Undergraduate econometrics is normally taught using scalar notation

    • More accessible for students without advanced math background
  • At the graduate level, it is often taught using matrix algebra

  • Some advantages to matrix notation

    • More compact

    • Easier to express some estimators

  • In this section, we review matrix algebra essentials for econometrics

    • Not a comprehensive review
  • We will switch between scalar and matrix notation in the course

    • Depending on which is clearer in each context

Matrices and Vectors

Matrix

  • A matrix is a rectangular array of numbers organized in rows and columns

  • For example, matrix \(\mathbf{A}\) with 2 rows and 3 columns could be

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 &5 & 6 \end{bmatrix}\]

  • More generally, matrix \(\mathbf{A}\) with m rows and n columns is

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix}\]

Vectors

  • A vector is a matrix with one column or one row

  • A row vector \(\mathbf{a}\) with n elements is

\[\mathbf{a}= \begin{bmatrix} a_{1}& a_{2} &\cdots & a_{n} \end{bmatrix}\]

  • A column vector \(\mathbf{a}\) with m elements is

\[\mathbf{a}= \begin{bmatrix} a_{1}\\ a_{2}\\ \vdots \\ a_{m} \end{bmatrix}\]
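
  • If you want to experiment with these objects, a minimal numpy sketch:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # 2 x 3 matrix
row_vec = np.array([[1, 2, 3]])  # 1 x 3 row vector
col_vec = np.array([[1], [2]])   # 2 x 1 column vector

print(A.shape, row_vec.shape, col_vec.shape)  # (2, 3) (1, 3) (2, 1)
```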

Special Matrices

  • A Square Matrix has the same number of rows and columns

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1m} \\ a_{21}& a_{22} &\cdots & a_{2m} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mm} \end{bmatrix}\]

  • A Diagonal Matrix is a square matrix with zeroes for all off-diagonal elements

\[\mathbf{A}= \begin{bmatrix} a_{11}& 0&\cdots & 0 \\ 0& a_{22} &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & a_{mm} \end{bmatrix}\]

Special Matrices

  • The Identity Matrix is a square matrix with ones on the diagonal and zeroes on the off-diagonals

\[\mathbf{I}= \begin{bmatrix} 1& 0&\cdots & 0 \\ 0& 1 &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & 1 \end{bmatrix}\]

  • The Zero Matrix is a matrix with zeroes for all elements

\[\mathbf{0}= \begin{bmatrix} 0& 0&\cdots & 0 \\ 0& 0 &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & 0 \end{bmatrix}\]
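
  • These special matrices are one-liners in numpy (a minimal sketch):

```python
import numpy as np

I3 = np.eye(3)                 # 3 x 3 identity matrix
Z  = np.zeros((3, 3))          # 3 x 3 zero matrix
D  = np.diag([1.0, 2.0, 3.0])  # diagonal matrix with 1, 2, 3 on the diagonal

print(I3, Z, D, sep="\n")
```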

Matrix Addition

  • You can add and subtract matrices with the same dimensions

    • Matrices with different dimensions are not conformable for addition or subtraction
  • The sum of matrices \(\mathbf{A}\) and \(\mathbf{B}\) with dimension \(m \times n\) is

\[\mathbf{A} + \mathbf{B}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1n} \\ b_{21}& b_{22} &\cdots & b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ b_{m1}& b_{m2} &\cdots & b_{mn} \end{bmatrix}\]

\[= \begin{bmatrix} a_{11} + b_{11}& a_{12} + b_{12} &\cdots & a_{1n}+ b_{1n} \\ a_{21} + b_{21}& a_{22} + b_{22} &\cdots & a_{2n}+ b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} + b_{m1}& a_{m2} +b_{m2} &\cdots & a_{mn}+ b_{mn} \\ \end{bmatrix}\]
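
  • For example, adding two \(2 \times 2\) matrices element by element gives

\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}\]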

Matrix Subtraction

  • Similarly, the difference between matrices \(\mathbf{A}\) and \(\mathbf{B}\) is

\[\mathbf{A} - \mathbf{B}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} - \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1n} \\ b_{21}& b_{22} &\cdots & b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ b_{m1}& b_{m2} &\cdots & b_{mn} \end{bmatrix}\]

\[= \begin{bmatrix} a_{11} - b_{11}& a_{12} - b_{12} &\cdots & a_{1n}- b_{1n} \\ a_{21} - b_{21}& a_{22} - b_{22} &\cdots & a_{2n}- b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} - b_{m1}& a_{m2} -b_{m2} &\cdots & a_{mn}- b_{mn} \\ \end{bmatrix}\]

Rules for Addition and Subtraction

  • The following rules apply to matrix addition and subtraction

    • Commutativity \[\mathbf{A + B = B + A}\]

    • Associativity \[\mathbf{A + (B + C) = (A+B) + C}\]

  • Effectively, both rules mean the order and grouping of terms do not matter

    • Similar to scalar math
  • Subtraction follows the same rules, since \(\mathbf{A - B}\) is just \(\mathbf{A} + (-1)\mathbf{B}\)

Matrix Multiplication

  • To multiply matrix \(\mathbf{A}\) and \(\mathbf{B}\), the number of columns in \(\mathbf{A}\) must equal the number of rows in \(\mathbf{B}\)

  • Suppose matrix \(\mathbf{A}\) is \(m \times n\) and matrix \(\mathbf{B}\) is \(n \times p\)

  • Define the product as \(\mathbf{C} = \mathbf{AB}\)

    • The \(ij\) element of \(\mathbf{C}\) is the sum of the products of the corresponding elements along the \(i\)th row of \(\mathbf{A}\) and \(j\)th column of \(\mathbf{B}\)
      \[c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}\]

    • The product matrix \(\mathbf{C}\) will have dimension \(m \times p\)

      • The number of rows of \(\textbf{A}\) and number of columns of \(\textbf{B}\)

Matrix Multiplication

  • The product \(\mathbf{AB}\) is

\[\mathbf{AB}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} \times \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1p} \\ b_{21}& b_{22} &\cdots & b_{2p} \\ \vdots & \vdots &\ddots & \vdots \\ b_{n1}& b_{n2} &\cdots & b_{np} \end{bmatrix}\]

\[= \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} + \cdots + a_{1n} b_{n1} &a_{11} b_{12} + a_{12} b_{22} + \cdots + a_{1n} b_{n2} &\cdots&a_{11} b_{1p} + a_{12} b_{2p} + \cdots + a_{1n} b_{np}\\ a_{21} b_{11} + a_{22} b_{21} + \cdots + a_{2n} b_{n1} &a_{21} b_{12} + a_{22} b_{22} + \cdots + a_{2n} b_{n2} &\cdots&a_{21} b_{1p} + a_{22} b_{2p} + \cdots + a_{2n} b_{np}\\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} b_{11} + a_{m2} b_{21} + \cdots + a_{mn} b_{n1} &a_{m1} b_{12} + a_{m2} b_{22} + \cdots + a_{mn} b_{n2} &\cdots&a_{m1} b_{1p} + a_{m2} b_{2p} + \cdots + a_{mn} b_{np}\\ \end{bmatrix}\]

Matrix Multiplication

  • As an illustration suppose we have the following matrices \[\mathbf{A}= \begin{bmatrix} 1& 2\\ 3& 4 \\ \end{bmatrix} \mathbf{B}= \begin{bmatrix} 5&6&7 \\ 8&9 &10 \end{bmatrix}\]

  • We can multiply \(\mathbf{AB}\) because \(\mathbf{A}\) has 2 columns, and \(\mathbf{B}\) has 2 rows

  • The product \(\mathbf{C}\) = \(\mathbf{AB}\) is

\[\mathbf{C}= \begin{bmatrix} 1& 2\\ 3& 4 \\ \end{bmatrix} \times \begin{bmatrix} 5&6&7 \\ 8&9 &10 \end{bmatrix} = \begin{bmatrix} 1 \times 5 + 2\times 8&1 \times 6 + 2 \times 9 & 1 \times 7 + 2 \times 10 \\ 3 \times 5 + 4\times 8&3 \times 6 + 4 \times 9 & 3 \times 7 + 4 \times 10 \end{bmatrix}\]

\[= \begin{bmatrix} 21& 24& 27 \\ 47&54& 61 \end{bmatrix}\]
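
  • The same product in numpy, as a quick check (the @ operator is matrix multiplication):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [8, 9, 10]])

C = A @ B   # conformable because A has 2 columns and B has 2 rows
print(C)
# [[21 24 27]
#  [47 54 61]]
```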

Scalar Multiplication

  • A scalar is a single real number

  • You can also multiply a scalar by a matrix

  • If \(\gamma\) is a scalar, and \(\mathbf{A}\) is a matrix, then

\[\mathbf{\gamma A}= \gamma \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \gamma a_{11}&\gamma a_{12} &\cdots & \gamma a_{1n} \\ \gamma a_{21}& \gamma a_{22} &\cdots & \gamma a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ \gamma a_{m1}& \gamma a_{m2} &\cdots & \gamma a_{mn} \end{bmatrix}\]

  • You multiply the scalar by each element of the matrix

Transpose

  • The transpose of a matrix is one where the rows and columns are switched

  • Suppose matrix \(\mathbf{A}\) is

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix}\]

  • Then its transpose \(\mathbf{A'}\) is

\[\mathbf{A'}= \begin{bmatrix} a_{11}& a_{21} &\cdots & a_{m1} \\ a_{12}& a_{22} &\cdots & a_{m2} \\ \vdots & \vdots &\ddots & \vdots \\ a_{1n}& a_{2n} &\cdots & a_{mn} \end{bmatrix}\]

Transpose

  • The transpose has the following properties

\[\mathbf{(A')' = A }\] \[\mathbf{(\alpha A)' = \alpha A' }\] \[\mathbf{(A + B)' = A' + B' }\] \[\mathbf{(AB)' = B'A' }\]

  • There are additional rules for different types of matrices that we will cover below
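
  • A minimal numpy sketch checking the last two rules on small example matrices:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6, 7], [8, 9, 10]])
D = np.array([[0, 1], [2, 3]])

print(np.array_equal((A + D).T, A.T + D.T))   # True: (A + B)' = A' + B'
print(np.array_equal((A @ B).T, B.T @ A.T))   # True: (AB)' = B'A'
```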

Partitioned Matrix Multiplication

  • You may sometimes want to break matrices into vectors before you multiply

  • Multiplication works the same way, but notation can be cleaner and more intuitive

  • Suppose we have the following matrices \[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} \mathbf{B}= \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1p} \\ b_{21}& b_{22} &\cdots & b_{2p} \\ \vdots & \vdots &\ddots & \vdots \\ b_{n1}& b_{n2} &\cdots & b_{np} \end{bmatrix}\]

  • We are interested in the product \(\mathbf{AB}\)

Partitioned Matrix Multiplication

  • Break these matrices into vectors conformable for multiplication

\[\mathbf{A}= \begin{bmatrix} \mathbf{a_{1}}&\mathbf{a_{2}} & \cdots & \mathbf{a_{n}} \end{bmatrix} \mathbf{B}= \begin{bmatrix} \mathbf{b_{1}}\\ \mathbf{b_{2} }\\ \vdots \\ \mathbf{b_{n}} \end{bmatrix}\]

  • Where

\[\mathbf{a_{1}}= \begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{bmatrix} \mathbf{b_{1}}= \begin{bmatrix} b_{11}&b_{12} & \cdots & b_{1p} \end{bmatrix}\]

Partitioned Matrix Multiplication

  • Multiply the vectors to get

\[\mathbf{AB} = \sum_{i=1}^{n} \mathbf{a_{i}b_{i}}\]

  • This breaks the product \(\mathbf{AB}\) into the sum of \(n\) sub-matrices

    • Each sub-matrix is the product of the corresponding column of \(\mathbf{A}\) and row of \(\mathbf{B}\)

    • Also each sub-matrix will have dimension \(m \times p\)

  • This will be useful for some econometric estimators we derive

    • Makes notation simpler and more intuitive
  • Again, note that you get the same answer as doing straight matrix multiplication
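
  • A numpy check that the partitioned (outer-product) form gives the same answer as straight multiplication, using the matrices from the earlier example:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])          # m x n with n = 2
B = np.array([[5, 6, 7], [8, 9, 10]])   # n x p

# Sum of n outer products: (column i of A) times (row i of B), each m x p
partitioned = sum(np.outer(A[:, i], B[i, :]) for i in range(A.shape[1]))

print(np.array_equal(partitioned, A @ B))  # True
```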

Rules for Matrix Multiplication

  • There are several useful properties for matrix (and scalar) multiplication

\[(\alpha + \beta)\mathbf{A} = \alpha \mathbf{A} + \beta\mathbf{A}\] \[\alpha (\mathbf{A} +\mathbf{B}) =\alpha \mathbf{A} +\alpha\mathbf{B}\] \[(\alpha\beta) \mathbf{A} =\alpha(\beta \mathbf{A})\] \[\alpha (\mathbf{A}\mathbf{B}) =(\alpha \mathbf{A}) \mathbf{B}\] \[(\mathbf{A}\mathbf{B} )\mathbf{C} =\mathbf{A}(\mathbf{B} \mathbf{C})\] \[\mathbf{A}(\mathbf{B} +\mathbf{C}) =\mathbf{A}\mathbf{B} +\mathbf{A} \mathbf{C}\] \[(\mathbf{A}+\mathbf{B} )\mathbf{C} =\mathbf{A}\mathbf{C} +\mathbf{B} \mathbf{C}\] \[\mathbf{A}\mathbf{I} =\mathbf{I}\mathbf{A} = \mathbf{A}\] \[\mathbf{A}\mathbf{0} =\mathbf{0}\mathbf{A} = \mathbf{0}\] \[\mathbf{A}\mathbf{B} \neq\mathbf{B}\mathbf{A} \text{ in general}\]

Trace

  • The trace of a square matrix is the sum of the diagonal elements

  • If square matrix \(\mathbf{A}\) is

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{n1}& a_{n2} &\cdots & a_{nn} \end{bmatrix}\]

  • Then its trace is

\[tr(\mathbf{A})= \sum_{i=1}^{n} a_{ii}\]

Trace

  • Important properties of the trace are

\[tr(\mathbf{I_{n}})= n\] \[tr(\mathbf{A}')=tr(\mathbf{A})\] \[tr(\mathbf{A +B})=tr(\mathbf{A}) + tr(\mathbf{B})\] \[tr(\alpha \mathbf{A})=\alpha tr(\mathbf{A})\] \[tr(\mathbf{AB})=tr(\mathbf{BA})\]
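
  • A quick numpy check of two of these properties on made-up random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 3))

print(np.trace(np.eye(5)))                           # 5.0: tr(I_n) = n
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True: tr(AB) = tr(BA)
```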

Matrix Determinant

  • The determinant is a scalar value associated with a square matrix

    • Helpful concept for several things in matrix algebra

    • For econometrics, most useful for solving systems of equations and finding inverse of a matrix

  • For \(2 \times 2\) matrix \(\mathbf{A}\) \[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} \\ a_{21}& a_{22} \\ \end{bmatrix}\]

  • The determinant is

\[|\mathbf{A}|=a_{11}a_{22} - a_{12}a_{21}\]

Matrix Determinant

  • For \(3 \times 3\) matrix \(\mathbf{A}\)

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31}& a_{32} & a_{33} \\ \end{bmatrix}\]

  • The determinant is

\[|\mathbf{A}|=a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} +a_{13}a_{21}a_{32}\] \[-(a_{12}a_{21}a_{33} + a_{11}a_{23}a_{32} +a_{13}a_{22}a_{31})\]

\[=a_{11}(a_{22}a_{33} - a_{23}a_{32}) + a_{12}(a_{23}a_{31} -a_{21}a_{33} ) +a_{13}(a_{21}a_{32} - a_{22}a_{31} )\]

Matrix Determinant

  • For \(n \times n\) matrix \(\mathbf{A}\) the determinant is

\[|\mathbf{A}|=a_{i1}c_{i1} + a_{i2}c_{i2} + \cdots + a_{in}c_{in} \text{ for any choice of row } i\]

  • Where

    • \(a_{ij}\) is the \(ij\) element of matrix \(\mathbf{A}\)

    • \(c_{ij}\) is the \(ij\) cofactor of matrix \(\mathbf{A}\) defined as \[c_{ij} = (-1)^{i+j}|\mathbf{A}_{ij}|\]

    • \(|\mathbf{A}_{ij}|\) is the \(ij\) minor of matrix \(\mathbf{A}\)

      • Determinant of the sub-matrix formed by deleting the \(i\)th row and \(j\)th column of \(\mathbf{A}\)
  • The process is long and tedious for large matrices

Matrix Determinant

  • Example of \(3 \times 3\) matrix

\[\mathbf{A}= \begin{bmatrix} 1& 2 & 3 \\ 4& 5&6 \\ 7& 8 &9 \end{bmatrix}\]

  • Choose any row to find cofactors and compute determinant

    • Does not matter which
  • Let us expand along row 1

\[|\mathbf{A}|=1(-1)^{1+1} \begin{vmatrix} 5&6 \\ 8 &9 \end{vmatrix} +2(-1)^{1+2} \begin{vmatrix} 4&6 \\ 7 &9 \end{vmatrix} +3(-1)^{1+3} \begin{vmatrix} 4&5 \\ 7 &8 \end{vmatrix}\]

\[|\mathbf{A}|= -3 +12 -9 = 0\]
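
  • The same conclusion from numpy (up to floating-point error):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.linalg.det(A))   # ~0: the determinant is zero, so A is singular
```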

Matrix Inverse

  • The inverse of a square matrix \(\mathbf{A}\) is defined such that

\[\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\]

  • It is roughly the equivalent of taking the reciprocal in scalar math

    • But it is not generally the reciprocal of the elements of a matrix
  • The formula for the inverse is

\[\mathbf{A}^{-1}= \frac{1}{|\mathbf{A}|} \begin{bmatrix} c_{11}& c_{21} &\cdots & c_{n1} \\ c_{12}& c_{22} &\cdots & c_{n2} \\ \vdots & \vdots &\ddots & \vdots \\ c_{1n}& c_{2n} &\cdots & c_{nn} \end{bmatrix}\]

  • where \(c_{ij}\) are the cofactors defined above, and the matrix of cofactors is transposed (the \(ij\) element of the inverse uses \(c_{ji}\))
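
  • For a \(2 \times 2\) matrix, this reduces to the familiar formula

\[\mathbf{A}^{-1}= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22}& -a_{12} \\ -a_{21}& a_{11} \end{bmatrix}\]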

Matrix Inverse

  • The inverse exists only when \(|\mathbf{A}| \neq 0\)

    • This is why it is important to know the determinant

    • In the example above, the inverse does not exist

      • We will see later that it is because the columns are linearly dependent
  • A matrix that cannot be inverted is singular

  • A matrix that has an inverse is nonsingular

  • Inverse matrices have the following properties

\[\mathbf{(\alpha A)^{-1} = \frac{1}{\alpha} A^{-1} }\] \[\mathbf{(A')^{-1}} = \mathbf{(A^{-1})' }\] \[\mathbf{(A^{-1})^{-1}} = \mathbf{A}\] \[\mathbf{(AB)^{-1}= B^{-1}A^{-1} }\]

Linear Independence and Rank of a Matrix

Summary

  • Now that we can manipulate matrices, we can move to more advanced topics

  • Matrix algebra is useful for expressing and solving systems of equations

    • This is how we will use it in econometrics
  • We will see that you can solve for the OLS estimator only when the regressors are linearly independent

    • That is, none of the regressors is a linear function of the others
  • To check linear independence, we use the concept of rank

  • The rank of a matrix is the maximum number of independent rows or columns

    • For non-square matrices, the maximum rank is the lesser of the number of rows and columns

Linear Independence

  • A set of vectors is linearly independent if you cannot express any of them as a linear function of the others

  • Mathematically, suppose that \(\mathbf{A}=\begin{bmatrix} \mathbf{a}_{1}& \mathbf{a}_{2} &\cdots & \mathbf{a}_{m} \end{bmatrix}\)

    • where \(\mathbf{a}_{1}, \mathbf{a}_{2}, \cdots,\mathbf{a}_{m}\) are \(n \times 1\) vectors
  • The vectors are independent if the only solution to

\[\alpha_{1}\mathbf{a}_{1}+ \alpha_{2}\mathbf{a}_{2}+ \cdots+\alpha_{m}\mathbf{a}_{m}= 0\]

  • is

\[\alpha_{1} = \alpha_{2}= \cdots=\alpha_{m}= 0\]

  • If the equation holds for some \(\alpha_{i} \neq 0\), then the vectors are linearly dependent

Rank of a Matrix

  • The rank of a matrix is the maximum number of linearly independent rows or columns

    • The row rank always equals the column rank

    • If the number of rows is less than columns, the highest rank is the number of rows

    • Vice versa if the number of columns is less than the number of rows

  • A matrix has full rank if rank equals the minimum of the number of rows/columns

  • In econometrics, we mostly deal with matrices with more rows than columns

    • So the matrix will be full rank if the rank equals the number of columns
  • We will see later we need our matrix of regressors to have full rank

    • None of the regressors can be linear functions of each other (no multicollinearity)

Rank of a Matrix

  • Some useful properties of the rank of a matrix

    • The rank of a matrix and transpose are the same \[rank(\mathbf{A'}) = rank(\mathbf{A})\]

    • If \(\mathbf{A}\) is \(n \times m\) then \[rank(\mathbf{A}) \le min(n,m)\]

    • If \(\mathbf{A}\) is \(n \times n\) and \(rank(\mathbf{A}) =n\) then \(\mathbf{A}\) is nonsingular (invertible)
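
  • As a numpy check, the \(3 \times 3\) matrix from the determinant example has rank 2, so it is not full rank (and, as we saw, not invertible):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(np.linalg.matrix_rank(A))  # 2 < 3: not full rank, so A is singular
```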

Quadratic Forms and Positive Definite Matrices

Quadratic Form

  • If \(\mathbf{A}\) is \(n \times n\) and symmetric, and \(\mathbf{x}\) is \(n \times 1\), the quadratic form for \(\mathbf{A}\) is

\[\mathbf{x'Ax}= \begin{bmatrix} x_{1}& x_{2} &\cdots & x_{n} \end{bmatrix} \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{n1}& a_{n2} &\cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}\]

\[=\sum_{i=1}^n a_{ii}x_{i}^2 + 2\sum_{i=1}^n \sum_{j>i}a_{ij}x_{i}x_{j}\]

  • A matrix is positive definite if for all \(\mathbf{x} \neq 0\)

\[\mathbf{x'Ax} > 0\]

Positive Definite Matrices

  • A matrix is positive semidefinite if for all \(\mathbf{x} \neq 0\)

\[\mathbf{x'Ax} \ge 0\]

  • Positive definite matrices have diagonal elements that are strictly positive

  • Positive semidefinite matrices have diagonal elements that are nonnegative

  • Some other useful properties of positive definite/semidefinite matrices

    • If \(\mathbf{A}\) is positive definite, then \(\mathbf{A}^{-1}\) exists and is also positive definite

    • If \(\mathbf{A}\) is \(n \times m\), then \(\mathbf{A'A}\) and \(\mathbf{AA'}\) are positive semidefinite

    • If \(\mathbf{A}\) is \(n \times m\) and \(rank(\mathbf{A}) = m\) then \(\mathbf{A'A}\) is positive definite

  • These concepts are used mostly for variance-covariance matrices in econometrics

Idempotent Matrices

  • An idempotent matrix is one that does not change when multiplied by itself

  • Mathematically, \(\mathbf{A}\) is idempotent when

\[\mathbf{AA} = \mathbf{A}\]

  • When we discuss OLS, we will work with the following idempotent matrices

    • Suppose \(\mathbf{X}\) is \(n \times k\) with full rank. Define

\[\mathbf{P} = \mathbf{X(X'X)^{-1}X'}\] \[\mathbf{M} =\mathbf{I_{n}} - \mathbf{X(X'X)^{-1}X'}\]

  • You can verify they are idempotent by multiplying each by itself

  • Some important properties of idempotent matrices are

    • \(rank(\mathbf{A}) = tr(\mathbf{A})\)

    • \(\mathbf{A}\) is positive semidefinite
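
  • A minimal numpy sketch (with a made-up \(\mathbf{X}\)) verifying idempotency and the trace property for \(\mathbf{P}\) and \(\mathbf{M}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # n x k, full rank

P = X @ np.linalg.inv(X.T @ X) @ X.T    # P = X(X'X)^{-1}X'
M = np.eye(n) - P                       # M = I_n - P

print(np.allclose(P @ P, P), np.allclose(M @ M, M))  # True True: both idempotent
print(np.isclose(np.trace(P), k))                    # True: rank(P) = tr(P) = k
```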

Moments of Random Vectors

Expected Value

  • The expected value of a random matrix is the matrix of expected values

  • If \(\mathbf{X}\) is an \(n \times m\) matrix, then

\[\mathbf{E}(\mathbf{X})= \begin{bmatrix} \mathbf{E}(x_{11}) & \mathbf{E}(x_{12}) & \cdots & \mathbf{E}(x_{1m})\\ \mathbf{E}(x_{21}) & \mathbf{E}(x_{22}) & \cdots &\mathbf{E}(x_{2m})\\ \vdots & \vdots &\ddots & \vdots \\ \mathbf{E}(x_{n1}) & \mathbf{E}(x_{n2}) & \cdots &\mathbf{E}(x_{nm})\\ \end{bmatrix}\]

  • Properties of expected values are similar to those in scalar math

    • If \(\mathbf{x}\) is a random vector, \(\mathbf{b}\) is a nonrandom vector, and \(\mathbf{A}\) is a nonrandom matrix, then \(\mathbf{E}(\mathbf{Ax+b}) = \mathbf{A}\mathbf{E}(\mathbf{x})+\mathbf{b}\)

    • If \(\mathbf{X}\) is a random matrix, and \(\mathbf{B}\) and \(\mathbf{A}\) are nonrandom matrices, then \(\mathbf{E}(\mathbf{AXB}) = \mathbf{A}\mathbf{E}(\mathbf{X})\mathbf{B}\)

Variance-Covariance Matrix

  • The variance-covariance matrix of random vector \(\mathbf{y}\) has variances on the diagonal, covariances in the off-diagonal

  • If \(\mathbf{y}\) is an \(n \times 1\) random vector, then

\[var(\mathbf{y})= \mathbf{\sigma_{y}} = \mathbf{E[(y-E[y])(y-E[y])']}\] \[= \begin{bmatrix} \text{var}(y_{1}) & \text{cov}(y_{1},y_{2}) & \cdots &\text{cov}(y_{1},y_{n}) \\ \text{cov}(y_{2},y_{1}) & \text{var}(y_{2}) & \cdots &\text{cov}(y_{2},y_{n}) \\ \vdots & \vdots &\ddots & \vdots \\ \text{cov}(y_{n},y_{1}) & \text{cov}(y_{n},y_{2}) & \cdots &\text{var}(y_{n})\\ \end{bmatrix}\]

Variance-Covariance Matrix

  • Useful properties of variance-covariance matrices are

    • If \(\mathbf{a}\) is a nonrandom vector, then \(\text{var}(\mathbf{a'y}) =\mathbf{a'}\text{var}\mathbf{(y)a}\)

    • If \(\text{var}(\mathbf{a'y})>0\) for all \(\mathbf{a} \neq \mathbf{0}\), \(\text{var}(\mathbf{y})\) is positive definite

    • If \(\mathbf{A}\) is a nonrandom matrix, \(\mathbf{b}\) is a nonrandom vector, then \(\text{var}(\mathbf{Ay + b}) =\mathbf{A}\text{var}\mathbf{(y)A'}\) (see the derivation after this list)

    • If \(\text{var}(y_{j})=\sigma^{2}\) for all \(j=1,2,...,n\), and the elements of \(\textbf{y}\) are uncorrelated, then \(\text{var}(\mathbf{y})=\sigma^{2}\mathbf{I_{n}}\)
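
  • To see where the third property comes from, substitute \(\mathbf{Ay+b}\) into the definition of the variance-covariance matrix; the constant \(\mathbf{b}\) cancels, leaving

\[\text{var}(\mathbf{Ay + b}) = \mathbf{E}\left[\mathbf{A}(\mathbf{y}-\mathbf{E}[\mathbf{y}])(\mathbf{y}-\mathbf{E}[\mathbf{y}])'\mathbf{A'}\right] = \mathbf{A}\,\mathbf{E}\left[(\mathbf{y}-\mathbf{E}[\mathbf{y}])(\mathbf{y}-\mathbf{E}[\mathbf{y}])'\right]\mathbf{A'} = \mathbf{A}\,\text{var}(\mathbf{y})\,\mathbf{A'}\]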

Matrix Differentiation

Scalar Functions

  • A scalar function of a vector is a single function of several variables

    • A vector function is a set of one or more scalar functions, each of several variables

    • We will not cover these

  • Consider the scalar function \(y = f(\mathbf{x}) =f(x_{1}, x_{2},...,x_{n})\)

    • The function takes the vector \(\mathbf{x}\) and returns a scalar

    • This is just another way to write a multivariate function

  • The derivative of this function is

\[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}= \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_{1}} & \frac{\partial f(\mathbf{x})}{\partial x_{2}} & \cdots & \frac{\partial f(\mathbf{x})}{\partial x_{n}} \end{bmatrix}\]

Derivative of Scalar Function

  • We simply collect the derivative with respect to each element of \(\mathbf{x}\) in a vector

  • Ex: linear function of \(\mathbf{x}\)

    • Suppose \(\mathbf{a}\) is an \(n \times 1\) vector and \[y = f(\mathbf{x}) = \mathbf{a'x} = \sum_{i=1}^{n} a_{i}x_{i}\]

    • The derivative is

    \[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}=\frac{\partial \mathbf{a'x} }{\partial \mathbf{x}}= \mathbf{a'} = \begin{bmatrix} a_{1}& a_{2}& \cdots & a_{n} \end{bmatrix}\]

Derivative of Scalar Function

  • Ex: Quadratic form of \(\mathbf{x}\)

    • Suppose \(\mathbf{A}\) is an \(n \times n\) symmetric matrix. The quadratic form is \[y = f(\mathbf{x}) = \mathbf{x'Ax} =\sum_{i=1}^n a_{ii}x_{i}^2 + 2\sum_{i=1}^n \sum_{j>i}a_{ij}x_{i}x_{j}\]

    • The derivative is \[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}=\frac{\partial \mathbf{x'Ax} }{\partial \mathbf{x}}= \mathbf{2x'A}\]
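
  • A numerical check of this rule with finite differences (made-up symmetric \(\mathbf{A}\); the numeric gradient stacks the partials, so it equals \((2\mathbf{x'A})' = 2\mathbf{Ax}\)):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A = (A + A.T) / 2                       # make A symmetric
x = rng.normal(size=4)

f = lambda v: v @ A @ v                 # quadratic form x'Ax
eps = 1e-6
numeric = np.array([(f(x + eps * np.eye(4)[i]) - f(x - eps * np.eye(4)[i])) / (2 * eps)
                    for i in range(4)])

print(np.allclose(numeric, 2 * A @ x, atol=1e-5))  # True: derivative of x'Ax is 2x'A
```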

Linear Regression Model in Matrix Notation

Population Regression Model

  • In undergraduate textbooks, the population linear regression model is written as

\[y= \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \cdots + \beta_{k}x_{k} + u\]

  • \(y\) and \(x_{1},...,x_{k}\) are observable random variables

  • \(u\) is an unobservable random variable

  • We can write more compactly in vector form as

\[y= \mathbf{x}\boldsymbol{\beta} + u\]

  • \(\mathbf{x}\) is a \(1 \times (k+1)\) vector of independent variables

    • There are \(k\) independent variables, plus an intercept
  • \(\boldsymbol{\beta}\) is a \((k+1) \times 1\) vector of parameters (the intercept and \(k\) slopes)

Population Regression Model

  • Now suppose we take a random sample of \(n\) people from the population

  • The population model holds for each member of the sample

\[y_{i}= \mathbf{x_{i}}\boldsymbol{\beta} + u_{i}, \forall i=1,...,n\]

  • We can express this more compactly with full matrix notation

\[\mathbf{y}= \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\]

  • \(\mathbf{X}\) is an \(n \times (k+1)\) matrix of observations on each regressor

  • \(\boldsymbol{\beta}\) is still a \((k+1) \times 1\) vector of parameters

  • \(\mathbf{y}\) is an \(n \times 1\) vector of observations on the dependent variable

  • \(\mathbf{u}\) is an \(n \times 1\) vector of error terms
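
  • As a preview of how this notation pays off, a minimal numpy sketch of estimating \(\boldsymbol{\beta}\), assuming the OLS formula \(\hat{\boldsymbol{\beta}} = \mathbf{(X'X)^{-1}X'y}\) that we derive later in the course (the data and parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(42)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1): intercept + k regressors
beta = np.array([1.0, 2.0, -0.5])                           # made-up population parameters
y = X @ beta + rng.normal(size=n)                           # y = X beta + u

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # solves (X'X)b = X'y, i.e. b = (X'X)^{-1}X'y
print(beta_hat)                               # close to [1.0, 2.0, -0.5]
```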