Matrix Review


Justin Smith

Wilfrid Laurier University

Fall 2022



  • Undergraduate econometrics normally uses scalar notation

    • More accessible for students without advanced math background
  • At the graduate level, it is often taught using matrix algebra

  • Some advantages to matrix notation

    • More compact

    • Easier to express some estimators

  • In this section, we review matrix algebra essentials for econometrics

    • Not a comprehensive review
  • We will switch between scalar and matrix notation in the course

    • Depending on which is clearer in each context

Matrices and Vectors


  • A matrix is a rectangular array of numbers organized in rows and columns

  • For example, matrix \(\mathbf{A}\) with 2 rows and 3 columns could be

\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 &5 & 6 \end{bmatrix}\]

  • More generally, matrix \(\mathbf{A}\) with m rows and n columns is

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix}\]


  • A vector is a matrix with one column or one row

  • A row vector \(\mathbf{a}\) with n elements is

\[\mathbf{a}= \begin{bmatrix} a_{1}& a_{2} &\cdots & a_{n} \end{bmatrix}\]

  • A column vector \(\mathbf{a}\) with m elements is

\[\mathbf{a}= \begin{bmatrix} a_{1}\\ a_{2}\\ \vdots \\ a_{m} \end{bmatrix}\]

Special Matrices

  • A Square Matrix has the same number of rows and columns

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1m} \\ a_{21}& a_{22} &\cdots & a_{2m} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mm} \end{bmatrix}\]

  • A Diagonal Matrix is a square matrix with zeroes for all off-diagonal elements

\[\mathbf{A}= \begin{bmatrix} a_{11}& 0&\cdots & 0 \\ 0& a_{22} &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & a_{mm} \end{bmatrix}\]

Special Matrices

  • The Identity Matrix is a square matrix with ones on the diagonal and zeroes on the off-diagonals

\[\mathbf{I}= \begin{bmatrix} 1& 0&\cdots & 0 \\ 0& 1 &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & 1 \end{bmatrix}\]

  • The Zero Matrix is a matrix with zeroes for all elements

\[\mathbf{0}= \begin{bmatrix} 0& 0&\cdots & 0 \\ 0& 0 &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & 0 \end{bmatrix}\]

Matrix Addition

  • You can add and subtract matrices with the same dimensions

    • Matrices with different dimensions are not conformable for addition or subtraction
  • The sum of matrices \(\mathbf{A}\) and \(\mathbf{B}\) with dimension \(m \times n\) is

\[\mathbf{A} + \mathbf{B}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1n} \\ b_{21}& b_{22} &\cdots & b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ b_{m1}& b_{m2} &\cdots & b_{mn} \end{bmatrix}\]

\[= \begin{bmatrix} a_{11} + b_{11}& a_{12} + b_{12} &\cdots & a_{1n}+ b_{1n} \\ a_{21} + b_{21}& a_{22} + b_{22} &\cdots & a_{2n}+ b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} + b_{m1}& a_{m2} +b_{m2} &\cdots & a_{mn}+ b_{mn} \\ \end{bmatrix}\]

Matrix Subtraction

  • Similarly, the difference between matrices \(\mathbf{A}\) and \(\mathbf{B}\) is

\[\mathbf{A} - \mathbf{B}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} - \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1n} \\ b_{21}& b_{22} &\cdots & b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ b_{m1}& b_{m2} &\cdots & b_{mn} \end{bmatrix}\]

\[= \begin{bmatrix} a_{11} - b_{11}& a_{12} - b_{12} &\cdots & a_{1n}- b_{1n} \\ a_{21} - b_{21}& a_{22} - b_{22} &\cdots & a_{2n}- b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} - b_{m1}& a_{m2} -b_{m2} &\cdots & a_{mn}- b_{mn} \\ \end{bmatrix}\]

Rules for Addition and Subtraction

  • The following rules apply to matrix addition and subtraction

    • Commutativity \[\mathbf{A + B = B + A}\]

    • Associativity \[\mathbf{A + (B + C) = (A+B) + C}\]

  • Effectively, both rules mean order does not matter

    • Similar to scalar math
  • Subtraction is different: \(\mathbf{A - B} = -(\mathbf{B - A})\), so order matters and these rules do not carry over by simply swapping the plus sign for a minus sign

Matrix Multiplication

  • To multiply matrix \(\mathbf{A}\) and \(\mathbf{B}\), the number of columns in \(\mathbf{A}\) must equal the number of rows in \(\mathbf{B}\)

  • Suppose matrix \(\mathbf{A}\) is \(m \times n\) and matrix \(\mathbf{B}\) is \(n \times p\)

  • Define the product as \(\mathbf{C} = \mathbf{AB}\)

    • The \(ij\) element of \(\mathbf{C}\) is the sum of the products of the corresponding elements along the \(i\)th row of \(\mathbf{A}\) and \(j\)th column of \(\mathbf{B}\)
      \[c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}\]

    • The product matrix \(\mathbf{C}\) will have dimension \(m \times p\)

      • The number of rows of \(\textbf{A}\) and number of columns of \(\textbf{B}\)

Matrix Multiplication

  • The product \(\mathbf{AB}\) is

\[\mathbf{AB}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} \times \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1p} \\ b_{21}& b_{22} &\cdots & b_{2p} \\ \vdots & \vdots &\ddots & \vdots \\ b_{n1}& b_{n2} &\cdots & b_{np} \end{bmatrix}\]

\[= \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} + \cdots + a_{1n} b_{n1} &a_{11} b_{12} + a_{12} b_{22} + \cdots + a_{1n} b_{n2} &\cdots&a_{11} b_{1p} + a_{12} b_{2p} + \cdots + a_{1n} b_{np}\\ a_{21} b_{11} + a_{22} b_{21} + \cdots + a_{2n} b_{n1} &a_{21} b_{12} + a_{22} b_{22} + \cdots + a_{2n} b_{n2} &\cdots&a_{21} b_{1p} + a_{22} b_{2p} + \cdots + a_{2n} b_{np}\\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} b_{11} + a_{m2} b_{21} + \cdots + a_{mn} b_{n1} &a_{m1} b_{12} + a_{m2} b_{22} + \cdots + a_{mn} b_{n2} &\cdots&a_{m1} b_{1p} + a_{m2} b_{2p} + \cdots + a_{mn} b_{np}\\ \end{bmatrix}\]

Matrix Multiplication

  • As an illustration, suppose we have the following matrices \[\mathbf{A}= \begin{bmatrix} 1& 2\\ 3& 4 \\ \end{bmatrix} \quad \mathbf{B}= \begin{bmatrix} 5&6&7 \\ 8&9 &10 \end{bmatrix}\]

  • We can multiply \(\mathbf{AB}\) because \(\mathbf{A}\) has 2 columns, and \(\mathbf{B}\) has 2 rows

  • The product \(\mathbf{C}\) = \(\mathbf{AB}\) is

\[\mathbf{C}= \begin{bmatrix} 1& 2\\ 3& 4 \\ \end{bmatrix} \times \begin{bmatrix} 5&6&7 \\ 8&9 &10 \end{bmatrix} = \begin{bmatrix} 1 \times 5 + 2\times 8&1 \times 6 + 2 \times 9 & 1 \times 7 + 2 \times 10 \\ 3 \times 5 + 4\times 8&3 \times 6 + 4 \times 9 & 3 \times 7 + 4 \times 10 \end{bmatrix}\]

\[= \begin{bmatrix} 21& 24& 27 \\ 47&54& 61 \end{bmatrix}\]
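  • A minimal Python sketch of the element formula \(c_{ij} = \sum_{k} a_{ik}b_{kj}\), applied to the example above. The function name `matmul` and the list-of-rows storage are illustrative assumptions, not part of the course material.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows."""
    n = len(A[0])                      # number of columns of A
    if n != len(B):                    # must equal the number of rows of B
        raise ValueError("A and B are not conformable for multiplication")
    p = len(B[0])                      # number of columns of B
    # c_ij is the sum over k of a_ik * b_kj
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [8, 9, 10]]
C = matmul(A, B)
print(C)  # [[21, 24, 27], [47, 54, 61]]
```

  • Calling `matmul(B, A)` raises an error, because \(\mathbf{B}\) has 3 columns but \(\mathbf{A}\) has only 2 rows.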

Scalar Multiplication

  • A scalar is a single real number

  • You can also multiply a scalar by a matrix

  • If \(\gamma\) is a scalar, and \(\mathbf{A}\) is a matrix, then

\[\mathbf{\gamma A}= \gamma \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \gamma a_{11}&\gamma a_{12} &\cdots & \gamma a_{1n} \\ \gamma a_{21}& \gamma a_{22} &\cdots & \gamma a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ \gamma a_{m1}& \gamma a_{m2} &\cdots & \gamma a_{mn} \end{bmatrix}\]

  • You multiply the scalar by each element of the matrix


Matrix Transpose

  • The transpose of a matrix is one where the rows and columns are switched

  • Suppose matrix \(\mathbf{A}\) is

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix}\]

  • Then its transpose \(\mathbf{A'}\) is

\[\mathbf{A'}= \begin{bmatrix} a_{11}& a_{21} &\cdots & a_{m1} \\ a_{12}& a_{22} &\cdots & a_{m2} \\ \vdots & \vdots &\ddots & \vdots \\ a_{1n}& a_{2n} &\cdots & a_{mn} \end{bmatrix}\]


  • The transpose has the following properties

\[\mathbf{(A')' = A }\] \[\mathbf{(\alpha A)' = \alpha A' }\] \[\mathbf{(A + B)' = A' + B' }\] \[\mathbf{(AB)' = B'A' }\]

  • There are additional rules for different types of matrices that we will cover below
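  • The rule \(\mathbf{(AB)' = B'A'}\) can be checked numerically. The sketch below reuses the \(2 \times 2\) and \(2 \times 3\) matrices from the multiplication example; the helper names `transpose` and `matmul` are illustrative assumptions.

```python
def transpose(A):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product; assumes columns of A equal rows of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2], [3, 4]]
B = [[5, 6, 7], [8, 9, 10]]

lhs = transpose(matmul(A, B))             # (AB)'
rhs = matmul(transpose(B), transpose(A))  # B'A'
print(lhs)         # [[21, 47], [24, 54], [27, 61]]
print(lhs == rhs)  # True
```

  • The order reverses because \(\mathbf{A'B'}\) would be a \(2 \times 2\) matrix times a \(3 \times 2\) matrix, which is not conformable here.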

Partitioned Matrix Multiplication

  • You may sometimes want to break matrices into vectors before you multiply

  • Multiplication works the same way, but notation can be cleaner and more intuitive

  • Suppose we have the following matrices \[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} \mathbf{B}= \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1p} \\ b_{21}& b_{22} &\cdots & b_{2p} \\ \vdots & \vdots &\ddots & \vdots \\ b_{n1}& b_{n2} &\cdots & b_{np} \end{bmatrix}\]

  • We are interested in the product \(\mathbf{AB}\)

Partitioned Matrix Multiplication

  • Break these matrices into vectors conformable for multiplication

\[\mathbf{A}= \begin{bmatrix} \mathbf{a_{1}}&\mathbf{a_{2}} & \cdots & \mathbf{a_{n}} \end{bmatrix} \mathbf{B}= \begin{bmatrix} \mathbf{b_{1}}\\ \mathbf{b_{2} }\\ \vdots \\ \mathbf{b_{n}} \end{bmatrix}\]

  • Where

\[\mathbf{a_{1}}= \begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{bmatrix} \quad \mathbf{b_{1}}= \begin{bmatrix} b_{11}&b_{12} & \cdots & b_{1p} \end{bmatrix}\]

Partitioned Matrix Multiplication

  • Multiply the vectors to get

\[\mathbf{AB} = \sum_{i=1}^{n} \mathbf{a_{i}b_{i}}\]

  • This breaks the product \(\mathbf{AB}\) into the sum of \(n\) sub-matrices

    • Each sub-matrix is product of corresponding vectors

    • Also each sub-matrix will have dimension \(m \times p\)

  • This will be useful for some econometric estimators we derive

    • Makes notation simpler and more intuitive
  • Again, note that you get the same answer as doing straight matrix multiplication
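  • As a worked check, take the \(2 \times 2\) matrix \(\mathbf{A}\) and \(2 \times 3\) matrix \(\mathbf{B}\) from the earlier multiplication example. Here \(n = 2\), so the product is the sum of two \(2 \times 3\) sub-matrices

\[\mathbf{AB} = \mathbf{a_{1}b_{1}} + \mathbf{a_{2}b_{2}} = \begin{bmatrix} 1 \\ 3 \end{bmatrix} \begin{bmatrix} 5 & 6 & 7 \end{bmatrix} + \begin{bmatrix} 2 \\ 4 \end{bmatrix} \begin{bmatrix} 8 & 9 & 10 \end{bmatrix}\]

\[= \begin{bmatrix} 5 & 6 & 7 \\ 15 & 18 & 21 \end{bmatrix} + \begin{bmatrix} 16 & 18 & 20 \\ 32 & 36 & 40 \end{bmatrix} = \begin{bmatrix} 21 & 24 & 27 \\ 47 & 54 & 61 \end{bmatrix}\]

  • This matches the result from straight matrix multiplication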

Rules for Matrix Multiplication

  • There are several useful properties for matrix (and scalar) multiplication

\[(\alpha + \beta)\mathbf{A} = \alpha \mathbf{A} + \beta\mathbf{A}\] \[\alpha (\mathbf{A} +\mathbf{B}) =\alpha \mathbf{A} +\alpha\mathbf{B}\] \[(\alpha\beta) \mathbf{A} =\alpha(\beta \mathbf{A})\] \[\alpha (\mathbf{A}\mathbf{B}) =(\alpha \mathbf{A}) \mathbf{B}\] \[(\mathbf{A}\mathbf{B} )\mathbf{C} =\mathbf{A}(\mathbf{B} \mathbf{C})\] \[\mathbf{A}(\mathbf{B} +\mathbf{C}) =\mathbf{A}\mathbf{B} +\mathbf{A} \mathbf{C}\] \[(\mathbf{A}+\mathbf{B} )\mathbf{C} =\mathbf{A}\mathbf{C} +\mathbf{B} \mathbf{C}\] \[\mathbf{A}\mathbf{I} =\mathbf{I}\mathbf{A} = \mathbf{A}\] \[\mathbf{A}\mathbf{0} =\mathbf{0}\mathbf{A} = \mathbf{0}\] \[\mathbf{A}\mathbf{B} \neq\mathbf{B}\mathbf{A} \text{ in general}\]


Trace of a Matrix

  • The trace of a square matrix is the sum of the diagonal elements

  • If square matrix \(\mathbf{A}\) is

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{n1}& a_{n2} &\cdots & a_{nn} \end{bmatrix}\]

  • Then its trace is

\[tr(\mathbf{A})= \sum_{i=1}^{n} a_{ii}\]


  • Important properties of the trace are

\[tr(\mathbf{I_{n}})= n\] \[tr(\mathbf{A}')=tr(\mathbf{A})\] \[tr(\mathbf{A +B})=tr(\mathbf{A}) + tr(\mathbf{B})\] \[tr(\alpha \mathbf{A})=\alpha tr(\mathbf{A})\] \[tr(\mathbf{AB})=tr(\mathbf{BA})\]
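  • The property \(tr(\mathbf{AB})=tr(\mathbf{BA})\) holds even when \(\mathbf{AB}\) and \(\mathbf{BA}\) have different dimensions. A small Python sketch, with hypothetical matrices chosen only for illustration:

```python
def matmul(A, B):
    """Matrix product; assumes columns of A equal rows of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def trace(A):
    """Sum of the diagonal elements of a square matrix."""
    if any(len(row) != len(A) for row in A):
        raise ValueError("trace is defined only for square matrices")
    return sum(A[i][i] for i in range(len(A)))


A = [[1, 2, 3], [4, 5, 6]]    # 2 x 3
B = [[1, 0], [0, 1], [2, 3]]  # 3 x 2

AB = matmul(A, B)  # 2 x 2
BA = matmul(B, A)  # 3 x 3
print(trace(AB), trace(BA))  # 30 30
```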

Matrix Determinant

  • The determinant is a scalar value associated with a square matrix

    • Helpful concept for several things in matrix algebra

    • For econometrics, most useful for solving systems of equations and finding inverse of a matrix

  • For \(2 \times 2\) matrix \(\mathbf{A}\) \[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} \\ a_{21}& a_{22} \\ \end{bmatrix}\]

  • The determinant is

\[|\mathbf{A}|=a_{11}a_{22} - a_{12}a_{21}\]

Matrix Determinant

  • For \(3 \times 3\) matrix \(\mathbf{A}\)

\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31}& a_{32} & a_{33} \\ \end{bmatrix}\]

  • The determinant is

\[|\mathbf{A}|=a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} +a_{13}a_{21}a_{32}\] \[-(a_{12}a_{21}a_{33} + a_{11}a_{23}a_{32} +a_{13}a_{22}a_{31})\]

\[=a_{11}(a_{22}a_{33} - a_{23}a_{32}) + a_{12}(a_{23}a_{31} -a_{21}a_{33} ) +a_{13}(a_{21}a_{32} - a_{22}a_{31} )\]

Matrix Determinant

  • For \(n \times n\) matrix \(\mathbf{A}\) the determinant is

\[|\mathbf{A}|=a_{i1}c_{i1} + a_{i2}c_{i2} + \cdots + a_{in}c_{in} \text{ for choice of any row i}\]

  • Where

    • \(a_{ij}\) is the \(ij\) element of matrix \(\mathbf{A}\)

    • \(c_{ij}\) is the \(ij\) cofactor of matrix \(\mathbf{A}\) defined as \[c_{ij} = (-1)^{i+j}|\mathbf{A}_{ij}|\]

    • \(|\mathbf{A}_{ij}|\) is the minor of matrix \(\mathbf{A}\)

      • Determinant of the sub-matrix formed by deleting the \(i\)th row and \(j\)th column of \(\mathbf{A}\)
  • Process is long and tedious for large matrices

Matrix Determinant

  • Example of \(3 \times 3\) matrix

\[\mathbf{A}= \begin{bmatrix} 1& 2 & 3 \\ 4& 5&6 \\ 7& 8 &9 \end{bmatrix}\]

  • Choose any row to find cofactors and compute determinant

    • Does not matter which
  • Let us expand along row 1

\[|\mathbf{A}|=1(-1)^{1+1} \begin{vmatrix} 5&6 \\ 8 &9 \end{vmatrix} +2(-1)^{1+2} \begin{vmatrix} 4&6 \\ 7 &9 \end{vmatrix} +3(-1)^{1+3} \begin{vmatrix} 4&5 \\ 7 &8 \end{vmatrix}\]

\[|\mathbf{A}|= -3 +12 -9 = 0\]
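  • The cofactor expansion translates directly into a short recursive function. This is a sketch for small matrices only; the names `minor` and `det` are illustrative, and it always expands along the first row.

```python
def minor(A, i, j):
    """Sub-matrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    # With 0-based j, (-1) ** j equals (-1)^(1 + (j + 1)) in 1-based notation.
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
terms = [(-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(3)]
print(terms)   # [-3, 12, -9]
print(det(A))  # 0
```

  • The three terms reproduce the hand calculation above. The number of operations grows very quickly with \(n\), which is why this approach is only practical for small matrices.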

Matrix Inverse

  • The inverse of a square matrix \(\mathbf{A}\) is defined such that

\[\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\]

  • It is roughly the equivalent of taking the reciprocal in scalar math

    • But it is not generally the reciprocal of the elements of a matrix
  • The formula for the inverse is

\[\mathbf{A}^{-1}= \frac{1}{|\mathbf{A}|} \begin{bmatrix} c_{11}& c_{21} &\cdots & c_{n1} \\ c_{12}& c_{22} &\cdots & c_{n2} \\ \vdots & \vdots &\ddots & \vdots \\ c_{1n}& c_{2n} &\cdots & c_{nn} \end{bmatrix}\]

  • where \(c_{ij}\) are the cofactors defined above, arranged so that the \(ij\) element of the inverse uses \(c_{ji}\) (the transpose of the cofactor matrix)
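  • A Python sketch of the cofactor formula, using exact fractions so the result can be checked by hand. Entry \((i, j)\) of the inverse is \(c_{ji}/|\mathbf{A}|\), that is, the cofactor matrix is transposed before dividing. The helper names are illustrative, and the code assumes integer (or `Fraction`) entries.

```python
from fractions import Fraction


def minor(A, i, j):
    """Sub-matrix of A with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i]


def det(A):
    """Determinant by cofactor expansion along the first row."""
    if not A:
        return 1  # empty matrix, so the cofactor of a 1 x 1 matrix is 1
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))


def inverse(A):
    """Inverse via cofactors: entry (i, j) of the inverse is c_ji / |A|."""
    n = len(A)
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular, so no inverse exists")
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    # Transpose the cofactor matrix and divide each element by |A|.
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]


A = [[1, 2], [3, 4]]
A_inv = inverse(A)
print(A_inv)  # the entries equal -2, 1, 3/2, -1/2
```

  • Multiplying \(\mathbf{A}\) by this result gives the identity matrix. Calling `inverse` on the \(3 \times 3\) matrix from the determinant example raises an error, because its determinant is zero.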

Matrix Inverse

  • The inverse exists only when \(|\mathbf{A}| \neq 0\)

    • This is why it is important to know the determinant

    • In example above, inverse does not exist

      • We will see later that it is because the columns are linearly dependent
  • A matrix that cannot be inverted is singular

  • A matrix that has an inverse is nonsingular

  • Inverse matrices have the following properties

\[\mathbf{(\alpha A)^{-1} = \frac{1}{\alpha} A^{-1} }\] \[\mathbf{(A')^{-1}} = \mathbf{(A^{-1})' }\] \[\mathbf{(A^{-1})^{-1}} = \mathbf{A}\] \[\mathbf{(AB)^{-1}= B^{-1}A^{-1} }\]

Linear Independence and Rank of a Matrix


  • Now that we can manipulate matrices, we can move to more advanced topics

  • Matrix algebra is useful for expressing and solving systems of equations

    • This is how we will use it in econometrics
  • We will learn you can solve for the OLS estimator when regressors are linearly independent

    • They are not linear functions of one another
  • To check linear independence, we use the concept of rank

  • The rank of a matrix is the maximum number of independent rows or columns

    • For non-square matrices, the maximum rank is the lesser of the number of rows or columns

Linear Independence

  • A set of vectors is linearly independent if you cannot express any of them as a linear function of the others

  • Mathematically, suppose that \(\mathbf{A}=\begin{bmatrix} \mathbf{a}_{1}& \mathbf{a}_{2} &\cdots & \mathbf{a}_{m} \end{bmatrix}\)

    • where \(\mathbf{a}_{1}, \mathbf{a}_{2}, \cdots,\mathbf{a}_{m}\) are \(n \times 1\) vectors
  • The vectors are independent if the only solution to

\[\alpha_{1}\mathbf{a}_{1}+ \alpha_{2}\mathbf{a}_{2}+ \cdots+\alpha_{m}\mathbf{a}_{m}= 0\]

  • is

\[\alpha_{1} = \alpha_{2}= \cdots=\alpha_{m}= 0\]

  • If a solution exists with at least one \(\alpha_{i} \neq 0\), then the vectors are linearly dependent
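  • For instance, the columns of the \(3 \times 3\) matrix used in the determinant example are linearly dependent, because the weights \(\alpha_{1} = 1\), \(\alpha_{2} = -2\), \(\alpha_{3} = 1\) solve the equation

\[1\begin{bmatrix} 1 \\ 4 \\ 7 \end{bmatrix} - 2\begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix} + 1\begin{bmatrix} 3 \\ 6 \\ 9 \end{bmatrix} = \begin{bmatrix} 1 - 4 + 3 \\ 4 - 10 + 6 \\ 7 - 16 + 9 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}\]

  • Equivalently, the second column is the average of the first and third columns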

Rank of a Matrix

  • The rank of a matrix is the maximum number of linearly independent rows or columns

    • The rank of the rows will always equal the rank of the columns

    • If the number of rows is less than columns, the highest rank is the number of rows

    • Vice versa if the number of columns is less than the number of rows

  • A matrix has full rank if rank equals the minimum of the number of rows/columns

  • In econometrics, we mostly deal with matrices with more rows than columns

    • So the matrix will be full rank if the rank equals the number of columns
  • We will see later we need our matrix of regressors to have full rank

    • None of the regressors can be an exact linear function of the others (no perfect multicollinearity)

Rank of a Matrix

  • Some useful properties of the rank of a matrix

    • The rank of a matrix and transpose are the same \[rank(\mathbf{A'}) = rank(\mathbf{A})\]

    • If \(\mathbf{A}\) is \(n \times m\) then \[rank(\mathbf{A}) \le min(n,m)\]

    • If \(\mathbf{A}\) is \(n \times n\) and \(rank(\mathbf{A}) =n\) then \(\mathbf{A}\) is nonsingular (invertible)
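  • One way to compute the rank is to row-reduce the matrix and count the pivots. The sketch below uses exact fractions to avoid rounding issues; the function name `rank` and the matrix `X` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def rank(A):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for c in range(n_cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        # Eliminate column c from every row below the pivot row.
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
X = [[1, 2], [1, 3], [1, 5]]  # 3 x 2, hypothetical regressor matrix

print(rank(A))  # 2
print(rank(X))  # 2
```

  • The \(3 \times 3\) matrix has rank 2, less than 3, which is consistent with its zero determinant. The \(3 \times 2\) matrix has rank equal to its number of columns, so it has full rank.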

Quadratic Forms and Positive Definite Matrices

Quadratic Form

  • If \(\mathbf{A}\) is \(n \times n\) and symmetric, and \(\mathbf{x}\) is \(n \times 1\), the quadratic form for \(\mathbf{A}\) is

\[\mathbf{x'Ax}= \begin{bmatrix} x_{1}& x_{2} &\cdots & x_{n} \end{bmatrix} \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{n1}& a_{n2} &\cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}\]

\[=\sum_{i=1}^n a_{ii}x_{i}^2 + 2\sum_{i=1}^n \sum_{j>i}a_{ij}x_{i}x_{j}\]

  • A matrix is positive definite if for all \(\mathbf{x} \neq 0\)

\[\mathbf{x'Ax} > 0\]

Positive Definite Matrices

  • A matrix is positive semidefinite if for all \(\mathbf{x} \neq 0\)

\[\mathbf{x'Ax} \ge 0\]

  • Positive definite matrices have diagonal elements that are strictly positive

  • Positive semidefinite matrices have diagonal elements that are nonnegative

  • Some other useful properties of positive definite/semidefinite matrices

    • If \(\mathbf{A}\) is positive definite, then \(\mathbf{A}^{-1}\) exists and is also positive definite

    • If \(\mathbf{A}\) is \(n \times m\), then \(\mathbf{A'A}\) and \(\mathbf{AA'}\) are positive semidefinite

    • If \(\mathbf{A}\) is \(n \times m\) and \(rank(\mathbf{A}) = m\) then \(\mathbf{A'A}\) is positive definite

  • These concepts are used mostly for variance-covariance matrices in econometrics
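  • A short derivation, added as a sketch, of why products of the form \(\mathbf{A'A}\) behave this way. For any \(m \times 1\) vector \(\mathbf{x}\), define the \(n \times 1\) vector \(\mathbf{z} = \mathbf{Ax}\). Then

\[\mathbf{x'A'Ax} = (\mathbf{Ax})'(\mathbf{Ax}) = \mathbf{z'z} = \sum_{i=1}^{n} z_{i}^{2} \ge 0\]

  • The sum is zero only when \(\mathbf{Ax} = \mathbf{0}\)

    • If \(rank(\mathbf{A}) = m\), the columns of \(\mathbf{A}\) are linearly independent, so \(\mathbf{Ax} = \mathbf{0}\) only when \(\mathbf{x} = \mathbf{0}\)

    • The inequality is then strict for all \(\mathbf{x} \neq 0\), which is the definition of positive definite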

Idempotent Matrices

  • An idempotent matrix is one that does not change when multiplied by itself

  • Mathematically, \(\mathbf{A}\) is idempotent when

\[\mathbf{AA} = \mathbf{A}\]

  • When we discuss OLS, we will work with the following idempotent matrices

    • Suppose \(\mathbf{X}\) is \(n \times k\) with full rank. Define

\[\mathbf{P} = \mathbf{X(X'X)^{-1}X'}\] \[\mathbf{M} =\mathbf{I_{n}} - \mathbf{X(X'X)^{-1}X'}\]

  • You can verify they are idempotent by multiplying each by itself

  • Some important properties of idempotent matrices are

    • \(rank(\mathbf{A}) = tr(\mathbf{A})\)

    • If \(\mathbf{A}\) is also symmetric (as \(\mathbf{P}\) and \(\mathbf{M}\) are), then \(\mathbf{A}\) is positive semidefinite
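  • The verification for \(\mathbf{P}\) and \(\mathbf{M}\) is short, using \(\mathbf{(X'X)^{-1}X'X} = \mathbf{I_{k}}\)

\[\mathbf{PP} = \mathbf{X(X'X)^{-1}X'X(X'X)^{-1}X'} = \mathbf{X(X'X)^{-1}X'} = \mathbf{P}\]

\[\mathbf{MM} = (\mathbf{I_{n}} - \mathbf{P})(\mathbf{I_{n}} - \mathbf{P}) = \mathbf{I_{n}} - 2\mathbf{P} + \mathbf{PP} = \mathbf{I_{n}} - \mathbf{P} = \mathbf{M}\]

  • The rank property, combined with \(tr(\mathbf{AB})=tr(\mathbf{BA})\), then gives the rank of each matrix

\[rank(\mathbf{P}) = tr(\mathbf{X(X'X)^{-1}X'}) = tr(\mathbf{(X'X)^{-1}X'X}) = tr(\mathbf{I_{k}}) = k\]

\[rank(\mathbf{M}) = tr(\mathbf{I_{n}}) - tr(\mathbf{P}) = n - k\]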

Moments of Random Vectors

Expected Value

  • The expected value of a random matrix is the matrix of expected values

  • If \(\mathbf{X}\) is an \(n \times m\) matrix, then

\[\mathbf{E}(\mathbf{X})= \begin{bmatrix} \mathbf{E}(x_{11}) & \mathbf{E}(x_{12}) & \cdots & \mathbf{E}(x_{1m})\\ \mathbf{E}(x_{21}) & \mathbf{E}(x_{22}) & \cdots &\mathbf{E}(x_{2m})\\ \vdots & \vdots &\ddots & \vdots \\ \mathbf{E}(x_{n1}) & \mathbf{E}(x_{n2}) & \cdots &\mathbf{E}(x_{nm})\\ \end{bmatrix}\]

  • Properties of expected values are similar to those in scalar math

    • If \(\mathbf{x}\) is a random vector, \(\mathbf{b}\) is a nonrandom vector, and \(\mathbf{A}\) is a nonrandom matrix, then \(\mathbf{E}(\mathbf{Ax+b}) = \mathbf{A}\mathbf{E}(\mathbf{x})+\mathbf{b}\)

    • If \(\mathbf{X}\) is a random matrix, and \(\mathbf{B}\) and \(\mathbf{A}\) are nonrandom matrices, then \(\mathbf{E}(\mathbf{AXB}) = \mathbf{A}\mathbf{E}(\mathbf{X})\mathbf{B}\)

Variance-Covariance Matrix

  • The variance-covariance matrix of random vector \(\mathbf{y}\) has variances on the diagonal, covariances in the off-diagonal

  • If \(\mathbf{y}\) is an \(n \times 1\) random vector, then

\[\text{var}(\mathbf{y})= \boldsymbol{\Sigma}_{\mathbf{y}} = \mathbf{E[(y-E[y])(y-E[y])']}\] \[= \begin{bmatrix} \text{var}(y_{1}) & \text{cov}(y_{1},y_{2}) & \cdots &\text{cov}(y_{1},y_{n}) \\ \text{cov}(y_{2},y_{1}) & \text{var}(y_{2}) & \cdots &\text{cov}(y_{2},y_{n}) \\ \vdots & \vdots &\ddots & \vdots \\ \text{cov}(y_{n},y_{1}) & \text{cov}(y_{n},y_{2}) & \cdots &\text{var}(y_{n})\\ \end{bmatrix}\]

Variance-Covariance Matrix

  • Useful properties of variance-covariance matrices are

    • If \(\mathbf{a}\) is a nonrandom vector, then \(\text{var}(\mathbf{a'y}) =\mathbf{a'}\text{var}\mathbf{(y)a}\)

    • If \(\text{var}(\mathbf{a'y})>0\) for all \(\mathbf{a} \neq \mathbf{0}\), \(\text{var}(\mathbf{y})\) is positive definite

    • If \(\mathbf{A}\) is a nonrandom matrix, \(\mathbf{b}\) is a nonrandom vector, then \(\text{var}(\mathbf{Ay + b}) =\mathbf{A}\text{var}\mathbf{(y)A'}\)

    • If \(\text{var}(y_{j})=\sigma^{2}\) for all \(j=1,2,...,n\), and the elements of \(\textbf{y}\) are uncorrelated, then \(\text{var}(\mathbf{y})=\sigma^{2}\mathbf{I_{n}}\)
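  • The rule for \(\text{var}(\mathbf{Ay + b})\) follows from the definition. Since \(\mathbf{E}(\mathbf{Ay+b}) = \mathbf{A}\mathbf{E}(\mathbf{y})+\mathbf{b}\), the vector \(\mathbf{b}\) cancels in the deviation from the mean, leaving \(\mathbf{A}(\mathbf{y}-\mathbf{E}[\mathbf{y}])\)

\[\text{var}(\mathbf{Ay + b}) = \mathbf{E}[\mathbf{A}(\mathbf{y}-\mathbf{E}[\mathbf{y}])(\mathbf{y}-\mathbf{E}[\mathbf{y}])'\mathbf{A'}] = \mathbf{A}\,\mathbf{E}[(\mathbf{y}-\mathbf{E}[\mathbf{y}])(\mathbf{y}-\mathbf{E}[\mathbf{y}])']\,\mathbf{A'} = \mathbf{A}\,\text{var}(\mathbf{y})\,\mathbf{A'}\]

  • Setting \(\mathbf{A} = \mathbf{a'}\), a \(1 \times n\) row vector, gives the scalar result \(\text{var}(\mathbf{a'y}) =\mathbf{a'}\text{var}\mathbf{(y)a}\)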

Matrix Differentiation

Scalar Functions

  • A scalar function of a vector is a single real-valued function of several variables

    • A vector function is a set of one or more scalar functions, each with respect to several variables

    • We will not cover these

  • Consider the scalar function \(y = f(\mathbf{x}) =f(x_{1}, x_{2},...,x_{n})\)

    • The function takes the vector \(\mathbf{x}\) and returns a scalar

    • This is just another way to write a multivariate function

  • The derivative of this function is

\[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}= \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_{1}} & \frac{\partial f(\mathbf{x})}{\partial x_{2}} & \cdots & \frac{\partial f(\mathbf{x})}{\partial x_{n}} \end{bmatrix}\]

Derivative of Scalar Function

  • We simply collect the derivative with respect to each element of \(\mathbf{x}\) in a vector

  • Ex: linear function of \(\mathbf{x}\)

    • Suppose \(\mathbf{a}\) is an \(n \times 1\) vector and \[y = f(\mathbf{x}) = \mathbf{a'x} = \sum_{i=1}^{n} a_{i}x_{i}\]

    • The derivative is

    \[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}=\frac{\partial \mathbf{a'x} }{\partial \mathbf{x}}= \mathbf{a'} = \begin{bmatrix} a_{1}& a_{2}& \cdots & a_{n} \end{bmatrix}\]

Derivative of Scalar Function

  • Ex: Quadratic form of \(\mathbf{x}\)

    • Suppose \(\mathbf{A}\) is an \(n \times n\) symmetric matrix. The quadratic form is \[y = f(\mathbf{x}) = \mathbf{x'Ax} =\sum_{i=1}^n a_{ii}x_{i}^2 + 2\sum_{i=1}^n \sum_{j>i}a_{ij}x_{i}x_{j}\]

    • The derivative is \[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}=\frac{\partial \mathbf{x'Ax} }{\partial \mathbf{x}}= \mathbf{2x'A}\]
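  • The result \(\mathbf{2x'A}\) can be checked against a numerical derivative. The sketch below uses a central difference, which is exact for quadratic functions, and exact fractions to avoid rounding; the matrix, the point \(\mathbf{x}\), and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction


def quad_form(A, x):
    """Compute x'Ax for a square matrix A and vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def central_difference(f, x, h=Fraction(1)):
    """Numerical gradient; exact for quadratic functions such as x'Ax."""
    grad = []
    for i in range(len(x)):
        up = [xj + h if j == i else xj for j, xj in enumerate(x)]
        down = [xj - h if j == i else xj for j, xj in enumerate(x)]
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


A = [[2, 1], [1, 3]]              # symmetric 2 x 2 matrix
x = [Fraction(1), Fraction(-2)]

numeric = central_difference(lambda v: quad_form(A, v), x)
# Analytic derivative 2x'A: element j is 2 * sum_i x_i a_ij
analytic = [2 * sum(x[i] * A[i][j] for i in range(2)) for j in range(2)]
print(numeric == analytic)  # True
```

  • By hand, \(f(\mathbf{x}) = 2x_{1}^{2} + 2x_{1}x_{2} + 3x_{2}^{2}\), so the partial derivatives at \((1, -2)\) are \(4x_{1} + 2x_{2} = 0\) and \(2x_{1} + 6x_{2} = -10\), matching \(\mathbf{2x'A}\).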

Linear Regression Model in Matrix Notation

Population Regression Model

  • In undergraduate textbooks, the population linear regression model is written as

\[y= \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \cdots + \beta_{k}x_{k} + u\]

  • \(y\) and \(x_{1},...,x_{k}\) are observable random variables

  • \(u\) is an unobservable random variable

  • We can write more compactly in vector form as

\[y= \mathbf{x}\boldsymbol{\beta} + u\]

  • \(\mathbf{x}\) is a \(1 \times (k+1)\) vector of independent variables

    • There are \(k\) independent variables, plus an intercept
  • \(\boldsymbol{\beta}\) is a \((k+1) \times 1\) vector of parameters (the intercept and \(k\) slopes)

Population Regression Model

  • Now suppose we take a random sample of \(n\) people from the population

  • The population model holds for each member of the sample

\[y_{i}= \mathbf{x_{i}}\boldsymbol{\beta} + u_{i}, \forall i=1,...,n\]

  • We can express this more compactly with full matrix notation

\[\mathbf{y}= \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\]

  • \(\mathbf{X}\) is an \(n \times (k+1)\) matrix of observations on each regressor

  • \(\boldsymbol{\beta}\) is still a \((k+1) \times 1\) vector of parameters (the intercept and \(k\) slopes)

  • \(\mathbf{y}\) is an \(n \times 1\) vector of observations on the dependent variable

  • \(\mathbf{u}\) is an \(n \times 1\) vector of error terms
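  • To make the dimensions concrete, the sketch below stacks \(n = 3\) observations on \(k = 2\) regressors into \(\mathbf{X}\) and computes \(\mathbf{y}= \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\). All numbers are hypothetical, the function names are illustrative, and the code assumes the intercept is handled by a leading column of ones in \(\mathbf{X}\).

```python
def build_design_matrix(regressors):
    """Prepend a column of ones (the intercept) to each row of regressors."""
    return [[1] + list(row) for row in regressors]


def matvec(X, beta):
    """Compute the n x 1 product X beta for X stored as a list of rows."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Hypothetical values: n = 3 observations, k = 2 regressors.
regressors = [[2, 5], [4, 1], [6, 3]]
beta = [1, 2, -1]   # (beta_0, beta_1, beta_2)
u = [0, 1, -1]      # error terms, unobservable in practice

X = build_design_matrix(regressors)                  # n x (k + 1)
y = [xb + ui for xb, ui in zip(matvec(X, beta), u)]  # y = X beta + u
print(X)  # [[1, 2, 5], [1, 4, 1], [1, 6, 3]]
print(y)  # [0, 9, 9]
```

  • Each row of \(\mathbf{X}\) is one \(\mathbf{x_{i}}\), so the first element of \(\mathbf{y}\) is \(y_{1} = 1 + 2(2) - 1(5) + 0 = 0\).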