Undergraduate econometrics normally uses scalar notation
At the graduate level, it is often taught using matrix algebra
Some advantages of matrix notation
More compact
Easier to express some estimators
In this section, we review matrix algebra essentials for econometrics
We will switch between scalar and matrix notation in the course
A matrix is a rectangular array of numbers organized in rows and columns
For example, matrix \(\mathbf{A}\) with 2 rows and 3 columns could be
\[\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 4 &5 & 6 \end{bmatrix}\]
In general, an \(m \times n\) matrix \(\mathbf{A}\) with \(m\) rows and \(n\) columns is written as
\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix}\]
A vector is a matrix with one column or one row
A row vector \(\mathbf{a}\) with n elements is
\[\mathbf{a}= \begin{bmatrix} a_{1}& a_{2} &\cdots & a_{n} \end{bmatrix}\]
A column vector \(\mathbf{a}\) with \(m\) elements is
\[\mathbf{a}= \begin{bmatrix} a_{1}\\ a_{2}\\ \vdots \\ a_{m} \end{bmatrix}\]
A square matrix has the same number of rows and columns, e.g. the \(m \times m\) matrix
\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1m} \\ a_{21}& a_{22} &\cdots & a_{2m} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mm} \end{bmatrix}\]
A diagonal matrix is a square matrix whose off-diagonal elements are all zero
\[\mathbf{A}= \begin{bmatrix} a_{11}& 0&\cdots & 0 \\ 0& a_{22} &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & a_{mm} \end{bmatrix}\]
The identity matrix \(\mathbf{I}\) is a diagonal matrix with ones along the diagonal
\[\mathbf{I}= \begin{bmatrix} 1& 0&\cdots & 0 \\ 0& 1 &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & 1 \end{bmatrix}\]
The zero matrix \(\mathbf{0}\) is a matrix whose elements are all zero
\[\mathbf{0}= \begin{bmatrix} 0& 0&\cdots & 0 \\ 0& 0 &\cdots & 0 \\ \vdots & \vdots &\ddots & \vdots \\ 0& 0&\cdots & 0 \end{bmatrix}\]
You can add and subtract matrices with the same dimensions
The sum of matrices \(\mathbf{A}\) and \(\mathbf{B}\) with dimension \(m \times n\) is
\[\mathbf{A} + \mathbf{B}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} + \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1n} \\ b_{21}& b_{22} &\cdots & b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ b_{m1}& b_{m2} &\cdots & b_{mn} \end{bmatrix}\]
\[= \begin{bmatrix} a_{11} + b_{11}& a_{12} + b_{12} &\cdots & a_{1n}+ b_{1n} \\ a_{21} + b_{21}& a_{22} + b_{22} &\cdots & a_{2n}+ b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} + b_{m1}& a_{m2} +b_{m2} &\cdots & a_{mn}+ b_{mn} \\ \end{bmatrix}\]
The difference of \(\mathbf{A}\) and \(\mathbf{B}\) is
\[\mathbf{A} - \mathbf{B}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} - \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1n} \\ b_{21}& b_{22} &\cdots & b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ b_{m1}& b_{m2} &\cdots & b_{mn} \end{bmatrix}\]
\[= \begin{bmatrix} a_{11} - b_{11}& a_{12} - b_{12} &\cdots & a_{1n}- b_{1n} \\ a_{21} - b_{21}& a_{22} - b_{22} &\cdots & a_{2n}- b_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} - b_{m1}& a_{m2} -b_{m2} &\cdots & a_{mn}- b_{mn} \\ \end{bmatrix}\]
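For example, with two \(2 \times 2\) matrices,
\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix} \qquad \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} -4 & -4 \\ -4 & -4 \end{bmatrix}\]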
The following rules apply to matrix addition and subtraction
Commutativity \[\mathbf{A + B = B + A}\]
Associativity \[\mathbf{A + (B + C) = (A+B) + C}\]
Commutativity means the order of addition does not matter; associativity means the grouping does not matter
The same rules carry over to subtraction, since \(\mathbf{A} - \mathbf{B} = \mathbf{A} + (-1)\mathbf{B}\)
To multiply matrix \(\mathbf{A}\) and \(\mathbf{B}\), the number of columns in \(\mathbf{A}\) must equal the number of rows in \(\mathbf{B}\)
Suppose matrix \(\mathbf{A}\) is \(m \times n\) and matrix \(\mathbf{B}\) is \(n \times p\)
Define the product as \(\mathbf{C} = \mathbf{AB}\)
The \(ij\) element of \(\mathbf{C}\) is the sum of the products of the corresponding elements along the \(i\)th row of \(\mathbf{A}\) and the \(j\)th column of \(\mathbf{B}\)
\[c_{ij} = \sum_{k=1}^{n} a_{ik}b_{kj}\]
The product matrix \(\mathbf{C}\) will have dimension \(m \times p\)
\[\mathbf{AB}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} \times \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1p} \\ b_{21}& b_{22} &\cdots & b_{2p} \\ \vdots & \vdots &\ddots & \vdots \\ b_{n1}& b_{n2} &\cdots & b_{np} \end{bmatrix}\]
\[= \begin{bmatrix} a_{11} b_{11} + a_{12} b_{21} + \cdots + a_{1n} b_{n1} &a_{11} b_{12} + a_{12} b_{22} + \cdots + a_{1n} b_{n2} &\cdots&a_{11} b_{1p} + a_{12} b_{2p} + \cdots + a_{1n} b_{np}\\ a_{21} b_{11} + a_{22} b_{21} + \cdots + a_{2n} b_{n1} &a_{21} b_{12} + a_{22} b_{22} + \cdots + a_{2n} b_{n2} &\cdots&a_{21} b_{1p} + a_{22} b_{2p} + \cdots + a_{2n} b_{np}\\ \vdots & \vdots &\ddots & \vdots \\ a_{m1} b_{11} + a_{m2} b_{21} + \cdots + a_{mn} b_{n1} &a_{m1} b_{12} + a_{m2} b_{22} + \cdots + a_{mn} b_{n2} &\cdots&a_{m1} b_{1p} + a_{m2} b_{2p} + \cdots + a_{mn} b_{np}\\ \end{bmatrix}\]
As an illustration suppose we have the following matrices \[\mathbf{A}= \begin{bmatrix} 1& 2\\ 3& 4 \\ \end{bmatrix} \mathbf{B}= \begin{bmatrix} 5&6&7 \\ 8&9 &10 \end{bmatrix}\]
We can multiply \(\mathbf{AB}\) because \(\mathbf{A}\) has 2 columns, and \(\mathbf{B}\) has 2 rows
The product \(\mathbf{C}\) = \(\mathbf{AB}\) is
\[\mathbf{C}= \begin{bmatrix} 1& 2\\ 3& 4 \\ \end{bmatrix} \times \begin{bmatrix} 5&6&7 \\ 8&9 &10 \end{bmatrix} = \begin{bmatrix} 1 \times 5 + 2\times 8&1 \times 6 + 2 \times 9 & 1 \times 7 + 2 \times 10 \\ 3 \times 5 + 4\times 8&3 \times 6 + 4 \times 9 & 3 \times 7 + 4 \times 10 \end{bmatrix}\]
\[= \begin{bmatrix} 21& 24& 27 \\ 47&54& 61 \end{bmatrix}\]
A scalar is a single real number
You can also multiply a scalar by a matrix
If \(\gamma\) is a scalar, and \(\mathbf{A}\) is a matrix, then
\[\mathbf{\gamma A}= \gamma \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \gamma a_{11}&\gamma a_{12} &\cdots & \gamma a_{1n} \\ \gamma a_{21}& \gamma a_{22} &\cdots & \gamma a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ \gamma a_{m1}& \gamma a_{m2} &\cdots & \gamma a_{mn} \end{bmatrix}\]
The transpose of a matrix is one where the rows and columns are switched
Suppose matrix \(\mathbf{A}\) is
\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix}\]
Then the transpose \(\mathbf{A'}\) is the \(n \times m\) matrix
\[\mathbf{A'}= \begin{bmatrix} a_{11}& a_{21} &\cdots & a_{m1} \\ a_{12}& a_{22} &\cdots & a_{m2} \\ \vdots & \vdots &\ddots & \vdots \\ a_{1n}& a_{2n} &\cdots & a_{mn} \end{bmatrix}\]
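For example, transposing the \(2 \times 3\) matrix \(\mathbf{A}\) from the beginning of this section gives a \(3 \times 2\) matrix
\[\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}' = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}\]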
Transposes have the following properties
\[\mathbf{(A')' = A }\] \[\mathbf{(\alpha A)' = \alpha A' }\] \[\mathbf{(A + B)' = A' + B' }\] \[\mathbf{(AB)' = B'A' }\]
You may sometimes want to break matrices into vectors before you multiply
Multiplication works the same way, but notation can be cleaner and more intuitive
Suppose we have the following matrices \[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{m1}& a_{m2} &\cdots & a_{mn} \end{bmatrix} \mathbf{B}= \begin{bmatrix} b_{11}& b_{12} &\cdots & b_{1p} \\ b_{21}& b_{22} &\cdots & b_{2p} \\ \vdots & \vdots &\ddots & \vdots \\ b_{n1}& b_{n2} &\cdots & b_{np} \end{bmatrix}\]
We are interested in the product \(\mathbf{AB}\)
Partition \(\mathbf{A}\) into its columns and \(\mathbf{B}\) into its rows
\[\mathbf{A}= \begin{bmatrix} \mathbf{a_{1}}&\mathbf{a_{2}} & \cdots & \mathbf{a_{n}} \end{bmatrix} \mathbf{B}= \begin{bmatrix} \mathbf{b_{1}}\\ \mathbf{b_{2} }\\ \vdots \\ \mathbf{b_{n}} \end{bmatrix}\]
where, for example, \(\mathbf{a_{1}}\) is the first column of \(\mathbf{A}\) and \(\mathbf{b_{1}}\) is the first row of \(\mathbf{B}\)
\[\mathbf{a_{1}}= \begin{bmatrix} a_{11}\\ a_{21}\\ \vdots\\ a_{m1} \end{bmatrix} \mathbf{b_{1}}= \begin{bmatrix} b_{11}&b_{12} & \cdots & b_{1p} \end{bmatrix}\]
Then the product can be written as
\[\mathbf{AB} = \sum_{i=1}^{n} \mathbf{a_{i}b_{i}}\]
This breaks the product \(\mathbf{AB}\) into the sum of \(n\) sub-matrices
Each sub-matrix \(\mathbf{a_{i}b_{i}}\) is the product of the corresponding column of \(\mathbf{A}\) and row of \(\mathbf{B}\)
Also each sub-matrix will have dimension \(m \times p\)
This will be useful for some econometric estimators we derive
Again, note that you get the same answer as doing straight matrix multiplication
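For example, with the \(2 \times 2\) and \(2 \times 3\) matrices from the multiplication example above,
\[\mathbf{a_{1}b_{1}} + \mathbf{a_{2}b_{2}} = \begin{bmatrix} 1\\ 3 \end{bmatrix} \begin{bmatrix} 5&6&7 \end{bmatrix} + \begin{bmatrix} 2\\ 4 \end{bmatrix} \begin{bmatrix} 8&9&10 \end{bmatrix} = \begin{bmatrix} 5&6&7 \\ 15&18&21 \end{bmatrix} + \begin{bmatrix} 16&18&20 \\ 32&36&40 \end{bmatrix} = \begin{bmatrix} 21&24&27 \\ 47&54&61 \end{bmatrix}\]
which matches the product \(\mathbf{AB}\) computed earlier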
The following rules apply to matrix multiplication, where \(\alpha\) and \(\beta\) are scalars
\[(\alpha + \beta)\mathbf{A} = \alpha \mathbf{A} + \beta\mathbf{A}\] \[\alpha (\mathbf{A} +\mathbf{B}) =\alpha \mathbf{A} +\alpha\mathbf{B}\] \[(\alpha\beta) \mathbf{A} =\alpha(\beta \mathbf{A})\] \[\alpha (\mathbf{A}\mathbf{B}) =(\alpha \mathbf{A}) \mathbf{B}\] \[(\mathbf{A}\mathbf{B} )\mathbf{C} =\mathbf{A}(\mathbf{B} \mathbf{C})\] \[\mathbf{A}(\mathbf{B} +\mathbf{C}) =\mathbf{A}\mathbf{B} +\mathbf{A} \mathbf{C}\] \[(\mathbf{A}+\mathbf{B} )\mathbf{C} =\mathbf{A}\mathbf{C} +\mathbf{B} \mathbf{C}\] \[\mathbf{A}\mathbf{I} =\mathbf{I}\mathbf{A} = \mathbf{A}\] \[\mathbf{A}\mathbf{0} =\mathbf{0}\mathbf{A} = \mathbf{0}\] \[\mathbf{A}\mathbf{B} \neq\mathbf{B}\mathbf{A} \text{ in general}\]
The trace of a square matrix is the sum of the diagonal elements
If square matrix \(\mathbf{A}\) is
\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{n1}& a_{n2} &\cdots & a_{nn} \end{bmatrix}\]
\[tr(\mathbf{A})= \sum_{i=1}^{n} a_{ii}\]
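For example,
\[tr\left(\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right) = 1 + 4 = 5\]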
Useful properties of the trace include
\[tr(\mathbf{I_{n}})= n\] \[tr(\mathbf{A}')=tr(\mathbf{A})\] \[tr(\mathbf{A +B})=tr(\mathbf{A}) + tr(\mathbf{B})\] \[tr(\alpha \mathbf{A})=\alpha tr(\mathbf{A})\] \[tr(\mathbf{AB})=tr(\mathbf{BA})\]
The determinant is a scalar value associated with a square matrix
Helpful concept for several things in matrix algebra
For econometrics, most useful for solving systems of equations and finding inverse of a matrix
For \(2 \times 2\) matrix \(\mathbf{A}\) \[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} \\ a_{21}& a_{22} \\ \end{bmatrix}\]
The determinant is
\[|\mathbf{A}|=a_{11}a_{22} - a_{12}a_{21}\]
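For example,
\[\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = 1 \times 4 - 2 \times 3 = -2\]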
For a \(3 \times 3\) matrix
\[\mathbf{A}= \begin{bmatrix} a_{11}& a_{12} & a_{13} \\ a_{21}& a_{22} & a_{23} \\ a_{31}& a_{32} & a_{33} \\ \end{bmatrix}\]
the determinant is
\[|\mathbf{A}|=a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} +a_{13}a_{21}a_{32}\] \[-(a_{12}a_{21}a_{33} + a_{11}a_{23}a_{32} +a_{13}a_{22}a_{31})\]
\[=a_{11}(a_{22}a_{33} - a_{23}a_{32}) + a_{12}(a_{23}a_{31} -a_{21}a_{33} ) +a_{13}(a_{21}a_{32} - a_{22}a_{31} )\]
More generally, the determinant can be computed by cofactor expansion along any row
\[|\mathbf{A}|=a_{i1}c_{i1} + a_{i2}c_{i2} + \cdots + a_{in}c_{in} \text{ for any choice of row } i\]
Where
\(a_{ij}\) is the \(ij\) element of matrix \(\mathbf{A}\)
\(c_{ij}\) is the \(ij\) cofactor of matrix \(\mathbf{A}\) defined as \[c_{ij} = (-1)^{i+j}|\mathbf{A}_{ij}|\]
\(|\mathbf{A}_{ij}|\) is the \(ij\) minor of \(\mathbf{A}\): the determinant of the submatrix formed by deleting row \(i\) and column \(j\) of \(\mathbf{A}\)
Process is long and tedious for large matrices
As an example, consider
\[\mathbf{A}= \begin{bmatrix} 1& 2 & 3 \\ 4& 5&6 \\ 7& 8 &9 \end{bmatrix}\]
Choose any row to find cofactors and compute determinant
Let us expand along row 1
\[|\mathbf{A}|=1(-1)^{1+1} \begin{vmatrix} 5&6 \\ 8 &9 \end{vmatrix} +2(-1)^{1+2} \begin{vmatrix} 4&6 \\ 7 &9 \end{vmatrix} +3(-1)^{1+3} \begin{vmatrix} 4&5 \\ 7 &8 \end{vmatrix}\]
\[|\mathbf{A}|= -3 +12 -9 = 0\]
The inverse of a square matrix \(\mathbf{A}\), denoted \(\mathbf{A}^{-1}\), is the matrix that satisfies
\[\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}\]
It is roughly the equivalent of taking the reciprocal in scalar math
The formula for the inverse is
\[\mathbf{A}^{-1}= \frac{1}{|\mathbf{A}|} \begin{bmatrix} c_{11}& c_{21} &\cdots & c_{n1} \\ c_{12}& c_{22} &\cdots & c_{n2} \\ \vdots & \vdots &\ddots & \vdots \\ c_{1n}& c_{2n} &\cdots & c_{nn} \end{bmatrix}\]
where \(c_{ij}\) is the \(ij\) cofactor defined above; note that the matrix of cofactors is transposed
The inverse exists only when \(|\mathbf{A}| \neq 0\)
This is why it is important to know the determinant
In the example above, the inverse does not exist because the determinant is 0
A matrix that cannot be inverted is singular
A matrix that has an inverse is nonsingular
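As an illustration, for a \(2 \times 2\) matrix the cofactors are \(c_{11} = a_{22}\), \(c_{12} = -a_{21}\), \(c_{21} = -a_{12}\), \(c_{22} = a_{11}\), so the formula gives
\[\mathbf{A}^{-1} = \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}\]
For example,
\[\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}^{-1} = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}\]
and you can verify that multiplying this by the original matrix gives \(\mathbf{I}\)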
Inverse matrices have the following properties
\[\mathbf{(\alpha A)^{-1} = \frac{1}{\alpha} A^{-1} }\] \[\mathbf{(A')^{-1}} = \mathbf{(A^{-1})' }\] \[\mathbf{(A^{-1})^{-1}} = \mathbf{A}\] \[\mathbf{(AB)^{-1}= B^{-1}A^{-1} }\]
Now that we can manipulate matrices, we can move to more advanced topics
Matrix algebra is useful for expressing and solving systems of equations
We will see that you can solve for the OLS estimator when the regressors are linearly independent
To check linear independence, we use the concept of rank
A set of vectors is linearly independent if you cannot express any of them as a linear function of the others
Mathematically, suppose that \(\mathbf{A}=\begin{bmatrix} \mathbf{a}_{1}& \mathbf{a}_{2} &\cdots & \mathbf{a}_{n} \end{bmatrix}\), where \(\mathbf{a}_{j}\) is the \(j\)th column
The columns are linearly independent if the only solution to
\[\alpha_{1}\mathbf{a}_{1}+ \alpha_{2}\mathbf{a}_{2}+ \cdots+\alpha_{n}\mathbf{a}_{n}= \mathbf{0}\]
is
\[\alpha_{1} = \alpha_{2}= \cdots=\alpha_{n}= 0\]
The rank of a matrix is the maximum number of linearly independent rows or columns
The row rank of a matrix always equals its column rank
If a matrix has fewer rows than columns, its rank can be at most the number of rows
Likewise, if it has fewer columns than rows, its rank can be at most the number of columns
A matrix has full rank if rank equals the minimum of the number of rows/columns
In econometrics, we mostly deal with matrices with more rows than columns
We will see later we need our matrix of regressors to have full rank
Some useful properties of the rank of a matrix
The rank of a matrix and transpose are the same \[rank(\mathbf{A'}) = rank(\mathbf{A})\]
If \(\mathbf{A}\) is \(m \times n\) then \[rank(\mathbf{A}) \le min(m,n)\]
If \(\mathbf{A}\) is \(n \times n\) and \(rank(\mathbf{A}) =n\) then \(\mathbf{A}\) is nonsingular (invertible)
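For example, the \(3 \times 3\) matrix from the determinant example has rank 2: its third column is a linear function of the other two,
\[\begin{bmatrix} 3\\6\\9 \end{bmatrix} = 2\begin{bmatrix} 2\\5\\8 \end{bmatrix} - \begin{bmatrix} 1\\4\\7 \end{bmatrix}\]
so the columns are not linearly independent, which is why its determinant is 0 and it is singular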
For a symmetric \(n \times n\) matrix \(\mathbf{A}\) and an \(n \times 1\) vector \(\mathbf{x}\), the quadratic form is
\[\mathbf{x'Ax}= \begin{bmatrix} x_{1}& x_{2} &\cdots & x_{n} \end{bmatrix} \begin{bmatrix} a_{11}& a_{12} &\cdots & a_{1n} \\ a_{21}& a_{22} &\cdots & a_{2n} \\ \vdots & \vdots &\ddots & \vdots \\ a_{n1}& a_{n2} &\cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{1}\\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix}\]
\[=\sum_{i=1}^n a_{ii}x_{i}^2 + 2\sum_{i=1}^n \sum_{j>i}a_{ij}x_{i}x_{j}\]
\(\mathbf{A}\) is positive definite if, for all \(\mathbf{x} \neq \mathbf{0}\),
\[\mathbf{x'Ax} > 0\]
\(\mathbf{A}\) is positive semidefinite if, for all \(\mathbf{x}\),
\[\mathbf{x'Ax} \ge 0\]
Positive definite matrices have diagonal elements that are strictly positive
Positive semidefinite matrices have diagonal elements that are nonnegative
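For example, the symmetric matrix
\[\mathbf{A}= \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\]
is positive definite because
\[\mathbf{x'Ax} = 2x_{1}^2 + 2x_{1}x_{2} + 2x_{2}^2 = x_{1}^2 + x_{2}^2 + (x_{1}+x_{2})^2 > 0 \text{ for all } \mathbf{x} \neq \mathbf{0}\]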
Some other useful properties of positive definite/semidefinite matrices
If \(\mathbf{A}\) is positive definite, then \(\mathbf{A}^{-1}\) exists and is also positive definite
If \(\mathbf{A}\) is \(n \times m\), then \(\mathbf{A'A}\) and \(\mathbf{AA'}\) are positive semidefinite
If \(\mathbf{A}\) is \(n \times m\) and \(rank(\mathbf{A}) = m\) then \(\mathbf{A'A}\) is positive definite
These concepts are used mostly for variance-covariance matrices in econometrics
An idempotent matrix is one that does not change when multiplied by itself
Mathematically, \(\mathbf{A}\) is idempotent when
\[\mathbf{AA} = \mathbf{A}\]
When we discuss OLS, we will work with the following idempotent matrices
\[\mathbf{P} = \mathbf{X(X'X)^{-1}X'}\] \[\mathbf{M} =\mathbf{I_{n}} - \mathbf{X(X'X)^{-1}X'}\]
You can verify they are idempotent by multiplying each by itself
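For example, to verify that \(\mathbf{M}\) is idempotent, first note that \(\mathbf{PP} = \mathbf{X(X'X)^{-1}X'X(X'X)^{-1}X'} = \mathbf{X(X'X)^{-1}X'} = \mathbf{P}\), and then
\[\mathbf{MM} = (\mathbf{I_{n}} - \mathbf{P})(\mathbf{I_{n}} - \mathbf{P}) = \mathbf{I_{n}} - 2\mathbf{P} + \mathbf{PP} = \mathbf{I_{n}} - \mathbf{P} = \mathbf{M}\]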
Some important properties of symmetric idempotent matrices (such as \(\mathbf{P}\) and \(\mathbf{M}\) above) are
\(rank(\mathbf{A}) = tr(\mathbf{A})\)
\(\mathbf{A}\) is positive semidefinite
The expected value of a random matrix is the matrix of expected values
If \(\mathbf{X}\) is an \(n \times m\) matrix, then
\[\mathbf{E}(\mathbf{X})= \begin{bmatrix} \mathbf{E}(x_{11}) & \mathbf{E}(x_{12}) & \cdots & \mathbf{E}(x_{1m})\\ \mathbf{E}(x_{21}) & \mathbf{E}(x_{22}) & \cdots &\mathbf{E}(x_{2m})\\ \vdots & \vdots &\ddots & \vdots \\ \mathbf{E}(x_{n1}) & \mathbf{E}(x_{n2}) & \cdots &\mathbf{E}(x_{nm})\\ \end{bmatrix}\]
Properties of expected values are similar to those in scalar math
If \(\mathbf{x}\) is a random vector, \(\mathbf{b}\) is a nonrandom vector, and \(\mathbf{A}\) is a nonrandom matrix, then \(\mathbf{E}(\mathbf{Ax+b}) = \mathbf{A}\mathbf{E}(\mathbf{x})+\mathbf{b}\)
If \(\mathbf{X}\) is a random matrix, and \(\mathbf{B}\) and \(\mathbf{A}\) are nonrandom matrices, then \(\mathbf{E}(\mathbf{AXB}) = \mathbf{A}\mathbf{E}(\mathbf{X})\mathbf{B}\)
The variance-covariance matrix of random vector \(\mathbf{y}\) has variances on the diagonal, covariances in the off-diagonal
If \(\mathbf{y}\) is an \(n \times 1\) random vector, then
\[var(\mathbf{y})= \boldsymbol{\Sigma}_{\mathbf{y}} = \mathbf{E[(y-E[y])(y-E[y])']}\] \[= \begin{bmatrix} \text{var}(y_{1}) & \text{cov}(y_{1},y_{2}) & \cdots &\text{cov}(y_{1},y_{n}) \\ \text{cov}(y_{2},y_{1}) & \text{var}(y_{2}) & \cdots &\text{cov}(y_{2},y_{n}) \\ \vdots & \vdots &\ddots & \vdots \\ \text{cov}(y_{n},y_{1}) & \text{cov}(y_{n},y_{2}) & \cdots &\text{var}(y_{n})\\ \end{bmatrix}\]
Useful properties of variance-covariance matrices are
If \(\mathbf{a}\) is a nonrandom vector, then \(\text{var}(\mathbf{a'y}) =\mathbf{a'}\text{var}\mathbf{(y)a}\)
If \(\text{var}(\mathbf{a'y})>0\) for all \(\mathbf{a} \neq \mathbf{0}\), then \(\text{var}(\mathbf{y})\) is positive definite
If \(\mathbf{A}\) is a nonrandom matrix and \(\mathbf{b}\) is a nonrandom vector, then \(\text{var}(\mathbf{Ay + b}) =\mathbf{A}\text{var}\mathbf{(y)A'}\)
If \(\text{var}(y_{j})=\sigma^{2}\) for all \(j=1,2,...,n\), and the elements of \(\textbf{y}\) are uncorrelated, then \(\text{var}(\mathbf{y})=\sigma^{2}\mathbf{I_{n}}\)
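For example, with \(n = 2\) the property \(\text{var}(\mathbf{a'y}) =\mathbf{a'}\text{var}\mathbf{(y)a}\) reproduces the familiar scalar rule
\[\text{var}(a_{1}y_{1} + a_{2}y_{2}) = a_{1}^{2}\text{var}(y_{1}) + a_{2}^{2}\text{var}(y_{2}) + 2a_{1}a_{2}\text{cov}(y_{1},y_{2})\]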
A scalar function of a vector is a single real-valued function of several variables
Consider the scalar function \(y = f(\mathbf{x}) =f(x_{1}, x_{2},...,x_{n})\)
The function takes the vector \(\mathbf{x}\) and returns a scalar
This is just another way to write a multivariate function
The derivative of this function is
\[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}= \begin{bmatrix} \frac{\partial f(\mathbf{x})}{\partial x_{1}} & \frac{\partial f(\mathbf{x})}{\partial x_{2}} & \cdots & \frac{\partial f(\mathbf{x})}{\partial x_{n}} \end{bmatrix}\]
We simply collect the derivative with respect to each element of \(\mathbf{x}\) in a vector
Ex: linear function of \(\mathbf{x}\)
Suppose \(\mathbf{a}\) is an \(n \times 1\) vector and \[y = f(\mathbf{x}) = \mathbf{a'x} = \sum_{i=1}^{n} a_{i}x_{i}\]
The derivative is
\[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}=\frac{\partial \mathbf{a'x} }{\partial \mathbf{x}}= \mathbf{a'} = \begin{bmatrix} a_{1}& a_{2}& \cdots & a_{n} \end{bmatrix}\]
Ex: Quadratic form of \(\mathbf{x}\)
Suppose \(\mathbf{A}\) is an \(n \times n\) symmetric matrix. The quadratic form is \[y = f(\mathbf{x}) = \mathbf{x'Ax} =\sum_{i=1}^n a_{ii}x_{i}^2 + 2\sum_{i=1}^n \sum_{j>i}a_{ij}x_{i}x_{j}\]
The derivative is \[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}}=\frac{\partial \mathbf{x'Ax} }{\partial \mathbf{x}}= \mathbf{2x'A}\]
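For example, with \(n = 2\),
\[f(\mathbf{x}) = a_{11}x_{1}^2 + 2a_{12}x_{1}x_{2} + a_{22}x_{2}^2\]
so collecting the partial derivatives gives
\[\frac{\partial f(\mathbf{x})}{\partial \mathbf{x}} = \begin{bmatrix} 2a_{11}x_{1} + 2a_{12}x_{2} & 2a_{12}x_{1} + 2a_{22}x_{2} \end{bmatrix} = 2\mathbf{x'A}\]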
Now consider the population linear regression model
\[y= \beta_{0} + \beta_{1}x_{1} + \beta_{2}x_{2} + \cdots + \beta_{k}x_{k} + u\]
\(y\) and \(x_{1},...,x_{k}\) are observable random variables
\(u\) is an unobservable random variable
We can write more compactly in vector form as
\[y= \mathbf{x}\boldsymbol{\beta} + u\]
\(\mathbf{x} = \begin{bmatrix} 1 & x_{1} & \cdots & x_{k} \end{bmatrix}\) is a \(1 \times (k+1)\) vector of regressors, with a 1 in the first position for the intercept
\(\boldsymbol{\beta}\) is a \((k+1) \times 1\) vector of parameters (the intercept and the slopes)
Now suppose we take a random sample of \(n\) people from the population
The population model holds for each member of the sample
\[y_{i}= \mathbf{x_{i}}\boldsymbol{\beta} + u_{i}, \forall i=1,...,n\]
Stacking all \(n\) observations gives the matrix form
\[\mathbf{y}= \mathbf{X}\boldsymbol{\beta} + \mathbf{u}\]
\(\mathbf{X}\) is an \(n \times (k+1)\) matrix of observations on each regressor
\(\boldsymbol{\beta}\) is still the \((k+1) \times 1\) vector of parameters
\(\mathbf{y}\) is an \(n \times 1\) vector of observations on the dependent variable
\(\mathbf{u}\) is an \(n \times 1\) vector of error terms
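Written out element by element (with a 1 in the first column of \(\mathbf{X}\) for the intercept), the stacked system is
\[\begin{bmatrix} y_{1}\\ y_{2}\\ \vdots \\ y_{n} \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{bmatrix} \begin{bmatrix} \beta_{0}\\ \beta_{1}\\ \vdots \\ \beta_{k} \end{bmatrix} + \begin{bmatrix} u_{1}\\ u_{2}\\ \vdots \\ u_{n} \end{bmatrix}\]
where \(x_{ij}\) denotes observation \(i\) on regressor \(j\)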