9 Normality Assumption
9.1 Introduction
To the mean and covariance assumptions, we can add the normality assumption. This is a strong and powerful assumption that will enable us to obtain the distribution of our data and of several of our estimates.
We will assume:
\[\mathbf{e}\sim N(\mathbf{0}, \sigma^2 \mathbf{I})\]
Also, note that the Normal distribution is completely characterized by its mean and variance, so if the distribution of our estimates is normal, we already have everything needed to characterize it completely, since we computed the mean and variance of the estimates in the previous chapter.
Assuming normality also allows us to obtain the likelihood of our data, and in this way obtain Maximum Likelihood Estimates (MLE) of the parameters we have introduced, \(\boldsymbol{\beta}\) and \(\sigma^2\).
9.2 Maximum Likelihood Estimation
In order to obtain the maximum likelihood estimates of \(\boldsymbol{\beta}\) and \(\sigma^2\), we need the distribution of \(\mathbf{y}\). This is very easy to obtain, since:
\[\mathbf{y}= \mathbf{X}\boldsymbol{\beta}+ \mathbf{e}\] is a linear combination of \(\mathbf{e}\) (in fact, it is just a translation of \(\mathbf{e}\)), and therefore it is normally distributed. Since we have already computed its mean and variance in the previous chapter, we can conclude that:
\[\mathbf{y}\sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I})\] This means that the likelihood of \(\boldsymbol{\beta}\) and \(\sigma^2\) is given by:
\[ \mathcal{L}(\boldsymbol{\beta}, \sigma^2 | \mathbf{y}) = N(\mathbf{y}| \mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}) \] Then this means that:
\[\begin{align*} \mathcal{L}(\boldsymbol{\beta}, \sigma^2 | \mathbf{y}) &= (2 \pi)^{-\frac{n}{2}} |\sigma^2 \mathbf{I}|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2}(\mathbf{y}- \mathbf{X}\boldsymbol{\beta})'(\sigma^2 \mathbf{I})^{-1}(\mathbf{y}- \mathbf{X}\boldsymbol{\beta}) \right\} \\ &= (2 \pi)^{-\frac{n}{2}} (\sigma^2)^{-\frac{n}{2}} |\mathbf{I}|^{-\frac{1}{2}} \exp\left\{ -\frac{1}{2}(\mathbf{y}- \mathbf{X}\boldsymbol{\beta})'\frac{\mathbf{I}}{\sigma^2}(\mathbf{y}- \mathbf{X}\boldsymbol{\beta}) \right\} \\ &= (2 \pi \sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2 \sigma^2}(\mathbf{y}- \mathbf{X}\boldsymbol{\beta})'(\mathbf{y}- \mathbf{X}\boldsymbol{\beta}) \right\} \\ \end{align*}\]
Now, recalling the decomposition:
\[(\mathbf{y}- \mathbb{E}[\mathbf{y}])'(\mathbf{y}- \mathbb{E}[\mathbf{y}]) = \hat{\mathbf{e}}'\hat{\mathbf{e}} + (\hat{\mathbf{y}} - \mathbb{E}[\hat{\mathbf{y}}])'(\hat{\mathbf{y}} - \mathbb{E}[\hat{\mathbf{y}}])\] that is:
\[(\mathbf{y}- \mathbf{X}\boldsymbol{\beta})'(\mathbf{y}- \mathbf{X}\boldsymbol{\beta}) = \hat{\mathbf{e}}'\hat{\mathbf{e}} + (\mathbf{X}\hat{\boldsymbol{\beta}} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{X}\hat{\boldsymbol{\beta}} - \mathbf{X}\boldsymbol{\beta}) = \hat{\mathbf{e}}'\hat{\mathbf{e}} + (\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}( \hat{\boldsymbol{\beta}} - \boldsymbol{\beta})\]
then, the likelihood can be written as:
\[\begin{align*} \mathcal{L}(\boldsymbol{\beta}, \sigma^2 | \mathbf{y}) &= (2 \pi \sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{1}{2 \sigma^2}(\mathbf{y}- \mathbf{X}\boldsymbol{\beta})'(\mathbf{y}- \mathbf{X}\boldsymbol{\beta}) \right\} \\ &= (2 \pi \sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{2 \sigma^2} -\frac{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}( \hat{\boldsymbol{\beta}} - \boldsymbol{\beta})}{2 \sigma^2} \right\} \\ \end{align*}\]
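As a quick numerical sanity check of the decomposition used above, the following sketch (assuming simulated data and the `numpy` library; all variable names are illustrative) evaluates both sides of the sum-of-squares identity at an arbitrary value of \(\boldsymbol{\beta}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design with intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=1.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimate
e_hat = y - X @ beta_hat                       # residuals

beta_any = np.array([0.3, 1.0, 1.0])           # an arbitrary value of beta
lhs = (y - X @ beta_any) @ (y - X @ beta_any)
rhs = e_hat @ e_hat + (beta_hat - beta_any) @ X.T @ X @ (beta_hat - beta_any)
print(np.isclose(lhs, rhs))  # True: both sides of the decomposition agree
```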
This is a useful way to write the likelihood, since it is easy to optimize with respect to \(\boldsymbol{\beta}\): independently of the value of \(\sigma^2\), the value of \(\boldsymbol{\beta}\) that maximizes the likelihood is the OLS estimator \(\hat{\boldsymbol{\beta}}\), since it sets to zero the following term:
\[ -\frac{(\hat{\boldsymbol{\beta}} - \boldsymbol{\beta})'\mathbf{X}'\mathbf{X}( \hat{\boldsymbol{\beta}} - \boldsymbol{\beta})}{2 \sigma^2} \] In this way, we only need to maximize the likelihood with respect to \(\sigma^2\), through the following profile likelihood:
\[ \mathcal{L}(\sigma^2 | \mathbf{y}, \boldsymbol{\beta}= \hat{\boldsymbol{\beta}} ) = (2 \pi \sigma^2)^{-\frac{n}{2}} \exp\left\{ -\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{2 \sigma^2} \right\}\] Instead of maximizing the profile likelihood directly, we will maximize the profile log-likelihood:
\[ \ell(\sigma^2 | \mathbf{y}, \boldsymbol{\beta}= \hat{\boldsymbol{\beta}}) = -\frac{n}{2} \log(2 \pi) -\frac{n}{2} \log(\sigma^2) -\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{2 \sigma^2}\] We can perform this maximization by taking the derivative:
\[\begin{align*} \frac{d \ell}{d \sigma^2} &= -\frac{n}{2 \sigma^2} + \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{2 (\sigma^2)^2} \end{align*}\]
Then
\[\begin{align*} \frac{d \ell}{d \sigma^2} =0 &\implies -\frac{n}{2 \sigma^2} + \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{2 (\sigma^2)^2} = 0 \\ &\implies -n + \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} = 0 \\ &\implies \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} = n \\ &\implies \sigma^2 = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n} \\ \end{align*}\]
So we have, that the Maximum Likelihood Estimate of \(\sigma^2\) is given by:
\[\tilde{\sigma}^2 = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n}\]
Taking the second derivative to confirm it is a maximum, we have that:
\[\begin{align*} \frac{d^2 \ell}{d (\sigma^2)^2} \bigg|_{\sigma^2 = \tilde{\sigma}^2} &= \frac{n}{2 (\tilde{\sigma}^2)^2} - \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{(\tilde{\sigma}^2)^3} \\ &= \frac{n}{2 (\tilde{\sigma}^2)^2} - \frac{n}{(\tilde{\sigma}^2)^2} \\ &= - \frac{n}{2 (\tilde{\sigma}^2)^2} \\ &< 0 \end{align*}\]
So \(\tilde{\sigma}^2\) indeed maximizes the profile likelihood.
Note that, unlike for the \(\boldsymbol{\beta}\) parameter, our earlier estimate \(\hat{\sigma}^2 = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n-p}\) of \(\sigma^2\) is different from the MLE \(\tilde{\sigma}^2\). In particular, \(\tilde{\sigma}^2\) is biased.
This doesn’t mean that one estimate is better than the other; they just have different properties.
We will work more with \(\hat{\sigma}^2\) than with \(\tilde{\sigma}^2\), since it is more useful for building certain statistics.
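As a minimal illustration of the difference between the two estimators, the following sketch (assuming simulated data and `numpy`; names are illustrative) computes \(\tilde{\sigma}^2\) and \(\hat{\sigma}^2\) and checks on a grid that the profile log-likelihood peaks at \(\tilde{\sigma}^2\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e_hat = y - X @ beta_hat
rss = e_hat @ e_hat

sigma2_mle = rss / n          # MLE (biased)
sigma2_hat = rss / (n - p)    # the estimator used in the rest of the chapter

def profile_loglik(s2):
    # profile log-likelihood l(sigma^2 | y, beta = beta_hat)
    return -n / 2 * np.log(2 * np.pi) - n / 2 * np.log(s2) - rss / (2 * s2)

grid = np.linspace(0.2, 3.0, 1000)
print(sigma2_mle, sigma2_hat)
print(grid[np.argmax(profile_loglik(grid))])  # grid maximizer is close to sigma2_mle
```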
9.3 Distribution of Estimates
Just as we computed the mean and the variance of our estimates in the previous chapters, we can now obtain the distribution of the estimates:
9.3.1 Distribution of \(\hat{\boldsymbol{\beta}}\), \(\hat{\mathbf{y}}\) and \(\hat{\mathbf{e}}\)
Most of our estimates are linear combinations of \(\mathbf{y}\), so they are normally distributed, and we can use the means and variances computed in the previous chapter to fully characterize their distributions. This is the case for:
\[ \hat{\boldsymbol{\beta}}, \quad \hat{\mathbf{y}}, \quad \hat{\mathbf{e}} \] Their distributions are:
\[ \hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}) \] \[ \hat{\mathbf{y}} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{H}) \] \[ \hat{\mathbf{e}} \sim N(\mathbf{0}, \sigma^2 (\mathbf{I}-\mathbf{H})) \]
This is not the case for the estimate \(\hat{\sigma}^2\), since it is not a linear transformation of \(\mathbf{y}\).
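These distributional claims can be checked by simulation. The sketch below (a rough Monte Carlo check assuming a fixed simulated design and `numpy`; names are illustrative) redraws the errors many times, refits OLS, and compares the empirical mean and covariance of \(\hat{\boldsymbol{\beta}}\) with \(\boldsymbol{\beta}\) and \(\sigma^2(\mathbf{X}'\mathbf{X})^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 30, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed design, kept across replications
beta = np.array([1.0, -2.0])

# repeatedly redraw the errors and refit, keeping X fixed
betas = []
for _ in range(5000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    betas.append(np.linalg.solve(X.T @ X, X.T @ y))
betas = np.array(betas)

print(betas.mean(axis=0))                  # close to beta
print(np.cov(betas.T))                     # close to sigma^2 (X'X)^{-1}
print(sigma**2 * np.linalg.inv(X.T @ X))
```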
9.3.2 Distribution of \(\hat{\sigma}^2\)
Obtaining the distribution of \(\hat{\sigma}^2\) is not as straightforward. We will proceed in three steps:
- Express \(\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2}\) as a quadratic form of a standard normal vector with an idempotent matrix.
- Show that a quadratic form of a standard normal vector with an idempotent matrix is distributed as a chi-squared.
- Relate the distribution of \(\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2}\) to the distribution of \(\hat{\sigma}^2\).
9.3.2.1 Distribution of \(\hat{\sigma}^2\) Step 1
First, note that:
\[\begin{align*} (\mathbf{I}- \mathbf{H}) \mathbf{y} &= (\mathbf{I}- \mathbf{H}) (\mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) && \text{since $\mathbf{y}= \mathbf{X}\boldsymbol{\beta}+ \mathbf{e}$} \\ &= \mathbf{I}\mathbf{X}\boldsymbol{\beta}- \mathbf{H}\mathbf{X}\boldsymbol{\beta}+ \mathbf{I}\mathbf{e}- \mathbf{H}\mathbf{e}\\ &= \mathbf{X}\boldsymbol{\beta}- \mathbf{X}\boldsymbol{\beta}+ \mathbf{e}- \mathbf{H}\mathbf{e}&& \text{since $\mathbf{H}\mathbf{X}= \mathbf{X}$} \\ &= \mathbf{e}- \mathbf{H}\mathbf{e}\\ &= (\mathbf{I}- \mathbf{H}) \mathbf{e}\\ \end{align*}\]
Then:
\[\begin{align*} \mathbf{y}' (\mathbf{I}- \mathbf{H}) \mathbf{y} &= \mathbf{y}' (\mathbf{I}- \mathbf{H}) (\mathbf{I}- \mathbf{H}) \mathbf{y}&& \text{since $(\mathbf{I}- \mathbf{H})$ is idempotent} \\ &= \mathbf{e}' (\mathbf{I}- \mathbf{H}) (\mathbf{I}- \mathbf{H}) \mathbf{e}&& \text{since $(\mathbf{I}- \mathbf{H}) \mathbf{y}= (\mathbf{I}- \mathbf{H}) \mathbf{e}$} \\ &= \mathbf{e}' (\mathbf{I}- \mathbf{H}) \mathbf{e}&& \text{since $(\mathbf{I}- \mathbf{H})$ is idempotent} \\ \end{align*}\]
Then,
\[\begin{align*} \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} &= \frac{\mathbf{y}' (\mathbf{I}- \mathbf{H}) \mathbf{y}}{\sigma^2} && \text{since $\hat{\mathbf{e}}'\hat{\mathbf{e}} = \mathbf{y}' (\mathbf{I}- \mathbf{H}) \mathbf{y}$} \\ &= \frac{\mathbf{e}' (\mathbf{I}- \mathbf{H}) \mathbf{e}}{\sigma^2} && \text{since $\mathbf{e}' (\mathbf{I}- \mathbf{H}) \mathbf{e}= \mathbf{y}' (\mathbf{I}- \mathbf{H}) \mathbf{y}$} \\ &= \left(\frac{\mathbf{e}}{\sqrt{\sigma^2}}\right)' (\mathbf{I}- \mathbf{H}) \left(\frac{\mathbf{e}}{\sqrt{\sigma^2}}\right) \\ \end{align*}\]
Finally, note that \(\frac{\mathbf{e}}{\sqrt{\sigma^2}}\) is a linear function of \(\mathbf{e}\), which is normal; therefore it is also normal, with mean and variance:
\[ \mathbb{E}\left[\frac{\mathbf{e}}{\sqrt{\sigma^2}}\right] = \frac{1}{\sqrt{\sigma^2}} \mathbb{E}[\mathbf{e}] = \frac{1}{\sqrt{\sigma^2}} \mathbf{0}= \mathbf{0}\] \[ \mathbb{V}\left[\frac{\mathbf{e}}{\sqrt{\sigma^2}}\right] = \left(\frac{1}{\sqrt{\sigma^2}}\right)^2 \mathbb{V}[\mathbf{e}] = \frac{1}{\sigma^2} \sigma^2 \mathbf{I}= \mathbf{I}\] So
\[ \frac{\mathbf{e}}{\sqrt{\sigma^2}} \sim N(\mathbf{0}, \mathbf{I}) \] is a multivariate standard normal. This concludes step 1.
9.3.2.2 Distribution of \(\hat{\sigma}^2\) Step 2
Let \(\mathbf{z}\in \mathbb{R}^{n}\) be a multivariate standard normal vector and \(\mathbf{M}\in \mathbb{R}^{n \times n}\) a symmetric idempotent matrix of rank \(m\). Then, we will show that:
\[ \mathbf{z}' \mathbf{M}\mathbf{z}\sim \chi^2_{m} \]
To show this result, we will use the spectral decomposition of \(\mathbf{M}\), that is:
\[ \mathbf{M}= \mathbf{V}\boldsymbol{\Sigma}\mathbf{V}' \]
with \(\mathbf{V}\) orthonormal and \(\boldsymbol{\Sigma}\) diagonal. Since \(\mathbf{V}= [\mathbf{v}_1,\ldots,\mathbf{v}_n]\) is orthonormal, then we have that:
\[ \mathbf{v}_i'\mathbf{v}_j = 0 \quad \forall i \neq j \quad \text{and} \quad ||\mathbf{v}_i||^2_2 = 1 \quad \forall i\] and, since \(\mathbf{M}\) is idempotent, \(\boldsymbol{\Sigma}\) is diagonal with exactly \(m\) entries equal to \(1\) and the rest equal to \(0\). Without loss of generality, we can assume that the first \(m\) diagonal entries are equal to \(1\) and the remaining entries are equal to \(0\).
Then, first note that \(\mathbf{V}' \mathbf{z}\) is a linear transformation of a normal vector. We will show that \(\mathbf{V}' \mathbf{z}\in \mathbb{R}^{n}\) is also standard normal.
Note that:
\[ \mathbb{E}[\mathbf{V}' \mathbf{z}] = \mathbf{V}' \mathbb{E}[\mathbf{z}] = \mathbf{V}' \mathbf{0}= \mathbf{0}\] \[ \mathbb{V}[\mathbf{V}' \mathbf{z}] = \mathbf{V}' \mathbb{V}[\mathbf{z}] \mathbf{V}= \mathbf{V}' \mathbf{I}\mathbf{V}= \mathbf{V}' \mathbf{V}= \mathbf{I}\]
Then \(\mathbf{V}' \mathbf{z}\) is also standard normal. Let us write \(\mathbf{w}= \mathbf{V}' \mathbf{z}\); then the components \(w_1,\ldots,w_n\) of \(\mathbf{w}\) are independent univariate standard normal random variables.
Then
\[\begin{align*} \mathbf{z}' \mathbf{M}\mathbf{z} &= \mathbf{z}' (\mathbf{V}\boldsymbol{\Sigma}\mathbf{V}') \mathbf{z}&& \text{using the spectral decomposition of $\mathbf{M}$} \\ &= (\mathbf{V}' \mathbf{z})' \boldsymbol{\Sigma}(\mathbf{V}' \mathbf{z}) \\ &= \mathbf{w}' \boldsymbol{\Sigma}\mathbf{w}&& \text{since $\mathbf{w}= \mathbf{V}' \mathbf{z}$} \\ &= \sum_{i=1}^n [\boldsymbol{\Sigma}]_{ii} w_i^2 \\ &= \sum_{i=1}^{m} w_i^2 && \text{since only the first $m$ entries are equal to $1$} \\ &\sim \chi^2_m && \text{by definition of the $\chi^2$ distribution} \\ \end{align*}\]
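A small simulation can make this result concrete. The sketch below (assuming `numpy` and `scipy`; the matrix \(\mathbf{M}\) is built as a projection matrix, which is symmetric and idempotent of rank \(m\), and all names are illustrative) compares simulated values of \(\mathbf{z}'\mathbf{M}\mathbf{z}\) with the \(\chi^2_m\) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, m = 20, 4
A = rng.normal(size=(n, m))
M = A @ np.linalg.solve(A.T @ A, A.T)   # projection matrix: symmetric, idempotent, rank m

# simulate z' M z for many standard normal vectors z
q = np.array([z @ M @ z for z in rng.normal(size=(20000, n))])

print(q.mean(), q.var())                       # approx m and 2m, the chi^2_m moments
print(stats.kstest(q, stats.chi2(df=m).cdf))   # should not reject the chi^2_m fit
```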
9.3.2.3 Distribution of \(\hat{\sigma}^2\) Step 3
Using step 1 and 2, we can conclude that:
\[ \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} \sim \chi^2_{n-p} \] since the rank of the idempotent matrix \((\mathbf{I}- \mathbf{H})\) is \(n-p\). Therefore:
\[ \hat{\sigma}^2 = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n-p} = \frac{\sigma^2}{n-p}\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} \sim \frac{\sigma^2}{n-p}\chi^2_{n-p} \]
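We can verify this distribution by simulation as well. The following sketch (assuming simulated regression data, `numpy` and `scipy`; names are illustrative) repeatedly draws \(\mathbf{y}\), computes \(\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} = (n-p)\frac{\hat{\sigma}^2}{\sigma^2}\), and compares it with the \(\chi^2_{n-p}\) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, p, sigma = 25, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta = np.array([1.0, 0.5, -1.0])
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix
M = np.eye(n) - H                       # residual-maker matrix

stat = []
for _ in range(20000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    e_hat = M @ y
    stat.append(e_hat @ e_hat / sigma**2)   # = (n - p) * sigma_hat^2 / sigma^2
stat = np.array(stat)

print(stat.mean())                               # approx n - p
print(stats.kstest(stat, stats.chi2(df=n - p).cdf))
```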
9.3.3 Independence of \(\hat{\mathbf{e}}\) and \(\hat{\mathbf{y}}\)
We have seen that \(\hat{\mathbf{e}}\) and \(\hat{\mathbf{y}}\) are uncorrelated, that is
\[ \mathbb{C}[\hat{\mathbf{e}}, \hat{\mathbf{y}}] = \mathbf{0}\] however, this doesn’t necessarily mean they are independent. Independence is a much stronger property. For example, if two random variables are independent, then any functions of these random variables will also be independent. However, if two random variables are merely uncorrelated, functions of them are not necessarily uncorrelated.
Now, under our assumption of normality, \(\hat{\mathbf{e}}\) and \(\hat{\mathbf{y}}\) are independent, because for jointly normally distributed random vectors, such as \(\hat{\mathbf{e}}\) and \(\hat{\mathbf{y}}\), zero correlation implies independence.
Then, any random variable that is a function of \(\hat{\mathbf{e}}\) will be independent of any random variable that is a function of \(\hat{\mathbf{y}}\), even if these new random variables are not normal themselves.
9.4 Interval Estimation
So far, we have obtained point estimates of different quantities of interest, \(\boldsymbol{\beta}\), \(\mathbf{e}\) and \(\sigma^2\); however, the fact that we have the distributions of the estimates of these quantities will allow us to obtain interval estimators.
9.4.1 Confidence Intervals for Coefficients
We know that the OLS estimate of the coefficients has the following distribution:
\[ \hat{\boldsymbol{\beta}} \sim N(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}) \]
then, this means that each of the entries of \(\hat{\boldsymbol{\beta}} = (\hat{\beta}_0, \hat{\beta}_1,\ldots,\hat{\beta}_{p-1})\) is normally distributed, as follows:
\[ \hat{\beta}_i \sim N(\beta_i, \sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}) \] we will call
\[ \sigma^2_{\beta_i} = \sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii} \] then, we can re-write the distribution as follows:
\[ \hat{\beta}_i \sim N \left(\beta_i, \sigma^2_{\beta_i} \right) \] unfortunately this distribution depends on \(\sigma^2\), which is unknown, so we can’t use it to build a confidence interval. So we first transform the statistic by removing the mean \(\beta_i\) and dividing by the standard deviation:
\[ t^0_{\beta_i}=\frac{\hat{\beta}_i - \beta_i}{\sqrt{\sigma^2_{\beta_i}}} \] Then, this new quantity is normally distributed, since it is a linear transformation of a normally distributed random variable, and it has mean and variance as follows:
\[ \mathbb{E}[t^0_{\beta_i}] = \mathbb{E}\left[\frac{\hat{\beta}_i - \beta_i}{\sqrt{\sigma^2_{\beta_i}}}\right]= \frac{\mathbb{E}[\hat{\beta}_i] - \beta_i}{\sqrt{\sigma^2_{\beta_i}}} = \frac{\beta_i - \beta_i}{\sqrt{\sigma^2_{\beta_i}}} = 0 \] \[ \mathbb{V}[t^0_{\beta_i}] = \mathbb{V}\left[\frac{\hat{\beta}_i - \beta_i}{\sqrt{\sigma^2_{\beta_i}}}\right]= \left(\frac{1}{\sqrt{\sigma^2_{\beta_i}}}\right)^2 \mathbb{V}[\hat{\beta}_i - \beta_i] = \frac{1}{\sigma^2_{\beta_i}} \mathbb{V}[\hat{\beta}_i] = \frac{1}{\sigma^2_{\beta_i}} \sigma^2_{\beta_i} = 1 \] then:
\[ t^0_{\beta_i} \sim N(0, 1) \] that is, \(t^0_{\beta_i}\) is distributed like a standard normal. Now the distribution doesn’t depend on any unknown parameter (apart from \(\beta_i\), which is the parameter of interest), but the quantity itself depends on \(\sigma^2\) through \(\sigma^2_{\beta_i}\), so we still can’t use it to build a confidence interval.
We consider a new quantity
\[ t_{\beta_i} = \frac{\hat{\beta}_i - \beta_i}{\sqrt{\hat{\sigma}^2_{\beta_i}}} \] where \(\hat{\sigma}^2_{\beta_i} = \hat{\sigma}^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}\), so \(t_{\beta_i}\) doesn’t depend on \(\sigma^2\). Let’s compute the distribution of this quantity; first, let’s rewrite the statistic as follows:
\[ t_{\beta_i} = \frac{\hat{\beta}_i - \beta_i}{\sqrt{\hat{\sigma}^2_{\beta_i}}} = \frac{\sqrt{\frac{1}{\sigma^2}}}{\sqrt{\frac{1}{\sigma^2}}}\frac{\hat{\beta}_i - \beta_i}{\sqrt{\hat{\sigma}^2[(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}} = \frac{\frac{\left(\hat{\beta}_i - \beta_i\right)}{\sqrt{\sigma^2[(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}}{ \sqrt{\frac{\hat{\sigma}^2}{\sigma^2}}} = \frac{\frac{\left(\hat{\beta}_i - \beta_i\right)}{\sqrt{\sigma^2_{\beta_i}}}}{ \sqrt{\frac{(n-p)\frac{\hat{\sigma}^2}{\sigma^2}}{n-p}}} = \frac{t^0_{\beta_i}}{ \sqrt{\frac{\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2}}{n-p}}}\]
Now, we know \(t^0_{\beta_i}\) is standard normal distributed, and from the distribution of \(\hat{\sigma}^2\) we have that:
\[ \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} \sim \chi^2_{n-p} \] and from the independence of \(\hat{\boldsymbol{\beta}}\) and \(\hat{\mathbf{e}}\) we have that any function of one is independent of any function of the other; in particular
\[ t^0_{\beta_i} \quad \text{and} \quad \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} \] are independent. Therefore
\[ t_{\beta_i} \sim t_{n-p} \] Now, let \(t \sim t_m\) be a random variable with a \(t\) distribution with \(m\) degrees of freedom. Then define the upper-tail critical value:
\[ t_m\left(a\right) \quad \text{such that} \quad \mathbb{P}\left(t\geq t_m\left(a\right) \right) = a\] for any \(a\in[0,1]\)
Then, we have that:
\[\begin{align*} \mathbb{P} &\left( -t_{n-p}\left(\frac{\alpha}{2}\right) \leq t_{\beta_i} \leq t_{n-p}\left(\frac{\alpha}{2}\right) \right) = 1 - \alpha && \text{since the $t$ distribution is symmetric} \\ &\implies \mathbb{P}\left( -t_{n-p}\left(\frac{\alpha}{2}\right) \leq \frac{\hat{\beta}_i - \beta_i}{\sqrt{\hat{\sigma}^2_{\beta_i}}} \leq t_{n-p}\left(\frac{\alpha}{2}\right) \right) = 1 - \alpha && \text{since $t_{\beta_i} = \frac{\hat{\beta}_i - \beta_i}{\sqrt{\hat{\sigma}^2_{\beta_i}}}$} \\ &\implies \mathbb{P}\left( -t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \leq \hat{\beta}_i - \beta_i \leq t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \right) = 1 - \alpha \\ &\implies \mathbb{P}\left( -t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \leq \beta_i - \hat{\beta}_i \leq t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \right) = 1 - \alpha \\ &\implies \mathbb{P}\left( \hat{\beta}_i - t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \leq \beta_i \leq \hat{\beta}_i + t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \right) = 1 - \alpha \\ \end{align*}\]
So
\[ \left(\hat{\beta}_i - t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}}, \hat{\beta}_i + t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\beta_i}} \right) \] is a random interval, that is, an interval that is a function of random variables, in this case \(\hat{\beta}_i\) and \(\hat{\sigma}^2_{\beta_i}\). This random interval will capture the true parameter \(\beta_i\) with probability \(1 - \alpha\). However, once data is observed and the interval is fixed (at the observed values), the interval either captures the true parameter or it doesn’t (something we don’t know in general).
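As a minimal sketch of how such an interval can be computed in practice (assuming simulated data, `numpy` and `scipy.stats`; the critical value \(t_{n-p}(\alpha/2)\) is obtained as the \(1-\alpha/2\) quantile, and all names are illustrative), we can build the interval for each coefficient as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (n - p)

alpha = 0.05
tcrit = stats.t(df=n - p).ppf(1 - alpha / 2)   # upper alpha/2 critical value t_{n-p}(alpha/2)

for i in range(p):
    se = np.sqrt(sigma2_hat * XtX_inv[i, i])   # estimated standard deviation of beta_hat_i
    print(i, beta_hat[i] - tcrit * se, beta_hat[i] + tcrit * se)
```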
9.4.2 Confidence intervals for the expected mean of a new observation \(\mathbf{x}_{new}\)
Note that the expected mean of a new observation \(\mathbf{x}_{new}\) is given by:
\[ \mathbb{E}[y_{new}] = \mathbf{x}_{new}' \boldsymbol{\beta}\] then we can consider an estimate of this parameter given by:
\[ \mathbf{x}_{new}' \hat{\boldsymbol{\beta}} \] this estimate is a linear combination of \(\hat{\boldsymbol{\beta}}\); therefore it has a normal distribution with mean and variance as follows:
\[\mathbb{E}[\mathbf{x}_{new}' \hat{\boldsymbol{\beta}}] = \mathbf{x}_{new}' \mathbb{E}[\hat{\boldsymbol{\beta}}] = \mathbf{x}_{new}' \boldsymbol{\beta}\] \[\mathbb{V}[\mathbf{x}_{new}' \hat{\boldsymbol{\beta}}] = \mathbf{x}_{new}' \mathbb{V}[\hat{\boldsymbol{\beta}}] \mathbf{x}_{new} = \mathbf{x}_{new}' \sigma^2 (\mathbf{X}' \mathbf{X})^{-1} \mathbf{x}_{new} = \sigma^2 \mathbf{x}_{new}' (\mathbf{X}' \mathbf{X})^{-1} \mathbf{x}_{new} \] that is: \[ \mathbf{x}_{new}' \hat{\boldsymbol{\beta}} \sim N \left(\mathbf{x}_{new}' \boldsymbol{\beta}, \sigma^2 \mathbf{x}_{new}' (\mathbf{X}' \mathbf{X})^{-1} \mathbf{x}_{new} \right) \]
so, similarly, we can consider
\[ t_{\mathbf{x}_{new}'\boldsymbol{\beta}} = \frac{\mathbf{x}_{new}'\hat{\boldsymbol{\beta}} - \mathbf{x}_{new}'\boldsymbol{\beta}}{\sqrt{\hat{\sigma}^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}}}=\frac{\frac{\mathbf{x}_{new}'\hat{\boldsymbol{\beta}} - \mathbf{x}_{new}'\boldsymbol{\beta}}{\sqrt{\sigma^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}}}}{\sqrt{\frac{\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2}}{n-p}}} \sim t_{n-p} \] where \(\hat{\sigma}^2_{\mathbf{x}_{new}'\boldsymbol{\beta}} = \hat{\sigma}^2 \mathbf{x}_{new}' (\mathbf{X}' \mathbf{X})^{-1} \mathbf{x}_{new}\). This quantity is distributed as a \(t\) with \(n-p\) degrees of freedom since:
\[\frac{\mathbf{x}_{new}'\hat{\boldsymbol{\beta}} - \mathbf{x}_{new}'\boldsymbol{\beta}}{\sqrt{\sigma^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}}} \sim N(0, 1)\] \[ \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} \sim \chi^2_{n-p}\] and these random variables are independent, since one is a function of \(\hat{\boldsymbol{\beta}}\) and the other a function of \(\hat{\mathbf{e}}\).
Then we can conclude that:
\[\begin{align*} \mathbb{P} &\left( -t_{n-p}\left(\frac{\alpha}{2}\right) \leq t_{\mathbf{x}_{new}'\boldsymbol{\beta}} \leq t_{n-p}\left(\frac{\alpha}{2}\right) \right) = 1 - \alpha \\ &\implies \mathbb{P}\left( \mathbf{x}_{new}'\hat{\boldsymbol{\beta}} - t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}} \leq \mathbf{x}_{new}'\boldsymbol{\beta}\leq \mathbf{x}_{new}'\hat{\boldsymbol{\beta}} + t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}} \right) = 1 - \alpha \\ \end{align*}\]
so, the random interval is given by:
\[ \left( \mathbf{x}_{new}'\hat{\boldsymbol{\beta}} - t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}} , \mathbf{x}_{new}'\hat{\boldsymbol{\beta}} + t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\mathbf{x}_{new}'\boldsymbol{\beta}}} \right) \] which is a random interval that captures \(\mathbf{x}_{new}'\boldsymbol{\beta}\) with probability \(1 - \alpha\).
9.4.3 Confidence intervals for linear combinations of \(\boldsymbol{\beta}\)
Note that if we consider the parameter
\[ \mathbf{a}' \boldsymbol{\beta}\] then we note that \(\beta_i\) and \(\mathbf{x}_{new}' \boldsymbol{\beta}\) are particular cases, where the value of \(\mathbf{a}\) is as follows:
\[ \mathbf{a}= (0,\ldots,0,1,0,\ldots,0)' \quad \text{(with the $1$ in the $i$-th position)} \quad \text{for} \quad \mathbf{a}'\boldsymbol{\beta}= \beta_i\] \[ \mathbf{a}= \mathbf{x}_{new} \quad \text{for} \quad \mathbf{a}'\boldsymbol{\beta}= \mathbf{x}_{new}' \boldsymbol{\beta}\] then, performing similar operations as before, we can create random intervals to estimate \(\mathbf{a}' \boldsymbol{\beta}\) as follows:
\[ \left( \mathbf{a}'\hat{\boldsymbol{\beta}} - t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\mathbf{a}'\boldsymbol{\beta}}} , \mathbf{a}'\hat{\boldsymbol{\beta}} + t_{n-p}\left(\frac{\alpha}{2}\right)\sqrt{\hat{\sigma}^2_{\mathbf{a}'\boldsymbol{\beta}}} \right) \] which captures \(\mathbf{a}' \boldsymbol{\beta}\) with probability \(1 - \alpha\), where \(\hat{\sigma}^2_{\mathbf{a}'\boldsymbol{\beta}} = \hat{\sigma}^2 \mathbf{a}' (\mathbf{X}' \mathbf{X})^{-1} \mathbf{a}\).
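The following sketch wraps this general case in a small helper function (an illustrative helper assuming `numpy` and `scipy.stats`, not a standard API); choosing \(\mathbf{a}\) as an indicator vector or as \(\mathbf{x}_{new}\) recovers the two previous intervals:

```python
import numpy as np
from scipy import stats

def lin_comb_ci(X, y, a, alpha=0.05):
    """Confidence interval for a' beta with coverage 1 - alpha (illustrative helper)."""
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    e_hat = y - X @ beta_hat
    sigma2_hat = e_hat @ e_hat / (n - p)
    se = np.sqrt(sigma2_hat * a @ XtX_inv @ a)     # sqrt of sigma_hat^2 a'(X'X)^{-1}a
    tcrit = stats.t(df=n - p).ppf(1 - alpha / 2)
    centre = a @ beta_hat
    return centre - tcrit * se, centre + tcrit * se

rng = np.random.default_rng(6)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

print(lin_comb_ci(X, y, np.array([0.0, 1.0, 0.0])))   # CI for beta_1
print(lin_comb_ci(X, y, np.array([1.0, 0.3, -0.2])))  # CI for the mean at x_new = (1, 0.3, -0.2)
```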
9.5 Hypothesis Testing
We will approach hypothesis testing using an implausibility framework. This involves formulating a null hypothesis, \(H_0\), and assuming it to be true. Next, we calculate a test statistic that follows a specific distribution under the null hypothesis. By comparing the observed value of the statistic to this distribution, we assess how plausible it is to observe such a value if \(H_0\) is true.
9.5.1 Testing for the Overall Regression
For this hypothesis, we will use the notation:
\[ \mathbf{X}^* = [\mathbf{1}\mathbf{X}] \quad \text{and} \quad \boldsymbol{\beta}^* = [\beta_0, \boldsymbol{\beta}]' \in \mathbb{R}^{p}\] that is, the \(*\) indicates the full design matrix, including the intercept, with \(\mathbf{X}\) of full rank. In this section, \(\mathbf{H}\) denotes the hat matrix of \(\mathbf{X}^*\) and \(\mathbf{H}_0 = \mathbf{1}(\mathbf{1}'\mathbf{1})^{-1}\mathbf{1}'\) the projection matrix onto the intercept column.
Our first test is to see if the Linear Regression framework is useful at all. That is, we want to test \(\mathcal{H}_0: \boldsymbol{\beta}= \mathbf{0}\). Before designing our test statistic, we will show the following auxiliary results:
- \(SS_{reg} = \mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}\).
- \(\mathbf{H}\mathbf{H}_0 = \mathbf{H}_0 \mathbf{H}= \mathbf{H}_0\).
- \((\mathbf{H}- \mathbf{H}_0)\) is idempotent.
- \(\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}\) and \(\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}\) are independent.
- Under the null hypothesis \(\mathcal{H}_0: \boldsymbol{\beta}= \mathbf{0}\), \(\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{\sigma^2}\) is distributed like a \(\chi^2_{p-1}\).
For auxiliary result 1, we have that:
\[\begin{align*} SS_{tot} &= SS_{reg} + SS_{res} \\ &\implies \mathbf{y}'(\mathbf{I}- \mathbf{H}_0)\mathbf{y}= SS_{reg} + \mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}&& \text{since $SS_{tot} = \mathbf{y}'(\mathbf{I}- \mathbf{H}_0)\mathbf{y}$ and $SS_{res} = \mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}$.} \\ &\implies SS_{reg} = \mathbf{y}'(\mathbf{I}- \mathbf{H}_0)\mathbf{y}- \mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}&& \\ &\implies SS_{reg} = \mathbf{y}'(\mathbf{I}- \mathbf{H}_0 - \mathbf{I}+ \mathbf{H})\mathbf{y}&& \\ &\implies SS_{reg} = \mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}&& \\ \end{align*}\]
For auxiliary result 2, we have that:
First, we have that:
\[\begin{align*} \mathbf{H}\mathbf{H}_0 = \mathbf{H}\mathbf{1}(\mathbf{1}' \mathbf{1})^{-1} \mathbf{1}' && \\ \mathbf{H}\mathbf{H}_0 = \mathbf{1}(\mathbf{1}' \mathbf{1})^{-1} \mathbf{1}' && \text{since $\mathbf{H}\mathbf{1}= \mathbf{1}$.} \\ \mathbf{H}\mathbf{H}_0 = \mathbf{H}_0 && \\ \end{align*}\]
Then, since both \(\mathbf{H}\) and \(\mathbf{H}_0\) are symmetric, \(\mathbf{H}_0 \mathbf{H}= (\mathbf{H}\mathbf{H}_0)' = \mathbf{H}_0' = \mathbf{H}_0\).
For auxiliary result 3, we have that:
\[\begin{align*} (\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0) &= \mathbf{H}\mathbf{H}- \mathbf{H}\mathbf{H}_0 - \mathbf{H}_0 \mathbf{H}+ \mathbf{H}_0 \mathbf{H}_0 && \\ &= \mathbf{H}- \mathbf{H}\mathbf{H}_0 - \mathbf{H}_0 \mathbf{H}+ \mathbf{H}_0 && \text{since $\mathbf{H}_0$ and $\mathbf{H}$ are idempotent.} \\ &= \mathbf{H}- \mathbf{H}_0 - \mathbf{H}_0 + \mathbf{H}_0 && \text{since $\mathbf{H}\mathbf{H}_0 = \mathbf{H}_0 \mathbf{H}= \mathbf{H}_0$.} \\ &= \mathbf{H}- \mathbf{H}_0 && \\ \end{align*}\]
so, \((\mathbf{H}- \mathbf{H}_0)\) is idempotent.
For auxiliary result 4, first we have that:
\[\begin{align*} \mathbb{C}[(\mathbf{H}- \mathbf{H}_0) \mathbf{y}, (\mathbf{I}- \mathbf{H}) \mathbf{y}] &= (\mathbf{H}- \mathbf{H}_0) \mathbb{C}[\mathbf{y},\mathbf{y}] (\mathbf{I}- \mathbf{H}) \\ &= (\mathbf{H}- \mathbf{H}_0) \mathbb{V}[\mathbf{y}] (\mathbf{I}- \mathbf{H}) \\ &= \sigma^2 (\mathbf{H}- \mathbf{H}_0) (\mathbf{I}- \mathbf{H}) \\ &= \sigma^2 (\mathbf{H}- \mathbf{H}_0 - \mathbf{H}\mathbf{H}+ \mathbf{H}_0 \mathbf{H}) \\ &= \sigma^2 (\mathbf{H}- \mathbf{H}_0 - \mathbf{H}+ \mathbf{H}_0 \mathbf{H}) && \text{since $\mathbf{H}$ is idempotent.} \\ &= \sigma^2 (\mathbf{H}- \mathbf{H}_0 - \mathbf{H}+ \mathbf{H}_0) && \text{since $\mathbf{H}_0 \mathbf{H}= \mathbf{H}_0$.} \\ &= \sigma^2 \mathbf{0}&& \\ &= \mathbf{0}&& \\ \end{align*}\]
This tells us that \((\mathbf{H}- \mathbf{H}_0)\mathbf{y}\) and \((\mathbf{I}- \mathbf{H})\mathbf{y}\) are uncorrelated. Now, since \((\mathbf{H}- \mathbf{H}_0)\mathbf{y}\) and \((\mathbf{I}- \mathbf{H})\mathbf{y}\) are normally distributed, zero correlation implies independence. Then, any functions of these two quantities are independent. Note that:
\[\begin{align*} \mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y} &= \mathbf{y}'(\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0)\mathbf{y}&& \text{since $(\mathbf{H}- \mathbf{H}_0)$ is idempotent.} \end{align*}\]
Then, \(\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}\) is a quadratic function of \((\mathbf{H}- \mathbf{H}_0)\mathbf{y}\). Similarly, \(\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}\) is a quadratic function of \((\mathbf{I}- \mathbf{H})\mathbf{y}\). Therefore, \(\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}\) and \(\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}\) are independent.
For auxiliary result 5, we have that:
\[\begin{align*} (\mathbf{H}- \mathbf{H}_0)\mathbf{y} &= (\mathbf{H}- \mathbf{H}_0)(\mathbf{X}^* \boldsymbol{\beta}^* + \mathbf{e}) && \text{since $\mathbf{y}= \mathbf{X}^* \boldsymbol{\beta}^* + \mathbf{e}$} \\ &= (\mathbf{H}- \mathbf{H}_0)([\mathbf{1}\mathbf{X}] [\beta_0, \boldsymbol{\beta}]' + \mathbf{e}) && \text{since $\mathbf{X}^* = [\mathbf{1}\mathbf{X}] \quad \text{and} \quad \boldsymbol{\beta}^* = [\beta_0, \boldsymbol{\beta}]'$.} \\ &= (\mathbf{H}- \mathbf{H}_0)(\mathbf{1}\beta_0 + \mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) && \\ &= (\mathbf{H}- \mathbf{H}_0)(\mathbf{1}\beta_0) + (\mathbf{H}- \mathbf{H}_0)(\mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) && \\ &= (\mathbf{H}\mathbf{1}- \mathbf{H}_0 \mathbf{1})\beta_0 + (\mathbf{H}- \mathbf{H}_0)(\mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) && \\ &= (\mathbf{1}- \mathbf{1})\beta_0 + (\mathbf{H}- \mathbf{H}_0)(\mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) && \text{since $\mathbf{H}\mathbf{1}= \mathbf{1}$ and $\mathbf{H}_0 \mathbf{1}= \mathbf{1}$.} \\ &= (\mathbf{H}- \mathbf{H}_0)(\mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) && \\ &= (\mathbf{H}- \mathbf{H}_0)\mathbf{e}&& \text{iff $\mathcal{H}_0: \boldsymbol{\beta}= \mathbf{0}$ for any full rank $\mathbf{X}$.} \end{align*}\]
That is, for any full rank \(\mathbf{X}\), we have that:
\[ (\mathbf{H}- \mathbf{H}_0)\mathbf{y}= (\mathbf{H}- \mathbf{H}_0)\mathbf{e}\iff \mathcal{H}_0: \boldsymbol{\beta}= \mathbf{0}\]
Then:
\[\begin{align*} \mathbf{e}\sim N(0, \sigma^2 \mathbf{I}) &\implies \mathbf{e}'(\mathbf{H}- \mathbf{H}_0)\mathbf{e}\sim \sigma^2 \chi^2_{p-1} && \text{since $(\mathbf{H}- \mathbf{H}_0)$ is idempotent of rank $p-1$}. \\ &\implies \frac{\mathbf{e}'(\mathbf{H}- \mathbf{H}_0)\mathbf{e}}{\sigma^2} \sim \chi^2_{p-1} && \\ &\implies \frac{\mathbf{e}'(\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0)\mathbf{e}}{\sigma^2} \sim \chi^2_{p-1} && \text{since $(\mathbf{H}- \mathbf{H}_0)$ is idempotent}. \\ &\implies \frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0)(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{\sigma^2} \sim \chi^2_{p-1} && \text{iff the null hypothesis holds}. \\ &\implies \frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{\sigma^2} \sim \chi^2_{p-1} && \text{since $(\mathbf{H}- \mathbf{H}_0)$ is idempotent}. \\ \end{align*}\]
That is:
\[ \frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{\sigma^2} \sim \chi^2_{p-1} \iff \mathcal{H}_0: \boldsymbol{\beta}= \mathbf{0}\]
With these results, we propose the following statistic:
\[ F_{\boldsymbol{\beta}= 0} = \frac{\frac{SS_{reg}}{p-1}}{\frac{SS_{res}}{n-p}} \]
and we will show that this statistic is distributed like an \(F_{p-1,n-p}\) only under the null hypothesis.
\[\begin{align*} F_{\boldsymbol{\beta}= 0} &= \frac{\frac{SS_{reg}}{p-1}}{\frac{SS_{res}}{n-p}} && \\ &= \frac{\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{p-1}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{n-p}} && \text{since $SS_{reg} = \mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}$ and $SS_{res}=\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}$} \\ &= \frac{\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{\sigma^2}\frac{1}{p-1}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{\sigma^2}\frac{1}{n-p}} && \\ &\sim \frac{\frac{\chi^2_{p-1}}{p-1}}{\frac{\chi^2_{n-p}}{n-p}} && \text{since $\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}}{\sigma^2} \sim \chi^2_{p-1}$ under the null hypothesis.} \\ &\sim F_{p-1,n-p} && \text{since $\mathbf{y}'(\mathbf{H}- \mathbf{H}_0)\mathbf{y}$ and $\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}$ are independent.} \end{align*}\]
So, once we observe the value of this statistic, we can contrast it with respect to this distribution. Call \(F^*_{\boldsymbol{\beta}= 0}\) the observed value, and consider a random variable \(F \sim F_{p-1,n-p}\); then we can compute the probability of observing the value of the statistic (or a more extreme value):
\[ \mathbb{P}(F \geq F^*_{\boldsymbol{\beta}= 0}) \] depending on how small or large this probability is, we can reject or not reject the null hypothesis. This value is called a p-value.
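A minimal sketch of this overall \(F\) test (assuming simulated data, `numpy` and `scipy.stats`; \(\mathbf{H}_0\) is built as the projection onto the intercept column, and all names are illustrative) could look as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p = 60, 4                      # p counts the intercept plus p - 1 predictors
Xs = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])   # X* = [1 X]
y = Xs @ np.array([1.0, 0.8, 0.0, -0.4]) + rng.normal(size=n)

H = Xs @ np.linalg.solve(Xs.T @ Xs, Xs.T)            # hat matrix of the full model
ones = np.ones((n, 1))
H0 = ones @ ones.T / n                               # projection onto the intercept column

ss_reg = y @ (H - H0) @ y
ss_res = y @ (np.eye(n) - H) @ y
F_obs = (ss_reg / (p - 1)) / (ss_res / (n - p))
p_value = stats.f(dfn=p - 1, dfd=n - p).sf(F_obs)    # P(F >= F_obs)
print(F_obs, p_value)
```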
9.5.2 Testing if one variable is not relevant
We can test if a particular variable is not relevant for the regression. That is, \(\mathcal{H}_0: \beta_i = 0\). We will use the same strategy, that is, we will build a test statistic that has a certain distribution only under the null hypothesis.
For this hypothesis we propose the following test statistic:
\[ t_{\beta_i = 0} = \frac{\hat{\beta}_i}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}} \] First note that:
\[\begin{align*} \hat{\boldsymbol{\beta}} \sim N \left(\boldsymbol{\beta}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1}\right) &\implies \hat{\beta}_i \sim N(\beta_i, \sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}) \\ &\implies \frac{\hat{\beta}_i}{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}} \sim N \left(\frac{\beta_i}{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}, 1 \right) \\ &\implies \frac{\hat{\beta}_i}{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}} \sim N \left( 0, 1 \right) && \iff \mathcal{H}_0: \beta_i = 0 \\ \end{align*}\]
Then we have:
\[\begin{align*} t_{\beta_i = 0} &= \frac{\hat{\beta}_i}{\sqrt{\hat{\sigma}^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}} \\ &= \frac{\frac{\hat{\beta}_i }{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}}{\frac{\sqrt{\hat{\sigma}^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}} \\ &= \frac{\frac{\hat{\beta}_i }{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}}{\sqrt{\frac{\hat{\sigma}^2}{\sigma^2}}} \\ &= \frac{\frac{\hat{\beta}_i }{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}}{\sqrt{\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2}\frac{1}{n-p}}} && \text{since $\hat{\sigma}^2 = \frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{n-p}$} \\ &\sim \frac{N \left(\frac{\beta_i}{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}, 1 \right)}{\sqrt{\frac{\chi^2_{n-p}}{n-p}}} && \text{since $\frac{\hat{\beta}_i}{\sqrt{\sigma^2 [(\mathbf{X}'\mathbf{X})^{-1}]_{ii}}}$ is normal with variance $1$ and $\frac{\hat{\mathbf{e}}'\hat{\mathbf{e}}}{\sigma^2} \sim \chi^2_{n-p}$} \\ &\sim \frac{N \left(0, 1 \right)}{\sqrt{\frac{\chi^2_{n-p}}{n-p}}} && \iff \mathcal{H}_0: \beta_i = 0 \\ &\sim t_{n-p} && \text{since $\hat{\beta}_i$ and $\hat{\sigma}^2$ are independent}. \\ \end{align*}\]
Then, under the null hypothesis we have that:
\[ t_{\beta_i = 0} \sim t_{n-p}\] So, if we call \(t_{\beta_i = 0}^*\) the observed value of \(t_{\beta_i = 0}\), and if we let \(t\) be distributed as \(t_{n-p}\), we can compute the two-sided p-value:
\[ \mathbb{P}(|t| \geq |t_{\beta_i = 0}^*|) \] and depending on its value, we can reject or not reject the null hypothesis.
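A minimal sketch of this \(t\) test (assuming simulated data, `numpy` and `scipy.stats`; the p-value is computed two-sided, and all names are illustrative) could look as follows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.0, -0.7]) + rng.normal(size=n)   # beta_1 is truly 0 here

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e_hat = y - X @ beta_hat
sigma2_hat = e_hat @ e_hat / (n - p)

i = 1                                              # test H0: beta_1 = 0
t_obs = beta_hat[i] / np.sqrt(sigma2_hat * XtX_inv[i, i])
p_value = 2 * stats.t(df=n - p).sf(abs(t_obs))     # two-sided p-value
print(t_obs, p_value)
```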
9.5.3 Testing if a Subgroup of the Variables is Relevant
For this test, we can assume, without loss of generality, that the variables we want to test are the first \(k\). So we can divide the design matrix as:
\[ \mathbf{X}= [\mathbf{X}_1 \mathbf{X}_2] \] where the \(k\) variables to test are in \(\mathbf{X}_1\) and the rest of the variables are in \(\mathbf{X}_2\) (possibly including the intercept). Similarly, we partition \(\boldsymbol{\beta}= [\boldsymbol{\beta}_1' \, \boldsymbol{\beta}_2']'\).
This test is similar to the first test once we express it accordingly. We will consider two linear regressions: one including all the variables and one including only the variables in \(\mathbf{X}_2\) (the reduced model, indexed by \(2\)). With this we can build the following test statistic:
\[ F_{\boldsymbol{\beta}_1=\mathbf{0}} = \frac{\frac{SS_{res,2} - SS_{res}}{k}}{\frac{SS_{res}}{n-p}} \] Then note the following:
\[\begin{align*} SS_{res,2} - SS_{res} &= \mathbf{y}'(\mathbf{I}- \mathbf{H}_2)\mathbf{y}- \mathbf{y}'(\mathbf{I}- \mathbf{H}) \mathbf{y}\\ &= \mathbf{y}'(\mathbf{I}- \mathbf{H}_2 - \mathbf{I}+ \mathbf{H}) \mathbf{y}\\ &= \mathbf{y}'(\mathbf{H}- \mathbf{H}_2) \mathbf{y}\\ \end{align*}\]
Again, we will see that \((\mathbf{H}- \mathbf{H}_2)\) is idempotent and \((\mathbf{H}- \mathbf{H}_2)\mathbf{y}= (\mathbf{H}- \mathbf{H}_2)\mathbf{e}\) only under the null hypothesis.
Let us first see that \((\mathbf{H}- \mathbf{H}_2)\) is idempotent. Note that:
\[ \mathbf{H}\mathbf{H}_2 = \mathbf{H}_2 \mathbf{H}= \mathbf{H}_2\] since \(\mathbf{H}_2\) is the projection matrix onto the column space of \(\mathbf{X}_2\), which is a subspace of the column space of \(\mathbf{X}\). Then:
\[\begin{align*} (\mathbf{H}- \mathbf{H}_2)(\mathbf{H}- \mathbf{H}_2) &= \mathbf{H}\mathbf{H}- \mathbf{H}_2 \mathbf{H}- \mathbf{H}\mathbf{H}_2 + \mathbf{H}_2 \mathbf{H}_2 \\ &= \mathbf{H}- \mathbf{H}_2 \mathbf{H}- \mathbf{H}\mathbf{H}_2 + \mathbf{H}_2 && \text{since $\mathbf{H}_2$ and $\mathbf{H}$ are idempotent}. \\ &= \mathbf{H}- \mathbf{H}_2 - \mathbf{H}_2 + \mathbf{H}_2 && \text{since $\mathbf{H}\mathbf{H}_2 = \mathbf{H}_2 \mathbf{H}= \mathbf{H}_2$}. \\ &= \mathbf{H}- \mathbf{H}_2 && \\ \end{align*}\]
then \((\mathbf{H}- \mathbf{H}_2)\) is idempotent.
Now let us see that \((\mathbf{H}- \mathbf{H}_2)\mathbf{y}= (\mathbf{H}- \mathbf{H}_2)\mathbf{e}\) under the null hypothesis. First, note that:
\[ \mathbf{H}\mathbf{X}_2 = \mathbf{X}_2 \] since the space generated by \(\mathbf{X}_2\) is a subspace of the space generated by \(\mathbf{X}\), because \(\mathbf{X}\) contains the columns of \(\mathbf{X}_2\). And we also note that:
\[ \mathbf{H}_2 \mathbf{X}_2 = \mathbf{X}_2 \] since \(\mathbf{H}_2\) is the projection matrix onto the space generated by the columns of \(\mathbf{X}_2\). We note that these results can also be proven algebraically.
Then:
\[\begin{align*} (\mathbf{H}- \mathbf{H}_2)\mathbf{y} &= (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}\boldsymbol{\beta}+ \mathbf{e}) \\ &= (\mathbf{H}- \mathbf{H}_2)([\mathbf{X}_1 \mathbf{X}_2] [\boldsymbol{\beta}_1' \boldsymbol{\beta}_2']' + \mathbf{e}) \\ &= (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{X}_2 \boldsymbol{\beta}_2 + \mathbf{e}) \\ &= (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}_2 \boldsymbol{\beta}_2) + (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{e}) \\ &= (\mathbf{H}\mathbf{X}_2 - \mathbf{H}_2\mathbf{X}_2)\boldsymbol{\beta}_2 + (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{e}) \\ &= (\mathbf{X}_2 - \mathbf{X}_2)\boldsymbol{\beta}_2 + (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{e}) && \text{since $\mathbf{H}\mathbf{X}_2 = \mathbf{X}_2$ and $\mathbf{H}_2 \mathbf{X}_2 = \mathbf{X}_2$} \\ &= (\mathbf{H}- \mathbf{H}_2)(\mathbf{X}_1 \boldsymbol{\beta}_1 + \mathbf{e}) && \\ &= (\mathbf{H}- \mathbf{H}_2)\mathbf{e}&& \iff \mathcal{H}_0: \boldsymbol{\beta}_1 = \mathbf{0}\\ \end{align*}\]
So, since \(\mathbf{X}\) is of full rank, we have that:
\[ (\mathbf{H}- \mathbf{H}_2)\mathbf{y}= (\mathbf{H}- \mathbf{H}_2)\mathbf{e}\iff \mathcal{H}_0: \boldsymbol{\beta}_1 = \mathbf{0}\]
Then we can proceed to see what is the distribution of our test statistic under the null hypothesis.
\[\begin{align*} F_{\boldsymbol{\beta}_1=\mathbf{0}} &= \frac{\frac{SS_{res,2} - SS_{res}}{k}}{\frac{SS_{res}}{n-p}} && \\ &= \frac{\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_2)\mathbf{y}}{k}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{n-p}} && \text{since $SS_{res,2} - SS_{res} = \mathbf{y}'(\mathbf{H}- \mathbf{H}_2)\mathbf{y}$ and $SS_{res}=\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}$} \\ &= \frac{\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_2)\mathbf{y}}{\sigma^2}\frac{1}{k}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{\sigma^2}\frac{1}{n-p}} && \\ &= \frac{\frac{\mathbf{y}'(\mathbf{H}- \mathbf{H}_2)(\mathbf{H}- \mathbf{H}_2)(\mathbf{H}- \mathbf{H}_2)\mathbf{y}}{\sigma^2}\frac{1}{k}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{\sigma^2}\frac{1}{n-p}} && \text{since $(\mathbf{H}- \mathbf{H}_2)$ is idempotent}. \\ &= \frac{\frac{\mathbf{e}'(\mathbf{H}- \mathbf{H}_2)(\mathbf{H}- \mathbf{H}_2)(\mathbf{H}- \mathbf{H}_2)\mathbf{e}}{\sigma^2}\frac{1}{k}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{\sigma^2}\frac{1}{n-p}} && \iff \mathcal{H}_0: \boldsymbol{\beta}_1 = \mathbf{0}\\ &= \frac{\frac{\mathbf{e}'(\mathbf{H}- \mathbf{H}_2)\mathbf{e}}{\sigma^2}\frac{1}{k}}{\frac{\mathbf{y}'(\mathbf{I}- \mathbf{H})\mathbf{y}}{\sigma^2}\frac{1}{n-p}} && \text{since $(\mathbf{H}- \mathbf{H}_2)$ is idempotent}. \\ &\sim \frac{\frac{\chi^2_{k}}{k}}{\frac{\chi^2_{n-p}}{n-p}} && \text{since $(\mathbf{H}- \mathbf{H}_2)$ is idempotent of rank $k$ and $\frac{\mathbf{e}}{\sqrt{\sigma^2}} \sim N(0, \mathbf{I})$}. \\ &\sim F_{k,n-p} && \text{since the numerator and denominator are independent}. \end{align*}\]
So, under the null hypothesis, \(F_{\boldsymbol{\beta}_1=\mathbf{0}}\) has an \(F_{k,n-p}\) distribution. Then, once we observe the data, call \(F_{\boldsymbol{\beta}_1=\mathbf{0}}^*\) the observed value of the statistic, and let \(F\) be distributed as an \(F_{k,n-p}\); we can compute:
\[ \mathbb{P}(F \geq F_{\boldsymbol{\beta}_1=\mathbf{0}}^*) \] and reject the null hypothesis if this probability is small, and not reject it if this probability is large.
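A minimal sketch of this partial \(F\) test (assuming simulated data, `numpy` and `scipy.stats`; the helper `rss` and all other names are illustrative) fits the full and reduced models and compares the statistic with the \(F_{k,n-p}\) distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, p, k = 80, 5, 2
X1 = rng.normal(size=(n, k))                                          # variables under test
X2 = np.column_stack([np.ones(n), rng.normal(size=(n, p - k - 1))])   # rest, incl. intercept
X = np.column_stack([X1, X2])
y = X2 @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)              # generated with beta_1 = 0

def rss(design, y):
    """Residual sum of squares of the OLS fit on a given design matrix."""
    H = design @ np.linalg.solve(design.T @ design, design.T)
    r = y - H @ y
    return r @ r

ss_res_full = rss(X, y)      # SS_res   (full model)
ss_res_red = rss(X2, y)      # SS_res,2 (reduced model, X1 dropped)

F_obs = ((ss_res_red - ss_res_full) / k) / (ss_res_full / (n - p))
p_value = stats.f(dfn=k, dfd=n - p).sf(F_obs)
print(F_obs, p_value)        # large p-value expected, since beta_1 = 0 in the simulation
```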