Multiple Linear Regression (MLR) Forecast Error

Occasionally, we receive requests for a technical paper about regression modeling beyond our regular NumXL support, in order to delve more deeply into the mathematical formulation of MLR. We are always happy to address user requests, so we decided to share our internal technical notes with you.

In this paper, we’ll go over a simple yet fundamental and frequently asked question about forecast error in a regression model.

Background

Let’s assume the true underlying model or process is defined as follows:

$$y = \alpha + \beta_1 x_1+\beta_2 x_2 + \cdots + \beta_k x_k + \epsilon$$

Where:

• $y$ is the dependent (response) variable.
• $\{x_1,x_2,\cdots,x_k \}$ are the independent (explanatory) variables.
• $\alpha$ is the real intercept (constant).
• $\beta_j$ is the coefficient (loading) of the j-th independent variable.
• $\{\epsilon\}$ is a set of independent, identically and normally distributed errors (residuals).

$$\epsilon \overset{\textrm{i.i.d.}}{\sim} N(0,\sigma^2)$$
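To make the true model concrete, the following is a minimal sketch that simulates data from it. The function name `simulate_mlr`, the parameter values, and the choice of standard-normal regressors are all illustrative assumptions; the definition above does not prescribe how the $x_j$ are distributed.

```python
import numpy as np

def simulate_mlr(n, alpha, beta, sigma, seed=0):
    """Draw n observations from y = alpha + x @ beta + eps, eps ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    x = rng.normal(size=(n, beta.size))      # regressors: arbitrary illustrative choice
    eps = rng.normal(scale=sigma, size=n)    # i.i.d. normal errors with variance sigma^2
    y = alpha + x @ beta + eps
    return x, y, eps

# Hypothetical parameter values, k = 2 explanatory variables
x, y, eps = simulate_mlr(n=200, alpha=1.0, beta=[2.0, -0.5], sigma=0.3)
print(x.shape, y.shape)
```

In a real application only `x` and `y` are observed; `eps`, `alpha`, `beta` and `sigma` remain hidden.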

In practice, the true underlying model is unknown. However, with finite sample data and an OLS or other procedure, we can estimate the values of the coefficients (aka loadings) for the different input (explanatory) variables.

Let’s assume we have a sample dataset with $N$ observations of the response and the explanatory variables, i.e. $\{y_i, x_{1,i},x_{2,i},\cdots,x_{k,i}\}$ for $i=1,\cdots,N$. Using an OLS method, we arrive at the following regression model:

$$y = \hat{\alpha} + \hat{\beta_1} x_1 + \hat{\beta_2}x_2 + \cdots + \hat{\beta_k}x_k + u$$

Where:

• $\hat{\beta_j}$ is the OLS estimate for the j-th coefficient (loading).
• $\hat{\alpha}$ is the OLS estimate of the intercept.
• $\{u\}$ are the regression residuals. The residuals are homoscedastic (i.e. stable variance) and uncorrelated with any of the input variables.

$$E[u]=0$$ $$E[u^2] = \hat{\sigma}^2$$ $$E[u\times x_j] = 0, \quad 1\leq j \leq k$$
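The residual properties listed above can be checked numerically. The sketch below fits an OLS model with NumPy's `np.linalg.lstsq` on synthetic data (the data, seed, and coefficient values are assumptions made for illustration) and confirms that the sample residuals average to zero and are orthogonal to every regressor, which holds for any OLS fit that includes an intercept.

```python
import numpy as np

rng = np.random.default_rng(1)
N, k = 50, 3
X = rng.normal(size=(N, k))                      # synthetic regressors (assumption)
y = 0.5 + X @ np.array([1.0, -2.0, 0.7]) + rng.normal(scale=0.4, size=N)

A = np.column_stack([np.ones(N), X])             # column of ones carries the intercept
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares solution of A @ coef ~ y
alpha_hat, beta_hat = coef[0], coef[1:]
u = y - A @ coef                                 # regression residuals

print("mean of residuals:", u.mean())            # zero up to rounding error
print("residuals x regressors:", X.T @ u)        # zero up to rounding error
```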

Forecast

Because the true regression model is hidden or unknown in practice, we rely on the estimated regression model to perform a forecast.

Mathematically, the conditional forecast can be expressed as follows:

$$\hat{y} = E[ Y | x_1,x_2,\cdots, x_k ] = \hat{\alpha} + \hat{\beta_1}x_1 + \hat{\beta_2}x_2 +\cdots + \hat{\beta_k}x_k$$

As a result, the errors in the forecast originate from two distinct sources:

1. Residuals ($\{\epsilon\}$ or $\{u\}$).
2. Errors in the estimated coefficients’ values (i.e. using $\hat{\beta_j}$ instead of $\beta_j$)

Using an OLS procedure, each estimated coefficient $\hat{\beta_j}$ is normally distributed. However, the estimation errors across the whole set of parameters $\{\hat{\beta_j}\}_{1\leq j \leq k}$ are correlated. We can ignore the covariance terms when we examine the statistical significance of a single coefficient, but we need to factor in their aggregate effect on the forecast error.
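The role of the covariance terms can be illustrated with the standard OLS estimate of the coefficient covariance matrix, $\hat{\sigma}^2 (A^TA)^{-1}$, where $A$ is the design matrix with a leading column of ones. This is a minimal sketch on synthetic data; the two regressors are deliberately made correlated, and the target point `a0` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100
x1 = rng.normal(size=N)
x2 = x1 + 0.5 * rng.normal(size=N)               # deliberately correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(scale=0.5, size=N)

A = np.column_stack([np.ones(N), x1, x2])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
u = y - A @ coef
s2 = u @ u / (N - 2 - 1)                         # SSE / (N - k - 1) with k = 2

cov = s2 * np.linalg.inv(A.T @ A)                # covariance of (alpha_hat, beta_1_hat, beta_2_hat)
std_err = np.sqrt(np.diag(cov))                  # single-coefficient tests use the diagonal only
t_stats = coef / std_err

a0 = np.array([1.0, 1.5, -1.0])                  # hypothetical target point (1, x1, x2)
var_full = a0 @ cov @ a0                         # variance of the fitted value, covariances kept
var_diag = np.sum(a0**2 * np.diag(cov))          # same sum with covariances dropped
print(cov.round(4))
print(var_full, var_diag)
```

The t-statistics only need `std_err`, while the variance of the fitted value at `a0` changes once the off-diagonal entries are dropped.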

As a result, the forecast variance (i.e. the expected squared forecast error) at a target data point $m$ can be expressed as follows:

$$Var[y-\hat{y}| x_{1,m},x_{2,m},\cdots x_{k,m}]=\sigma^2 \left(1+\frac{1}{N}+\frac{\sum_{j=1}^k (x_{j,m}-\bar{x_j})^2}{\sum_{i=1}^N\sum_{j=1}^k(x_{j,i}-\bar{x_j})^2} \right)$$

However, the variance of residuals ($\sigma^2$) in the true model is unknown, so we use the variance of the error terms ($\hat{\sigma}^2$) of the estimated regression model:

$$\hat{\sigma}^2 = E[u^2]=E[(y-\hat{\alpha} - \hat{\beta_1}x_1-\hat{\beta_2}x_2-\cdots - \hat{\beta_k}x_k)^2]=\frac{SSE}{N-k-1}=\frac{\sum_{i=1}^N u_i^2}{N-k-1}$$
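As a small illustration of the residual variance estimate, the sketch below computes $SSE/(N-k-1)$ from an OLS fit on synthetic data (the data and parameter values are assumptions). It also cross-checks the SSE against the residual sum of squares that `np.linalg.lstsq` returns for a full-rank, over-determined problem.

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 40, 2
X = rng.normal(size=(N, k))
y = 2.0 + X @ np.array([0.8, -1.2]) + rng.normal(scale=0.6, size=N)

A = np.column_stack([np.ones(N), X])
coef, rss, rank, _ = np.linalg.lstsq(A, y, rcond=None)
u = y - A @ coef

sse = float(u @ u)                  # sum of squared residuals
sigma2_hat = sse / (N - k - 1)      # divide by the degrees of freedom, not by N
print(sigma2_hat)
```

Dividing by $N-k-1$ rather than $N$ accounts for the $k+1$ parameters estimated from the same sample.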

Overall, the estimated MLR forecast variance (error squared) is expressed as follows:

$$Var[y-\hat{y}| x_{1,m},x_{2,m},\cdots x_{k,m}]=\frac{SSE}{N-k-1} \left(1+\frac{1}{N}+\frac{\sum_{j=1}^k (x_{j,m}-\bar{x_j})^2}{\sum_{i=1}^N\sum_{j=1}^k(x_{j,i}-\bar{x_j})^2} \right)$$
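The following sketch evaluates the forecast variance numerically for a single regressor ($k=1$) on synthetic data. The helper `forecast_variance` is a hypothetical name, not part of any library; it uses the standard matrix form $\hat{\sigma}^2\left(1 + a_m^T (A^TA)^{-1} a_m\right)$ with $a_m = (1, x_{1,m}, \cdots, x_{k,m})$, which is an assumption relative to the text. For the $k=1$ case checked here, that matrix form gives the same number as the scalar expression above.

```python
import numpy as np

def forecast_variance(X, y, x_new):
    """Return (point forecast, forecast variance, s2) at x_new for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    N, k = X.shape
    A = np.column_stack([np.ones(N), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    u = y - A @ coef
    s2 = u @ u / (N - k - 1)                              # SSE / (N - k - 1)
    a = np.concatenate([[1.0], np.atleast_1d(x_new)])     # target point with leading 1
    y_hat = a @ coef                                      # conditional point forecast
    return y_hat, s2 * (1.0 + a @ np.linalg.solve(A.T @ A, a)), s2

rng = np.random.default_rng(4)
N = 30
x = rng.uniform(0.0, 10.0, size=N)                        # synthetic single regressor
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=N)
x_m = 12.0                                                # hypothetical target point

y_hat, var_matrix, s2 = forecast_variance(x, y, x_m)
# Scalar expression from the text, written out for k = 1
var_scalar = s2 * (1 + 1 / N + (x_m - x.mean())**2 / np.sum((x - x.mean())**2))
print(y_hat, var_matrix, var_scalar)
```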

Now, let’s take a close look at the formula above and try to explain the different terms:

1. $\hat{\sigma}^2$ is the estimated variance of the true regression model residuals. This value is constant and independent of the X-value(s) of the target data point.
2. $\frac{\hat{\sigma}^2}{N}$ is the error in the estimated intercept (aka constant). This value is also constant and independent of the X-values of the target data point.
3. The last term is proportional to the squared (Euclidean) distance of the target data point from the center of the sample data set. This term is zero at the sample data center point $(\bar{x}_1,\bar{x}_2,\cdots,\bar{x}_k)$.

In effect, the forecast variance is higher for data points $(x_{1,m},x_{2,m},\cdots,x_{k,m})$ that are farther from the center of the input sample data set, i.e. $(\bar{x}_1,\bar{x}_2,\cdots,\bar{x}_k)$.

As a result, the forecast error is smallest at the sample data center point $(\bar{x}_1,\bar{x}_2,\cdots,\bar{x}_k)$.
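The minimum at the center can be checked numerically. This sketch uses synthetic data with two regressors and the standard matrix form of the forecast variance, $\hat{\sigma}^2\left(1 + a^T (A^TA)^{-1} a\right)$ with $a = (1, x_1, x_2)$; the data, the helper name `fvar`, and the offsets are all assumptions made for illustration. At the sample mean the variance reduces to $\hat{\sigma}^2 (1 + 1/N)$, and it grows at every point away from it.

```python
import numpy as np

rng = np.random.default_rng(5)
N, k = 60, 2
X = rng.normal(loc=[3.0, -1.0], scale=[1.0, 2.0], size=(N, k))
y = 0.3 + X @ np.array([1.1, 0.4]) + rng.normal(scale=0.5, size=N)

A = np.column_stack([np.ones(N), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
u = y - A @ coef
s2 = u @ u / (N - k - 1)
AtA_inv = np.linalg.inv(A.T @ A)

def fvar(point):
    """Forecast variance at a target point (hypothetical helper)."""
    a = np.concatenate([[1.0], point])
    return s2 * (1.0 + a @ AtA_inv @ a)

center = X.mean(axis=0)                          # sample data center point
var_center = fvar(center)
offsets = ([1, 0], [0, 1], [-2, 3], [5, 5])      # arbitrary moves away from the center
var_away = [fvar(center + d) for d in offsets]
print(var_center, s2 * (1 + 1 / N))
print(var_away)
```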