Returns an array of cells for the fitted values of the conditional mean, residuals or leverage measures.

## Syntax

**PCR_FITTED**(**X**, **Mask**, **Y**, **Intercept**, **Return_type**)

**X** is the independent variables data matrix, such that each column represents one variable.

**Mask** is the boolean array to choose the explanatory variables in the model. If missing, all variables in X are included.

**Y** is the response (dependent) variable data array: a one-dimensional array of cells, e.g. a single row or column.

**Intercept** is the constant or the intercept value to fix (e.g. zero). If missing, an intercept will not be fixed and is computed normally.

**Return_type** is a switch to select the return output (1 = fitted values (default), 2 = residuals, 3 = standardized residuals, 4 = leverage, 5 = Cook's distance).

| Return_type | Description |
|---|---|
| 1 | Fitted/conditional mean (default) |
| 2 | Residuals |
| 3 | Standardized (aka studentized) residuals |
| 4 | Leverage (H) |
| 5 | Cook's distance (D) |

## Remarks

- The underlying model is described here.
- The regression fitted (aka estimated) conditional mean is calculated as follows:

$$\hat y_i = E \left[ Y | x_{i1}, \cdots, x_{ip} \right] = \hat \alpha + \hat \beta_1 \times x_{i1} + \cdots + \hat \beta_p \times x_{ip}$$

Residuals are defined as follows:

$$ e_i = y_i - \hat y_i $$

The standardized (aka studentized) residuals are calculated as follows:

$$\bar e_i = \frac{e_i}{\hat \sigma_i}$$

Where:

- $\hat y_i$ is the estimated regression value for the i-th observation.
- $e_i$ is the residual (error term) for the i-th observation.
- $\bar e_i$ is the standardized residual for the i-th observation.
- $\hat \sigma_i$ is the standard error for the i-th observation.
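The fitted values, residuals and standardized residuals defined above can be sketched in Python with NumPy. This is an illustrative ordinary least-squares calculation, not the PCR_FITTED implementation. The helper names (`fit_with_intercept`, `standardized_residuals`) and the sample data are hypothetical, and $\hat \sigma_i = \sqrt{\mathrm{MSE}\,(1-h_{ii})}$ (with $h_{ii}$ the leverage described further down) is an assumed definition, since the text does not state how $\hat \sigma_i$ is computed.

```python
import numpy as np


def fit_with_intercept(x, y):
    """Ordinary least squares with an intercept column (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)   # one column per variable
    design = np.column_stack([np.ones(len(y)), x])       # first column is all ones
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)    # [alpha, beta_1, ..., beta_p]
    fitted = design @ coef                               # y_hat_i
    residuals = y - fitted                               # e_i = y_i - y_hat_i
    return design, fitted, residuals


def standardized_residuals(design, residuals):
    """Residuals divided by an assumed per-observation standard error."""
    n, k = design.shape                                  # k = p + 1 coefficients
    mse = residuals @ residuals / (n - k)                # assumption: SSE / (n - p - 1)
    # Diagonal of the hat matrix, computed without forming the full N x N matrix.
    leverage = np.einsum("ij,ji->i", design, np.linalg.pinv(design))
    return residuals / np.sqrt(mse * (1.0 - leverage))


# Hypothetical sample: one explanatory variable, four observations.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]

design, fitted, residuals = fit_with_intercept(x, y)
std_resid = standardized_residuals(design, residuals)

print("fitted:      ", fitted)
print("residuals:   ", residuals)
print("standardized:", std_resid)
```

For this sample the least-squares line is $\hat y = 0.7 + 2.2x$, so the residuals sum to zero, as they must whenever an intercept is estimated.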

- For influential data analysis, PCR_FITTED computes two measures for each observation in the sample data: the leverage statistic and Cook's distance.
- Leverage statistics describe the influence that each observed value has on the fitted value for that same observation. By definition, the diagonal elements of the **hat matrix** are the leverages.

$$H = X \left(X^\top X \right)^{-1} X^\top$$

$$L_i = h_{ii}$$

Where:

- $H$ is the hat matrix for uncorrelated error terms.
- $X$ is an $N \times (p+1)$ matrix of explanatory variables whose first column is all ones.
- $L_i$ is the leverage statistic for the i-th observation.
- $h_{ii}$ is the i-th diagonal element of the hat matrix.
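As a minimal sketch of the hat-matrix formula above, the following NumPy snippet builds $H$ for a small hypothetical data set and reads the leverages off its diagonal. The function name `hat_matrix` and the data are assumptions made for illustration. The explicit inverse mirrors the formula, although a QR- or SVD-based solve is generally more stable numerically.

```python
import numpy as np


def hat_matrix(x):
    """H = X (X^T X)^{-1} X^T, where X has a leading column of ones."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x.reshape(-1, 1)                             # single explanatory variable
    design = np.column_stack([np.ones(x.shape[0]), x])   # N x (p + 1)
    return design @ np.linalg.inv(design.T @ design) @ design.T


# Hypothetical sample: one explanatory variable, four observations.
x = [0.0, 1.0, 2.0, 3.0]
H = hat_matrix(x)
leverage = np.diag(H)                                    # L_i = h_ii

print("leverage:", leverage)
print("sum of leverages:", leverage.sum())               # equals p + 1 for full-rank X
```

The two end points ($x = 0$ and $x = 3$) lie farthest from the mean of $x$ and therefore carry the largest leverage (about 0.7 each, versus about 0.3 for the interior points).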

- Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance are considered to merit closer examination in the analysis.

$$D_i = \frac{e_i^2}{p \ \mathrm{MSE}}\left[\frac{h_{ii}}{(1-h_{ii})^2}\right]$$

Where:

- $D_i$ is the Cook's distance for the i-th observation.
- $h_{ii}$ is the leverage statistic (the i-th diagonal element of the hat matrix).
- $\mathrm{MSE}$ is the mean square error of the regression model.
- $p$ is the number of explanatory variables.
- $e_i$ is the error term (residual) for the i-th observation.
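The Cook's distance formula above can be evaluated directly once the residuals and leverages are known. The sketch below follows the definitions given here, with $p$ taken as the number of explanatory variables. Note that many textbooks instead use the number of estimated coefficients including the intercept, which rescales every $D_i$ by the same factor. The function name `cooks_distance`, the sample data, and the choice $\mathrm{MSE} = \mathrm{SSE}/(N - p - 1)$ are assumptions for illustration, not a description of how PCR_FITTED is implemented.

```python
import numpy as np


def cooks_distance(x, y):
    """D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2 for an OLS fit with intercept."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(y), -1)
    n, p = x.shape                                       # p explanatory variables
    design = np.column_stack([np.ones(n), x])            # first column is all ones
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ coef                                # residuals e_i
    h = np.diag(design @ np.linalg.pinv(design))         # leverages h_ii
    mse = e @ e / (n - p - 1)                            # assumption: SSE / (N - p - 1)
    return e**2 / (p * mse) * h / (1.0 - h) ** 2


# Hypothetical sample: the last point has both high leverage and a sizeable residual.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 4.0, 8.0]
D = cooks_distance(x, y)

print("Cook's distance:", D)
print("most influential observation (0-based index):", int(np.argmax(D)))
```

In this sample the third observation has the largest residual but low leverage, while the fourth combines a moderate residual with high leverage and ends up with the largest Cook's distance.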

- The sample data may include missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e. rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- The PCR_FITTED function is available starting with version 1.60 APACHE.
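The data-handling rules above (one variable per column, one observation per row, rows with missing values dropped, equal row counts for X and Y) can be sketched as a small preprocessing step. The helper name `drop_incomplete_rows` is hypothetical, and representing a missing cell as `NaN` is an assumption made for this example.

```python
import numpy as np


def drop_incomplete_rows(X, y):
    """Remove observations (rows) that have a missing value in X or in y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")
    complete = ~(np.isnan(X).any(axis=1) | np.isnan(y))  # True where the row is usable
    return X[complete], y[complete]


# Hypothetical sample: row 1 is missing a value in X, row 2 is missing y.
X = [[1.0, 10.0],
     [2.0, np.nan],
     [3.0, 30.0],
     [4.0, 40.0]]
y = [1.0, 2.0, np.nan, 4.0]

X_clean, y_clean = drop_incomplete_rows(X, y)
print(X_clean)
print(y_clean)
```

Only the first and last observations survive, because each of the other two rows has a missing value in either X or Y.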

