Returns an array of cells with the fitted values of the conditional mean, the residuals, or influence measures (leverage and Cook's distance).
Syntax
MLR_FITTED (X, Mask, Y, Intercept, Return)
- X
- is the independent (explanatory) variables data matrix, in which each column represents one variable.
- Mask
- is the Boolean array used to select which explanatory variables are included in the model. If missing, all variables in X are included.
- Y
- is the response (dependent) variable data array (a one-dimensional array of cells, i.e., a single row or column).
- Intercept
- is the constant or intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is computed as part of the regression.
- Return
- is a switch to select the return output (1 = Fitted values (default), 2 = Residuals, 3 = Standardized residuals, 4 = Leverage, 5 = Cook's distance).
| Value | Return |
|-------|--------|
| 1 | Fitted values / conditional mean (default) |
| 2 | Residuals |
| 3 | Standardized (aka studentized) residuals |
| 4 | Leverage (H) |
| 5 | Cook's distance (D) |
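As a rough illustration of how the Mask argument and the Return switch interact, here is a minimal Python/NumPy sketch that emulates the documented behavior for Return = 1 (fitted values) and Return = 2 (residuals). The function name, sample data, and layout are hypothetical, and the Intercept argument is not emulated; the add-in's actual implementation may differ.

```python
import numpy as np

def mlr_fitted_demo(X, mask, y, return_code=1):
    """Illustrative emulation: select columns by mask, fit OLS with an
    intercept, and return fitted values (1) or residuals (2)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if mask is not None:                       # Mask: choose explanatory variables
        X = X[:, np.asarray(mask, dtype=bool)]
    A = np.column_stack([np.ones(len(y)), X])  # prepend a column of ones (intercept)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    if return_code == 1:
        return fitted                          # conditional mean E[Y | x]
    elif return_code == 2:
        return y - fitted                      # residuals e_i = y_i - y_hat_i
    raise ValueError("only Return = 1 or 2 are emulated in this sketch")

# Hypothetical sample data: 6 observations, 3 candidate explanatory variables
X = [[1.0, 2.0, 0.5], [2.0, 1.5, 1.0], [3.0, 3.5, 1.5],
     [4.0, 2.5, 2.0], [5.0, 4.0, 2.5], [6.0, 3.0, 3.0]]
y = [2.1, 3.0, 4.8, 5.9, 7.2, 8.1]
mask = [True, False, True]                     # include the 1st and 3rd variables only

print(mlr_fitted_demo(X, mask, y, 1))          # fitted values
print(mlr_fitted_demo(X, mask, y, 2))          # residuals
```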
Remarks
- The underlying model is described here.
- The regression fitted (aka estimated) conditional mean is calculated as follows: $$\hat y_i = E\left[ Y \mid x_{i1},\cdots,x_{ip} \right] = \hat\alpha + \hat\beta_1 \times x_{i1} + \cdots + \hat\beta_p \times x_{ip}$$ Residuals are defined as follows: $$e_i = y_i - \hat y_i$$ The standardized (aka studentized) residuals are calculated as follows (a worked numerical sketch is given under Files Examples below): $$\bar e_i = \frac{e_i}{\hat\sigma_i}$$ Where:
- $\hat y$ is the estimated regression value.
- $e_i$ is the residual (error term) for the i-th observation.
- $\bar e_i$ is the standardized residual for the i-th observation.
- $\hat \sigma_i$ is the standard error for the i-th observation.
- For influential data analysis, MLR_FITTED computes two values: leverage statistics and Cook's distance for the observations in the sample data (illustrated in the sketches under Files Examples below).
- Leverage statistics describe the influence that each observed value has on the fitted value for that same observation. By definition, the diagonal elements of the hat matrix are the leverages. $$H = X \left(X^\top X \right)^{-1} X^\top$$ $$L_i = h_{ii}$$ Where:
- $H$ is the Hat matrix for uncorrelated error terms.
- $\mathbf{X}$ is an $N \times (p+1)$ matrix of explanatory variables whose first column is all ones.
- $L_i$ is the leverage statistics for the i-th observation.
- $h_{ii}$ is the i-th diagonal element in the hat matrix.
- Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance merit closer examination in the analysis. $$D_i = \frac{e_i^2}{p \ \mathrm{MSE}}\left[\frac{h_{ii}}{(1-h_{ii})^2}\right]$$ Where:
- $D_i$ is the Cook's distance for the i-th observation.
- $h_{ii}$ is the leverage statistics (or the i-th diagonal element in the hat matrix).
- $\mathrm{MSE}$ is the mean square error of the regression model.
- $p$ is the number of explanatory variables.
- $e_i$ is the error term (residual) for the i-th observation.
- The sample data may include data points with missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e., rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must equal the number of rows of the explanatory variable (X).
- The MLR_FITTED function is available starting with version 1.60 APACHE.
Files Examples
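The following Python/NumPy sketch works through the leverage and standardized-residual formulas from the Remarks above on a small, made-up data set. It assumes the per-observation standard error is $\hat\sigma_i = \hat\sigma\sqrt{1-h_{ii}}$ (the usual internally studentized form); the add-in's exact convention may differ.

```python
import numpy as np

# Hypothetical sample: N = 6 observations, p = 2 explanatory variables
X = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 1.5],
              [4.0, 2.0], [5.0, 2.5], [9.0, 1.0]])   # last row is a high-leverage point
y = np.array([2.1, 3.0, 4.8, 5.9, 7.2, 6.5])

N, p = X.shape
A = np.column_stack([np.ones(N), X])          # design matrix with intercept column

# Hat matrix H = A (A'A)^{-1} A'; its diagonal gives the leverages h_ii
H = A @ np.linalg.inv(A.T @ A) @ A.T
leverage = np.diag(H)

beta = np.linalg.solve(A.T @ A, A.T @ y)      # OLS coefficients
resid = y - A @ beta                          # residuals e_i
dof = N - (p + 1)                             # residual degrees of freedom
sigma_hat = np.sqrt(resid @ resid / dof)      # regression standard error

# Standardized (internally studentized) residuals: e_i / (sigma_hat * sqrt(1 - h_ii))
std_resid = resid / (sigma_hat * np.sqrt(1.0 - leverage))

print("leverage:", np.round(leverage, 3))
print("standardized residuals:", np.round(std_resid, 3))
```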
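Similarly, here is a minimal sketch of Cook's distance, following the formula as stated in the Remarks (with $p$ taken as the number of explanatory variables and $\mathrm{MSE}$ the regression mean square error). Conventions for $p$ and $\mathrm{MSE}$ vary across references, so treat the numbers as illustrative only; the sample data below is hypothetical.

```python
import numpy as np

# Hypothetical sample: same layout as the leverage sketch above
X = np.array([[1.0, 0.5], [2.0, 1.0], [3.0, 1.5],
              [4.0, 2.0], [5.0, 2.5], [9.0, 1.0]])
y = np.array([2.1, 3.0, 4.8, 5.9, 7.2, 6.5])

N, p = X.shape
A = np.column_stack([np.ones(N), X])              # design matrix with intercept

H = A @ np.linalg.inv(A.T @ A) @ A.T              # hat matrix
h = np.diag(H)                                    # leverages h_ii

beta = np.linalg.solve(A.T @ A, A.T @ y)
e = y - A @ beta                                  # residuals
mse = e @ e / (N - (p + 1))                       # mean square error of the regression

# Cook's distance: D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2
D = (e**2 / (p * mse)) * h / (1.0 - h)**2

print("Cook's distance:", np.round(D, 3))
```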
Related Links