Returns an array of cells containing the fitted values (conditional mean), residuals, or influence statistics of the regression.
Syntax
SLR_FITTED(X, Y, Intercept, Return_type)
- X
- is the independent (aka explanatory or predictor) variable data array (a one-dimensional array of cells, e.g., a row or a column).
- Y
- is the response (aka dependent) variable data array (a one-dimensional array of cells, e.g., a row or a column).
- Intercept
- is the constant or the intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is estimated from the data as usual.
- Return_type
- is a switch to select the return output (1 = fitted values (default), 2 = residuals, 3 = standardized residuals, 4 = leverage, 5 = Cook's distance).
| Method | Description |
|---|---|
| 1 | Fitted/Conditional Mean |
| 2 | Residuals |
| 3 | Standardized (aka Studentized) Residuals |
| 4 | Leverage (H) |
| 5 | Cook's Distance (D) |
Remarks
- The underlying model is described here.
- The regression fitted (aka estimated) conditional mean is calculated as follows:
$$\hat y_i = E \left[ Y| x_i \right] = \alpha + \hat \beta \times x_i$$
Residuals are defined as follows:
$$ e_i = y_i - \hat y_i $$
The standardized (aka studentized) residuals are calculated as follows:
$$\bar e_i = \frac{e_i}{\hat \sigma_i} $$
Where:
- $\hat y_i$ is the estimated regression value for the i-th observation.
- $e_i$ is the residual (estimated error term) for the i-th observation.
- $\bar e_i$ is the standardized residual for the i-th observation.
- $\hat \sigma_i $ is the standard error for the i-th observation.
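The fitted values and residuals defined above can be sketched in a few lines of Python. This is an illustration only, not the add-in's implementation: the function names `slr_fit` and `fitted_and_residuals` are hypothetical, and the fixed-intercept branch assumes the slope is re-estimated by least squares with the intercept held constant.

```python
def slr_fit(x, y, intercept=None):
    """Least-squares estimates (alpha, beta) for y = alpha + beta * x.

    If `intercept` is given, alpha is held at that value and only the
    slope is estimated (an assumption about how a fixed intercept is
    handled); otherwise both coefficients are estimated from the data.
    """
    n = len(x)
    if intercept is None:
        x_bar = sum(x) / n
        y_bar = sum(y) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        beta = sxy / sxx
        alpha = y_bar - beta * x_bar
    else:
        alpha = intercept
        # Minimizing sum((y - alpha - beta * x)^2) over beta alone
        beta = sum(xi * (yi - alpha) for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return alpha, beta


def fitted_and_residuals(x, y, intercept=None):
    """Return (fitted values, residuals) with e_i = y_i - y_hat_i."""
    alpha, beta = slr_fit(x, y, intercept)
    fitted = [alpha + beta * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, residuals


# Made-up sample data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted, residuals = fitted_and_residuals(x, y)
```

When the intercept is estimated, the residuals sum to zero, which makes a convenient sanity check on any implementation.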
- For the influential data analysis, SLR_FITTED computes two values: leverage statistics and Cook's distance for observations in our sample data.
- Leverage statistics describe the influence that each observed value has on the fitted value for that same observation. By definition, the diagonal elements of the hat matrix are the leverages.
$$H = X \left(X^\top X \right)^{-1} X^\top$$
$$L_i = h_{ii}$$
Where:
- $H$ is the hat matrix for uncorrelated error terms.
- $X$ is an $N \times 2$ matrix of explanatory variables where the first column is all ones.
- $L_i$ is the leverage statistic for the i-th observation.
- $h_{ii}$ is the i-th diagonal element in the hat matrix.
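As a rough sketch of the leverage definition above, the diagonal of the hat matrix can be computed without forming the full $N \times N$ matrix, because $X^\top X$ is only $2 \times 2$ in the simple regression case. The function name `leverage` is hypothetical, and the sketch assumes the intercept is estimated (so the first column of $X$ is all ones).

```python
def leverage(x):
    """Diagonal elements h_ii of H = X (X'X)^{-1} X' for X = [1, x]."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    # X'X = [[n, sx], [sx, sxx]], so (X'X)^{-1} = [[sxx, -sx], [-sx, n]] / det
    det = n * sxx - sx * sx
    # h_ii = [1, x_i] (X'X)^{-1} [1, x_i]'
    return [(sxx - 2.0 * xi * sx + n * xi * xi) / det for xi in x]


# Made-up sample data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = leverage(x)
```

The leverages sum to the number of estimated coefficients (two here), and observations far from the mean of X receive the largest values.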
- Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance merit closer examination in the analysis.
$$D_i = \frac{e_i^2}{p \ \mathrm{MSE}}\left[\frac{h_{ii}}{(1-h_{ii})^2}\right]$$
Where:
- $D_i$ is the Cook's distance for the i-th observation.
- $h_{ii}$ is the leverage statistic (i.e., the i-th diagonal element in the hat matrix).
- $\mathrm{MSE}$ is the mean square error of the regression model.
- $p$ is the number of explanatory variables (in the SLR case, p=1).
- $e_i$ is the error term (residual) for the i-th observation.
- Opinions differ on the cut-off value for flagging highly influential points. As a rule of thumb, an observation whose Cook's distance exceeds $\frac{1}{N}$ is a candidate for exerting influence on the regression and may warrant further investigation.
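The Cook's distance formula above can be illustrated with a short Python sketch. The function name `cooks_distance` is hypothetical, and two details are assumptions rather than documented behavior: the intercept is estimated, and $\mathrm{MSE}$ is taken as the residual sum of squares divided by $N - 2$. The default `p=1` follows the definition of $p$ given above.

```python
def cooks_distance(x, y, p=1):
    """Cook's distance D_i = e_i^2 / (p * MSE) * h_ii / (1 - h_ii)^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar

    # Residuals e_i = y_i - (alpha + beta * x_i)
    e = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]

    # Assumption: MSE uses N - 2 degrees of freedom (two estimated coefficients)
    mse = sum(ei * ei for ei in e) / (n - 2)

    # Closed form of the hat-matrix diagonal for simple regression
    h = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

    return [ei * ei / (p * mse) * hi / (1.0 - hi) ** 2 for ei, hi in zip(e, h)]


# Made-up sample data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
d = cooks_distance(x, y)
```

In this toy sample the first observation combines a sizeable residual with the highest leverage, so it receives by far the largest distance.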
- The sample data may include data points with missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e., rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must equal the number of rows of the explanatory variables (X).
- The SLR_FITTED function is available starting with version 1.60 APACHE.
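The missing-value handling described in the remarks above (equal row counts for X and Y, and removal of any observation with a missing X or Y) might be sketched as follows. The function name `drop_missing` is hypothetical, and treating `None` and NaN as the "missing" markers is an assumption made for this illustration.

```python
import math


def drop_missing(x, y):
    """Remove observations where either x_i or y_i is missing."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of rows")

    def is_missing(v):
        # Assumption: missing values are represented as None or NaN
        return v is None or (isinstance(v, float) and math.isnan(v))

    pairs = [(xi, yi) for xi, yi in zip(x, y)
             if not is_missing(xi) and not is_missing(yi)]
    return [p[0] for p in pairs], [p[1] for p in pairs]


# Made-up sample data: only the first row is complete
x_clean, y_clean = drop_missing([1.0, None, 3.0, float("nan")],
                                [2.0, 4.0, None, 5.0])
```

Filtering X and Y together keeps the remaining observations aligned row by row, which the regression formulas rely on.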