SLR_FITTED - Simple Linear Regression Fitted Values

Returns an array of cells with the fitted values (conditional mean), the residuals, or the influence statistics of a simple linear regression.

Syntax

SLR_FITTED(X, Y, Intercept, Return_type)
X
is the independent (aka explanatory or predictor) variable data array (a one-dimensional array of cells, e.g., a row or a column).
Y
is the response (aka dependent) variable data array (a one-dimensional array of cells, e.g., a row or a column).
Intercept
is the constant or the intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is estimated from the data as usual.
Return_type
is a switch to select the return output (1 = fitted values (default), 2 = residuals, 3 = standardized residuals, 4 = leverage, 5 = Cook's distance).
Return_type   Description
1             Fitted/Conditional Mean
2             Residuals
3             Standardized (aka Studentized) Residuals
4             Leverage (H)
5             Cook's Distance (D)

Remarks

  1. The underlying model is described here.
  2. The regression fitted (aka estimated) conditional mean is calculated as follows:
    $$\hat y_i = E \left[ Y| x_i \right] = \hat \alpha + \hat \beta \times x_i$$
    Residuals are defined as follows:
    $$ e_i = y_i - \hat y_i $$
    The standardized (aka studentized) residuals are calculated as follows:
    $$\bar e_i = \frac{e_i}{\hat \sigma_i} $$
    Where:
    • $\hat y_i $ is the fitted (estimated) regression value for the i-th observation.
    • $\hat \alpha $ is the intercept (estimated, or fixed when Intercept is given) and $\hat \beta $ is the estimated slope.
    • $e_i $ is the residual for the i-th observation.
    • $\bar e_i $ is the standardized residual for the i-th observation.
    • $\hat \sigma_i $ is the estimated standard error of the residual for the i-th observation.
  3. For influential-data analysis, SLR_FITTED computes two statistics for each observation in the sample data: the leverage and Cook's distance.
  4. Leverage statistics describe the influence that each observed value has on the fitted value for that same observation. By definition, the diagonal elements of the hat matrix are the leverages.
    $$H = X \left(X^\top X \right)^{-1} X^\top$$
    $$L_i = h_{ii}$$
    Where:
    • $H$ is the hat matrix for uncorrelated error terms.
    • $X$ is an $N \times 2$ matrix of explanatory variables, where the first column is all ones.
    • $L_i$ is the leverage statistic for the i-th observation.
    • $h_{ii}$ is the i-th diagonal element of the hat matrix.
  5. Cook's distance measures the effect of deleting a given observation. Data points with large residuals (outliers) and/or high leverage may distort the outcome and accuracy of a regression. Points with a large Cook's distance merit closer examination in the analysis.
    $$D_i = \frac{e_i^2}{p \ \mathrm{MSE}}\left[\frac{h_{ii}}{(1-h_{ii})^2}\right]$$
    Where:
    • $D_i$ is the Cook's distance for the i-th observation.
    • $h_{ii}$ is the leverage statistic (i.e., the i-th diagonal element of the hat matrix).
    • $\mathrm{MSE}$ is the mean square error of the regression model.
    • $p$ is the number of explanatory variables (in the SLR case, p=1).
    • $e_i$ is the residual for the i-th observation.
  6. Opinions differ on what cut-off value to use for spotting highly influential points. As a rule of thumb, an observation with a value above $\frac{1}{N}$ is a good candidate for exerting influence on the regression, and further investigation may be in order.
  7. The sample data may include data points with missing values.
  8. Each column in the input matrix corresponds to a separate variable.
  9. Each row in the input matrix corresponds to an observation.
  10. Observations (i.e., rows) with missing values in X or Y are removed.
  11. The number of rows of the response variable (Y) must equal the number of rows of the explanatory variable (X).
  12. The SLR_FITTED function is available starting with version 1.60 APACHE.
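
The formulas in the remarks above can be sketched outside the spreadsheet as follows. This is a minimal illustration in Python with NumPy, not the add-in's actual implementation. The helper name `slr_diagnostics` is hypothetical, the intercept is always estimated (the fixed-intercept option is not covered), and two details that the remarks leave open are assumed here: $\mathrm{MSE}$ is taken as the residual sum of squares divided by $N-2$, and $\hat \sigma_i$ is taken as $\sqrt{\mathrm{MSE}\,(1-h_{ii})}$.

```python
import numpy as np


def slr_diagnostics(x, y):
    """Illustrative helper (hypothetical name), mirroring the five return types."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be one-dimensional arrays of equal length")

    # Remove observations (rows) with a missing value in X or Y.
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    n = x.size
    if n < 3:
        raise ValueError("need at least three complete observations")

    # N x 2 design matrix: a column of ones followed by the explanatory variable.
    X = np.column_stack([np.ones(n), x])

    # Ordinary least-squares estimates of the intercept and the slope.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef      # conditional mean
    resid = y - fitted     # e_i = y_i - y_hat_i

    # Leverage: diagonal elements of the hat matrix H = X (X'X)^-1 X'.
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    leverage = np.diag(H)

    # Assumption: MSE uses N - 2 degrees of freedom (two estimated coefficients).
    mse = resid @ resid / (n - 2)

    # Assumption: sigma_i = sqrt(MSE * (1 - h_ii)).
    std_resid = resid / np.sqrt(mse * (1.0 - leverage))

    # Cook's distance with p = 1, as stated in the remarks for the SLR case.
    p = 1
    cooks = resid**2 / (p * mse) * leverage / (1.0 - leverage) ** 2

    return {
        "fitted": fitted,
        "residuals": resid,
        "std_residuals": std_resid,
        "leverage": leverage,
        "cooks_distance": cooks,
    }
```

For example, `slr_diagnostics([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])["leverage"]` returns one leverage value per complete observation. Building the full hat matrix is wasteful for large samples, but it keeps the sketch close to the definition given in the remarks.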
