MLR_GOF - Goodness of fit of Linear Regression Model

Calculates a measure for the goodness of fit (e.g., LLF, AIC, R^2, etc.).

Syntax

MLR_GOF (X, Mask, Y, Intercept, Return)

X
is the independent (explanatory) variables data matrix, in which each column represents one variable.
Mask
is a Boolean array that selects which explanatory variables to include in the model. If missing, all variables in X are included.
Y
is the response (dependent) variable data array (a one-dimensional array of cells, e.g., a row or a column).
Intercept
is the constant or the intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is estimated as usual.
Return
is a switch to select a goodness-of-fit measure (1 = R-square (default), 2 = Adjusted R-square, 3 = RMSE, 4 = LLF, 5 = AIC, 6 = BIC/SIC).
Value Return
1 R-square (coefficient of determination) (default).
2 Adjusted R-square.
3 Regression error (RMSE).
4 Log-likelihood (LLF).
5 Akaike-information criterion (AIC).
6 Schwarz/Bayesian information criterion (BIC/SIC).
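
The argument descriptions above can be illustrated with a minimal Python sketch of the input preparation. The helper name `prepare_inputs` is hypothetical and is not part of MLR_GOF. The sketch assumes that the mask is applied before rows with missing values are dropped, and that missing values are encoded as NaN; the function's actual internal order is not documented here.

```python
import numpy as np

def prepare_inputs(X, y, mask=None):
    """Hypothetical helper: select columns by mask, then drop incomplete rows."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if X.ndim == 1:
        X = X.reshape(-1, 1)          # a single variable becomes one column
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")
    if mask is not None:
        # Assumption: True keeps the column, False excludes it from the model.
        X = X[:, np.asarray(mask, dtype=bool)]
    # Keep only observations (rows) with no missing value in X or Y.
    complete = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[complete], y[complete]

# Usage: two variables, four observations, one missing value in each input.
X = [[1.0, 10.0], [2.0, np.nan], [3.0, 30.0], [4.0, 40.0]]
y = [1.0, 2.0, np.nan, 4.0]
X_used, y_used = prepare_inputs(X, y, mask=[True, False])
```

With the second variable masked out, only the row with the missing Y value is discarded in this sketch; without a mask, the row with the missing X value is discarded as well.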

Remarks

  1. The underlying model is described here.
  2. The coefficient of determination, denoted $R^2$, measures how well the model replicates the observed outcomes. $$R^2 = \frac{\mathrm{SSR}} {\mathrm{SST}} = 1 - \frac{\mathrm{SSE}} {\mathrm{SST}}$$ Here, SSR is the regression (explained) sum of squares, SSE is the residual (error) sum of squares, and SST is the total sum of squares.
  3. The adjusted R-square (denoted $\bar R^2$) corrects for the tendency of $R^2$ to increase automatically and spuriously when extra explanatory variables are added to the model. It adjusts for the number of explanatory terms in the model relative to the number of data points. $$\bar R^2 = {1-(1-R^{2}){N-1 \over N-p-1}} = {R^{2}-(1-R^{2}){p \over N-p-1}} = 1 - \frac{\mathrm{SSE}/(N-p-1)}{\mathrm{SST}/(N-1)}$$ Where:
    • $p$ is the number of explanatory variables in the model.
    • $N$ is the number of observations in the sample.
  4. The regression error is defined as the square root of the mean square error (RMSE): $$\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{N-p-1}}$$
  5. The log-likelihood of the regression is given as $$\mathrm{LLF}=-\frac{N}{2}\left(1+\ln(2\pi)+\ln\left(\frac{\mathrm{SSE}}{N} \right ) \right )$$ The Akaike and Schwarz/Bayesian information criteria are given as: $$\mathrm{AIC}=-\frac{2\mathrm{LLF}}{N}+\frac{2(p+1)}{N}$$ $$\mathrm{BIC} = \mathrm{SIC}=-\frac{2\mathrm{LLF}}{N}+\frac{(p+1)\times\ln(N)}{N}$$
  6. The sample data may include data points with missing values.
  7. Each column in the input matrix corresponds to a separate variable.
  8. Each row in the input matrix corresponds to an observation.
  9. Observations (i.e., rows) with missing values in X or Y are removed.
  10. The number of rows of the response variable (Y) must equal the number of rows of the explanatory variables (X).
  11. The MLR_GOF function is available starting with version 1.60 APACHE.
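
As a rough illustration of the six measures, the following Python sketch fits an ordinary least-squares model with NumPy and evaluates each quantity. It is not the MLR_GOF implementation: the function name `gof_measures` is hypothetical, a free (estimated) intercept and complete data are assumed, the log-likelihood uses the residual sum of squares SSE, and the BIC penalty uses $\ln N$ (the standard Gaussian forms, scaled per observation).

```python
import math
import numpy as np

def gof_measures(X, y):
    """Illustrative sketch: OLS with a free intercept, no missing values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    n, p = X.shape                              # N observations, p variables
    design = np.column_stack([np.ones(n), X])   # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    sse = float(residuals @ residuals)          # residual sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))    # total sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))
    rmse = math.sqrt(sse / (n - p - 1))
    # Gaussian log-likelihood evaluated at the fitted coefficients
    # (undefined for a perfect fit, where SSE is zero).
    llf = -0.5 * n * (1.0 + math.log(2.0 * math.pi) + math.log(sse / n))
    aic = -2.0 * llf / n + 2.0 * (p + 1) / n
    bic = -2.0 * llf / n + (p + 1) * math.log(n) / n
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse,
            "llf": llf, "aic": aic, "bic": bic}

# Usage with made-up data: one explanatory variable, six observations.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.7])
measures = gof_measures(x, y)
```

For a single explanatory variable, the `r2` entry should match the squared sample correlation between `x` and `y`, which makes a convenient sanity check.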

References

  • Hamilton, J. D. (1994) Time Series Analysis. Princeton, NJ: Princeton University Press. ISBN 0-691-04289-6.
  • Kenney, J. F. and Keeping, E. S. (1962) "Linear Regression and Correlation." Ch. 15 in Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 252-285.