Calculates a goodness-of-fit measure (e.g., $R^2$, LLF, AIC) for a multiple linear regression model.

## Syntax

**MLR_GOF**(**X**, **Mask**, **Y**, **Intercept**, **Return_type**)

**X** is the data matrix of the independent (explanatory) variables, where each column represents one variable.

**Mask** is a boolean array that selects the explanatory variables to include in the model. If missing, all variables in X are included.

**Y** is the response (dependent) variable data array: a one-dimensional array of cells (e.g., a row or a column).

**Intercept** is the value at which to fix the constant (intercept) term (e.g., zero). If missing, the intercept is not fixed and is estimated from the data.

**Return_type** is a switch that selects the goodness-of-fit measure: 1 = R-square (default), 2 = adjusted R-square, 3 = RMSE, 4 = LLF, 5 = AIC, 6 = BIC/SIC.

Return_type | Description |
---|---|
1 | R-square (coefficient of determination) |
2 | Adjusted R-square |
3 | Regression error (RMSE) |
4 | Log-likelihood (LLF) |
5 | Akaike information criterion (AIC) |
6 | Schwarz/Bayesian information criterion (SBIC) |
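The calling convention above can be illustrated with a minimal Python sketch. The names `mlr_gof` and `_solve` are hypothetical and not part of NumXL. The sketch assumes ordinary least squares solved through the normal equations, listwise deletion of rows with missing values (represented as `None`), and that a fixed intercept $c$ is handled by regressing $y - c$ on the selected columns without a constant. It applies the formulas in the Remarks with a BIC penalty of $(p+1)\ln N / N$, and it keeps $p+1$ in the degrees of freedom even when the intercept is fixed; the spreadsheet function may treat that case differently.

```python
import math


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mlr_gof(x, mask, y, intercept=None, return_type=1):
    """Hypothetical Python analogue of MLR_GOF (a sketch, not NumXL's code).

    x: list of observation rows; y: list of responses; None marks a missing value.
    """
    k = len(x[0])
    cols = [j for j in range(k) if mask is None or mask[j]]
    # Listwise deletion: drop rows with a missing value in the selected X columns or in Y.
    data = [([row[j] for j in cols], v) for row, v in zip(x, y)
            if v is not None and all(row[j] is not None for j in cols)]
    n, p = len(data), len(cols)
    ys = [v for _, v in data]
    if intercept is None:
        # Estimate the intercept: prepend a column of ones to the design matrix.
        design = [[1.0] + xs for xs, _ in data]
        target = ys
    else:
        # Assumed handling of a fixed intercept c: regress (y - c) on X with no constant.
        design = [xs for xs, _ in data]
        target = [v - intercept for v in ys]
    q = len(design[0])
    # Ordinary least squares via the normal equations (X'X) b = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(q)] for i in range(q)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(q)]
    beta = _solve(xtx, xty)
    sse = sum((t - sum(b * v for b, v in zip(beta, d))) ** 2
              for d, t in zip(design, target))
    ybar = sum(ys) / n
    sst = sum((v - ybar) ** 2 for v in ys)
    r2 = 1 - sse / sst

    def llf():
        return -n / 2 * (1 + math.log(2 * math.pi) + math.log(sse / n))

    # Evaluate lazily so a perfect fit (SSE = 0) only fails for the LLF-based measures.
    measures = {
        1: lambda: r2,
        2: lambda: 1 - (1 - r2) * (n - 1) / (n - p - 1),
        3: lambda: math.sqrt(sse / (n - p - 1)),
        4: llf,
        5: lambda: -2 * llf() / n + 2 * (p + 1) / n,
        6: lambda: -2 * llf() / n + (p + 1) * math.log(n) / n,
    }
    return measures[return_type]()


# Example: one regressor, intercept estimated, R-square (Return_type = 1).
x = [[1], [2], [3], [4], [5], [None]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 4.0]  # the last row is dropped (missing X value)
print(mlr_gof(x, None, y, return_type=1))
```

A pure-Python solver keeps the sketch dependency-free; for real work a library least-squares routine would be the more robust choice.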

## Remarks

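- The underlying model is the multiple linear regression $y = \alpha + \beta_1 x_1 + \cdots + \beta_p x_p + \epsilon$, where $\alpha$ is the intercept (fixed when **Intercept** is given) and $\epsilon$ is a normally distributed error term.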
- The coefficient of determination, denoted $R^2$, measures how well the model replicates the observed outcomes:

$$R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}$$

where $\mathrm{SST}$ is the total sum of squares, $\mathrm{SSR}$ is the regression (explained) sum of squares, and $\mathrm{SSE}$ is the error (residual) sum of squares.

- The adjusted R-square, denoted $\bar R^2$, accounts for the tendency of $R^2$ to increase automatically and spuriously whenever extra explanatory variables are added to the model. $\bar R^2$ adjusts for the number of explanatory terms in the model relative to the number of observations:

$$\bar R^2 = {1-(1-R^{2}){N-1 \over N-p-1}} = {R^{2}-(1-R^{2}){p \over N-p-1}} = 1 - \frac{\mathrm{SSE}/(N-p-1)}{\mathrm{SST}/(N-1)}$$

where:

- $p$ is the number of explanatory variables in the model.
- $N$ is the number of observations in the sample.
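As a quick numeric check of the equivalent forms above, the following Python sketch evaluates $R^2$ and the three expressions for $\bar R^2$ from illustrative (made-up) values of SSE, SST, $N$, and $p$. The function names are hypothetical helpers.

```python
def r_squared(sse, sst):
    """Coefficient of determination: R^2 = 1 - SSE/SST."""
    return 1 - sse / sst


def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2 = 1 - [SSE/(N-p-1)] / [SST/(N-1)]."""
    return 1 - (sse / (n - p - 1)) / (sst / (n - 1))


# Illustrative values: SSE = 12, SST = 80, N = 30 observations, p = 3 regressors.
sse, sst, n, p = 12.0, 80.0, 30, 3
r2 = r_squared(sse, sst)
form1 = 1 - (1 - r2) * (n - 1) / (n - p - 1)   # first form
form2 = r2 - (1 - r2) * p / (n - p - 1)        # second form
form3 = adjusted_r_squared(sse, sst, n, p)     # third form (sums of squares)
print(round(r2, 4), round(form1, 4), round(form2, 4), round(form3, 4))
# prints 0.85 0.8327 0.8327 0.8327
```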

- The regression error (RMSE) is the square root of the mean squared error:

$$\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{N-p-1}}$$

- The log-likelihood of the regression is given as:

$$\mathrm{LLF}=-\frac{N}{2}\left(1+\ln(2\pi)+\ln\left(\frac{\mathrm{SSE}}{N}\right)\right)$$
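The RMSE and LLF formulas can be sketched in Python as below. The functions are hypothetical helpers and the residuals are illustrative values. The script also checks that the closed-form LLF equals the sum of normal log-densities of the residuals when the error variance is set to its maximum-likelihood estimate $\mathrm{SSE}/N$, which assumes normally distributed errors.

```python
import math


def rmse(sse, n, p):
    """Regression error: sqrt(SSE / (N - p - 1))."""
    return math.sqrt(sse / (n - p - 1))


def llf(sse, n):
    """Gaussian log-likelihood evaluated at the ML variance estimate SSE/N."""
    return -n / 2 * (1 + math.log(2 * math.pi) + math.log(sse / n))


# Illustrative residuals from a one-regressor fit (p = 1).
resid = [0.06, -0.13, 0.18, -0.21, 0.10]
n = len(resid)
sse = sum(e * e for e in resid)
sigma2 = sse / n  # maximum-likelihood estimate of the error variance
# Sum of normal log-densities of the residuals with variance sigma2.
direct = sum(-0.5 * math.log(2 * math.pi * sigma2) - e * e / (2 * sigma2) for e in resid)
print(rmse(sse, n, p=1), llf(sse, n), direct)
```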

- The Akaike and Schwarz/Bayesian information criteria are given as:

$$\mathrm{AIC}=-\frac{2\,\mathrm{LLF}}{N}+\frac{2(p+1)}{N}$$

$$\mathrm{BIC} = \mathrm{SIC}=-\frac{2\,\mathrm{LLF}}{N}+\frac{(p+1)\ln N}{N}$$

- The sample data may include missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e. rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- The MLR_GOF function is available starting with version 1.60 APACHE.
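The per-observation AIC and BIC formulas in these remarks can be sketched as follows. The functions and the two candidate models' LLF values are hypothetical, chosen to show that BIC's $\ln N$ penalty (larger than AIC's factor of 2 once $N \ge 8$) can favor the smaller model where AIC favors the larger one; lower values are preferred under both criteria.

```python
import math


def aic(llf, n, p):
    """AIC = -2*LLF/N + 2*(p+1)/N (per-observation form)."""
    return -2 * llf / n + 2 * (p + 1) / n


def bic(llf, n, p):
    """BIC/SIC = -2*LLF/N + (p+1)*ln(N)/N."""
    return -2 * llf / n + (p + 1) * math.log(n) / n


# Two hypothetical models fitted to the same N = 100 observations:
# the larger one adds two regressors and raises the LLF from -150 to -147.
n = 100
a_small, a_big = aic(-150.0, n, p=2), aic(-147.0, n, p=4)
b_small, b_big = bic(-150.0, n, p=2), bic(-147.0, n, p=4)
print(a_small, a_big, b_small, b_big)
# AIC favors the larger model, BIC the smaller one.
```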
