Appendix D: Coefficient of Determination (R-Squared)

The coefficient of determination ($R^2$) is used with statistical models whose purpose is to predict future outcomes. $R^2$ is defined as the proportion of the variability in the sample data that is accounted for by the statistical model, and it serves as a goodness-of-fit measure.

For a given data set with observed values $\{ y_i \}$ and an associated model's values $\{ \widehat{y_i} \}$, the variability of the data set is measured as a sum of squared differences.

$$R^2 =\frac{SS_{reg}}{SS_{tot}}=1-\frac{SS_{err}}{SS_{tot}}$$

Where

$$SS_{tot} = \sum_{i=1}^N{(y_i-\overline{y})^2}$$

$$SS_{reg} = \sum_{i=1}^N{(\widehat{y_i}-\overline{y})^2}$$

$$SS_{err} = \sum_{i=1}^N{(y_i - \widehat{y_i})^2}$$

• $\overline{y}$ = the sample average of the observed values
• $SS_{tot}$ = the total sum of squares
• $SS_{reg}$ = the model (e.g. regression) sum of squares
• $SS_{err}$ = the sum of the squares of the residuals (residual sum of squares)
• $N$ = number of observations
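
As a minimal sketch of the definitions above, the following Python snippet computes the three sums of squares and $R^2$ from observed and model values. The function names `sums_of_squares` and `r_squared` and the sample data are illustrative assumptions, not part of the original text. The model values are assumed to come from a least-squares line with an intercept ($\widehat{y} = 2.2 + 0.6x$ for $x = 1, \dots, 5$), a setting in which $SS_{tot} = SS_{reg} + SS_{err}$ and the two forms of $R^2$ agree.

```python
def sums_of_squares(y, y_hat):
    """Return (SS_tot, SS_reg, SS_err) for observed values y and model values y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    n = len(y)
    y_bar = sum(y) / n  # sample average of the observed values
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    ss_reg = sum((fi - y_bar) ** 2 for fi in y_hat)           # model sum of squares
    ss_err = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    return ss_tot, ss_reg, ss_err


def r_squared(y, y_hat):
    """Compute R^2 as 1 - SS_err / SS_tot."""
    ss_tot, _, ss_err = sums_of_squares(y, y_hat)
    return 1.0 - ss_err / ss_tot


# Illustrative data (assumed): observations at x = 1..5 and the
# least-squares fitted values from y_hat = 2.2 + 0.6 * x.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

ss_tot, ss_reg, ss_err = sums_of_squares(y, y_hat)
print(ss_tot, ss_reg, ss_err)  # approximately 6.0, 3.6, 2.4
print(ss_reg / ss_tot)         # approximately 0.6
print(r_squared(y, y_hat))     # approximately 0.6
```

For model values that do not come from a least-squares fit with an intercept, the two ratios can differ, so the sketch uses the $1 - SS_{err}/SS_{tot}$ form in `r_squared`.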

Because $R^2$ never decreases when explanatory variables are added to a least-squares model, the adjusted $R^2$ (denoted $\overline{R}^2$) modifies it to account for the number of explanatory variables in the model.

$\overline{R}^2$ is defined as follows:

$$\overline{R}^2=1-(1-R^2)\frac{N-1}{N-p-1}=1-\frac{(N-1)SS_{err}}{(N-p-1)SS_{tot}}$$

Where:

• $p$ = number of explanatory variables
• $N$ = number of observations
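
The adjustment can be sketched in a few lines of Python. The function name `adjusted_r_squared` and the numbers in the example are hypothetical and chosen only for illustration. The sketch assumes $N - p - 1 > 0$, since the formula is undefined otherwise.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables."""
    if n - p - 1 <= 0:
        raise ValueError("need n - p - 1 > 0 for the adjusted R^2 to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Illustrative values (assumed): the same R^2 with one versus two
# explanatory variables on five observations.
one_var = adjusted_r_squared(0.6, n=5, p=1)
two_var = adjusted_r_squared(0.6, n=5, p=2)
print(one_var)  # approximately 0.4667
print(two_var)  # approximately 0.2
```

With $R^2$ held fixed, a larger $p$ gives a smaller adjusted value, which is the penalty for extra explanatory variables.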

Remarks
1. The adjusted $R^2$ is not a test of the model in the sense of hypothesis testing, but it can be used as a tool for model selection.