MLR_PRFTest - Partial f-test for Regression Variables

Calculates the p-value and related statistics of the partial f-test (used for testing the inclusion/exclusion of variables).

Syntax

X is the independent (explanatory) variables data matrix, such that each column represents one variable.

Y is the response (dependent) variable data array (a one-dimensional array of cells, e.g. a row or a column).

Intercept is the constant or the intercept value to fix (e.g. zero). If missing, the intercept is not fixed and is estimated normally.

Mask1 is the boolean array for the explanatory variables in the first model. If missing, all variables in X are included.

Mask2 is the boolean array for the explanatory variables in the second model. If missing, all variables in X are included.

Return_type is a switch to select the return output (1 = p-value (default), 2 = test statistic, 3 = critical value).

| Value | Description |
|-------|-------------|
| 1 | P-value (default) |
| 2 | Test statistic (i.e. F-statistic) |
| 3 | Critical value |

Alpha is the statistical significance level of the test. If missing or omitted, an alpha value of 5% is assumed.
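The way Mask1 and Mask2 select columns of X, and the requirement that the first model be nested in the second, can be illustrated with a short Python sketch. This is not the function's actual implementation; the helper names `resolve_mask` and `is_sub_model` are hypothetical and exist only for this example.

```python
from typing import List, Optional, Sequence


def resolve_mask(mask: Optional[Sequence[bool]], n_vars: int) -> List[bool]:
    """Return a full-length boolean mask (hypothetical helper).

    A missing mask (None) is treated as "include every variable in X",
    mirroring the documented default for Mask1 and Mask2.
    """
    if mask is None:
        return [True] * n_vars
    if len(mask) != n_vars:
        raise ValueError("mask length must equal the number of columns in X")
    return [bool(m) for m in mask]


def is_sub_model(mask1: Sequence[bool], mask2: Sequence[bool]) -> bool:
    """True when every variable selected by mask1 is also selected by mask2."""
    return all(m2 for m1, m2 in zip(mask1, mask2) if m1)


# Example: X has three columns; model 1 uses only the first one,
# model 2 omits its mask and therefore uses all three.
mask1 = resolve_mask([True, False, False], 3)
mask2 = resolve_mask(None, 3)
nested = is_sub_model(mask1, mask2)  # model 1 is nested in model 2
```

In this example the number of variables in each model is simply `sum(mask1)` and `sum(mask2)`.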

Remarks

1. The underlying model is described here.
2. Model 1 must be a sub-model of Model 2. In other words, all variables included in Model 1 must be included in Model 2.
3. The coefficient of determination (i.e. $R^2$) never decreases as we add variables to the regression model, but we often wish to test whether the improvement in $R^2$ from adding those variables is statistically significant.
4. To do so, we developed an inclusion/exclusion test for those variables. First, let's start with a regression model with $K_1$ variables:

$$Y_t = \alpha + \beta_1 \times X_1 + \cdots + \beta_{K_1} \times X_{K_1}$$

Now, let's add a few more variables $\left(X_{K_1+1} \cdots X_{K_2}\right)$:

$$Y_t = \alpha + \beta_1 \times X_1 + \cdots + \beta_{K_1} \times X_{K_1} + \beta_{K_1+1} \times X_{K_1+1} + \cdots + \beta_{K_2} \times X_{K_2}$$
5. The test of the hypothesis is as follows:

$$H_0 : \beta_{K_1+1} = \beta_{K_1+2} = \cdots = \beta_{K_2} = 0$$

$$H_1 : \exists \beta_{i} \neq 0, i \in \left[K_1+1 \cdots K_2\right]$$
6. Using the change in the coefficient of determination (i.e. $R^2$) as we add new variables, we can calculate the test statistic:

$$\mathrm{f}=\frac{(R^2_{f}-R^2_{r})/(K_2-K_1)}{(1-R^2_f)/(N-K_2-1)}\sim \mathrm{F}_{K_2-K_1,N-K_2-1}$$

Where:
• $R^2_f$ is the $R^2$ of the full model (with added variables).
• $R^2_r$ is the $R^2$ of the reduced model (without the added variables).
• $K_1$ is the number of variables in the reduced model.
• $K_2$ is the number of variables in the full model.
• $N$ is the number of observations in the sample data.
7. The sample data may include missing values.
8. Each column in the input matrix corresponds to a separate variable.
9. Each row in the input matrix corresponds to an observation.
10. Observations (i.e. rows) with missing values in X or Y are removed.
11. The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variables (X).
12. The MLR_PRFTest function is available starting with version 1.60 APACHE.
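The row-wise removal of missing observations and the test statistic from the remarks above can be sketched in Python as follows. This is a minimal illustration, not the function's actual implementation. The names `drop_incomplete_rows` and `partial_f_statistic` are hypothetical, missing values are assumed to be encoded as `None` or NaN, and the $R^2$ values are assumed to come from fits with an estimated (not fixed) intercept.

```python
from math import isnan


def drop_incomplete_rows(X, Y):
    """Remove observations (rows) where X or Y has a missing value.

    Assumption for this sketch: a missing value is None or a float NaN.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and isnan(v))

    kept = [(row, y) for row, y in zip(X, Y)
            if not missing(y) and not any(missing(v) for v in row)]
    return [row for row, _ in kept], [y for _, y in kept]


def partial_f_statistic(r2_reduced, r2_full, k1, k2, n):
    """Return (f, dfn, dfd) for the partial F-test based on R-squared values.

    r2_reduced, r2_full: R^2 of the reduced and the full model
    k1, k2: number of variables in the reduced and the full model
    n: number of observations left after removing incomplete rows
    """
    dfn = k2 - k1       # numerator degrees of freedom
    dfd = n - k2 - 1    # denominator degrees of freedom
    if dfn <= 0:
        raise ValueError("the full model must add at least one variable")
    if dfd <= 0:
        raise ValueError("not enough observations for the full model")
    f = ((r2_full - r2_reduced) / dfn) / ((1.0 - r2_full) / dfd)
    return f, dfn, dfd


# Example with made-up numbers: the reduced model has 2 variables, the full
# model has 4, and 25 complete observations remain.
f_stat, dfn, dfd = partial_f_statistic(0.5, 0.6, k1=2, k2=4, n=25)
# f_stat is approximately 2.5 with (2, 20) degrees of freedom.
```

The p-value is the upper-tail probability of the F distribution with `dfn` and `dfd` degrees of freedom evaluated at `f_stat`. In Python it could be obtained with, for example, `scipy.stats.f.sf(f_stat, dfn, dfd)`, which is left out here to keep the sketch free of third-party dependencies.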

References

• Hamilton, J. D.; Time Series Analysis, Princeton University Press (1994), ISBN 0-691-04289-6
• Kenney, J. F. and Keeping, E. S. (1962) "Linear Regression and Correlation." Ch. 15 in Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 252-285