PCR_PRFTest - Partial F-test for Principal Components Regression

Calculates the p-value and related statistics of the partial F-test for PCR (used for testing the inclusion/exclusion of variables).

Syntax

PCR_PRFTest(X, Y, Intercept, Mask1, Mask2, Return_type, Alpha)
X
is the independent variables data matrix, such that each column represents one variable.
Y
is the response or the dependent variable data array (a one-dimensional array of cells (e.g., rows or columns)).
Intercept
is the constant or intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is computed as part of the regression.
Mask1
is the boolean array used to select the explanatory variables in the first model. If missing, all variables in X are included.
Mask2
is the boolean array used to select the explanatory variables in the second model. If missing, all variables in X are included.
Return_type
is a switch to select the return output (1 = P-value (default), 2 = test statistic, 3 = critical value).

    Return_type  Description
    1            P-value (default)
    2            Test statistic (i.e., the F-stat)
    3            Critical value
Alpha
is the statistical significance level of the test (i.e., alpha). If missing or omitted, an alpha value of 5% is assumed.
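
For example, the following call (with hypothetical cell ranges and mask values, shown for illustration only; they do not come from this article) returns the p-value for testing whether the third and fourth variables add explanatory power beyond the first two. The omitted Intercept argument leaves the intercept to be computed:

    =PCR_PRFTest(A2:D31, E2:E31, , {TRUE,TRUE,FALSE,FALSE}, {TRUE,TRUE,TRUE,TRUE}, 1, 0.05)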

Remarks

  1. The underlying model is described here.
  2. Model 1 must be a sub-model of Model 2. In other words, all variables included in Model 1 must be included in Model 2.
  3. The coefficient of determination (i.e., $R^2$) increases as we add variables to the regression model, but we often wish to test whether the improvement in $R^2$ from adding those variables is statistically significant.
  4. To do so, we developed an inclusion/exclusion test for those variables. First, let's start with a regression model with $K_1$ variables:

    $$Y_t = \alpha + \beta_1 \times X_1 + \cdots + \beta_{K_1} \times X_{K_1}$$

    Now, let's add a few more variables $\left(X_{K_1+1} \cdots X_{K_2}\right)$:

    $$Y_t = \alpha + \beta_1 \times X_1 + \cdots + \beta_{K_1} \times X_{K_1} + \beta_{K_1+1} \times X_{K_1+1} + \cdots + \beta_{K_2} \times X_{K_2}$$
  5. The test of the hypothesis is as follows:

    $$H_o : \beta_{K_1+1} = \beta_{K_1+2} = \cdots = \beta_{K_2} = 0$$

    $$H_1 : \exists \beta_{i} \neq 0, i \in \left[K_1+1 \cdots K_2\right]$$
  6. Using the change in the coefficient of determination (i.e., $R^2$) as we add new variables, we can calculate the test statistic (a worked numeric example follows these remarks):

    $$\mathrm{f}=\frac{(R^2_{f}-R^2_{r})/(K_2-K_1)}{(1-R^2_f)/(N-K_2-1)}\sim \mathrm{F}_{K_2-K_1,\,N-K_2-1}$$

    Where:
    • $R^2_f$ is the $R^2$ of the full model (with added variables).
    • $R^2_r$ is the $R^2$ of the reduced model (without the added variables).
    • $K_1$ is the number of variables in the reduced model.
    • $K_2$ is the number of variables in the full model.
    • $N$ is the number of observations in the sample data.
  7. The sample data may include data-points with missing values.
  8. Each column in the input matrix corresponds to a separate variable.
  9. Each row in the input matrix corresponds to an observation.
  10. Observations (i.e., rows) with missing values in X or Y are removed.
  11. The number of rows of the response variable (Y) must equal the number of rows of the explanatory variable (X).
  12. The PCR_PRFTest function is available starting with version 1.60 APACHE.
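
To make remark 6 concrete, consider a small worked example (the numbers are illustrative assumptions, not taken from this article). Suppose the reduced model has $K_1 = 2$ variables with $R^2_r = 0.75$, the full model has $K_2 = 4$ variables with $R^2_f = 0.81$, and the sample has $N = 30$ observations:

$$\mathrm{f}=\frac{(0.81-0.75)/(4-2)}{(1-0.81)/(30-4-1)}=\frac{0.03}{0.0076}\approx 3.95$$

The 5% critical value of $\mathrm{F}_{2,25}$ is roughly 3.39, so the test statistic exceeds it (equivalently, the p-value falls below 5%), and we would reject $H_o$ and conclude that the two added variables jointly improve the fit.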
