PCR_ANOVA - Analysis of Variance for the PCR Model

Calculates the analysis of variance (ANOVA) values for the principal component regression (PCR) model.

Syntax

PCR_ANOVA (X, Mask, Y, Intercept, Return)

X
is the independent (explanatory) variables data matrix, where each column represents one variable.
Mask
is the Boolean array used to select the explanatory variables in the model. If missing, all variables in X are included.
Y
is the response (dependent) variable data array (a one-dimensional array of cells, i.e., a row or a column).
Intercept
is the constant or intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is computed as part of the regression.
Return
is a switch to select the return output (1 = SSR (default), 2 = SSE, 3 = SST, 4 = MSR, 5 = MSE, 6 = F-Stat, 7 = P-Value).
Value   Return
1       SSR (default)
2       SSE
3       SST
4       MSR
5       MSE
6       F-Stat
7       P-Value
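
For example, with purely illustrative cell ranges, =PCR_ANOVA($B$2:$D$25, , $E$2:$E$25, , 6) would return the regression F-Stat for the response in $E$2:$E$25 regressed on the principal components of the three explanatory variables in $B$2:$D$25 (Mask and Intercept are omitted, so all variables are included and the intercept is computed normally).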

Remarks

  1. The underlying model regresses the dependent variable on the principal components (PCs) of the explanatory variables:
  2. $\mathbf{y} = \alpha + \beta_1 \times \mathbf{PC}_1 + \dots + \beta_p \times \mathbf{PC}_p$
  3. The regression ANOVA table examines the following hypothesis: $$\mathbf{H}_o: \beta_1 = \beta_2 = \dots = \beta_p = 0$$ $$\mathbf{H}_1: \exists \beta_i \neq 0, i \in \left[1,p \right ]$$
  4. In other words, the regression ANOVA examines the probability that the regression does NOT explain the variation in $\mathbf{y}$, i.e., that any fit is due purely to chance.
  5. The PCR_ANOVA function calculates the different values in the ANOVA table as follows (a numerical sketch is given after this list): $$\mathbf{SST}=\sum_{i=1}^N \left(Y_i - \bar Y \right )^2$$ $$\mathbf{SSR}=\sum_{i=1}^N \left(\hat Y_i - \bar Y \right )^2$$ $$\mathbf{SSE}=\sum_{i=1}^N \left(Y_i - \hat Y_i \right )^2$$ Where:
    • $\mathbf{PC}$ is the principal component.
    • $N$ is the number of non-missing observations in the sample data.
    • $\bar Y$ is the empirical sample average for the dependent variable.
    • $\hat Y_i$ is the regression model estimate value for the i-th observation.
    • $\mathbf{SST}$ is the total sum of squares for the dependent variable.
    • $\mathbf{SSR}$ is the sum of squares of the regression (i.e., $\hat y$) estimate.
    • $\mathbf{SSE}$ is the sum of squared errors (aka residuals, $\epsilon = y - \hat y$) of the regression.
    • $\mathbf{SST} = \mathbf{SSR} + \mathbf{SSE}$
    And, $$\mathbf{MSR} = \frac{\mathbf{SSR} }{p} $$ $$\mathbf{MSE} = \frac{ \mathbf{SSE} }{N-p-1}$$ $$\mathbf{F-Stat} = \frac{\mathbf{MSR} }{\mathbf{MSE} }$$ Where:
    • $p$ is the number of explanatory (aka predictor) variables in the regression.
    • $\mathbf{MSR}$ is the mean squares of the regression.
    • $\mathbf{MSE}$ is the mean squares of the residuals.
    • $\textrm{F-Stat}$ is the test score of the hypothesis.
    • $\textrm{F-Stat} \sim \mathbf{F}\left(p,N-p-1 \right)$.
  6. The sample data may include data points with missing values.
  7. Each column in the input matrix corresponds to a separate variable.
  8. Each row in the input matrix corresponds to an observation.
  9. Observations (i.e., rows) with missing values in X or Y are removed.
  10. The number of rows of the response variable (Y) must equal the number of rows of the explanatory variable (X).
  11. The PCR_ANOVA function is available starting with version 1.60 APACHE.
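
For readers who want to reproduce the quantities in remark 5 numerically, the following is a minimal Python (NumPy/SciPy) sketch, not the NumXL implementation. It retains all principal components, does not fix the intercept, and its function and variable names are purely illustrative.

import numpy as np
from scipy import stats

def pcr_anova_sketch(X, y):
    # Illustrative PCR ANOVA: regress y on the principal components of X
    # and return the ANOVA quantities defined in the remarks above.
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Drop observations (rows) with missing values in X or y (remark 9).
    ok = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X, y = X[ok], y[ok]
    N, p = X.shape

    # Principal component scores via SVD of the column-centered X.
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    PC = Xc @ Vt.T

    # OLS of y on an intercept plus the p principal components.
    Z = np.column_stack([np.ones(N), PC])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    y_hat = Z @ beta

    # ANOVA decomposition: SST = SSR + SSE.
    sst = np.sum((y - y.mean()) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    sse = np.sum((y - y_hat) ** 2)

    msr = ssr / p                # mean square of the regression
    mse = sse / (N - p - 1)      # mean square of the residuals
    f_stat = msr / mse
    p_value = stats.f.sf(f_stat, p, N - p - 1)

    return {"SSR": ssr, "SSE": sse, "SST": sst, "MSR": msr,
            "MSE": mse, "F-Stat": f_stat, "P-Value": p_value}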


