Calculates the analysis of variance (ANOVA) values for the principal component regression (PCR) model.
Syntax
PCR_ANOVA(X, Mask, Y, Intercept, Return_type)
- X
- is the independent variables data matrix, such that each column represents one variable.
- Mask
- is the boolean array to choose the explanatory variables in the model. If missing, all variables in X are included.
- Y
- is the response (dependent) variable data array (a one-dimensional array of cells, e.g. a row or a column).
- Intercept
- is the fixed value of the constant (intercept) term (e.g. zero). If missing, the intercept is not fixed and is estimated normally.
- Return_type
- is a switch to select the return output (1 = SSR (default), 2 = SSE, 3 = SST, ...), as listed in the table below.

| Method | Description |
|--------|-------------|
| 1 | SSR (default) |
| 2 | SSE |
| 3 | SST |
| 4 | MSR |
| 5 | MSE |
| 6 | F-Stat |
| 7 | P-Value |
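To illustrate how a boolean mask selects the explanatory variables, here is a minimal Python sketch. It is not NumXL code: `apply_mask` is a hypothetical helper, and storing X as a list of rows is an assumption made for the example.

```python
def apply_mask(X, mask=None):
    """Keep only the columns of X whose mask entry is truthy.

    X is a list of rows (one row per observation, one column per variable).
    A missing mask keeps every column, mirroring the default described above.
    """
    if mask is None:
        return [list(row) for row in X]
    if any(len(row) != len(mask) for row in X):
        raise ValueError("mask length must match the number of columns in X")
    return [[v for v, keep in zip(row, mask) if keep] for row in X]


# Hypothetical data: two observations of three variables.
X = [[1.0, 10.0, 100.0],
     [2.0, 20.0, 200.0]]

# Keep the first and third variables only.
selected = apply_mask(X, [True, False, True])
```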
Remarks
- The underlying model is described here.
- $$\mathbf{y} = \alpha + \beta_1 \times \mathbf{PC}_1 + \dots + \beta_p \times \mathbf{PC}_p$$
- The regression ANOVA table examines the following hypothesis:
$$\mathbf{H}_0: \beta_1 = \beta_2 = \dots = \beta_p = 0 $$
$$\mathbf{H}_1: \exists \beta_i \neq 0, i \in \left[1,p \right ] $$
- In other words, the regression ANOVA examines the probability that the regression does NOT explain the variation in $\mathbf{y}$, i.e. that any fit is due purely to chance.
- The PCR_ANOVA function calculates the different values in the ANOVA table as follows:
$$\mathbf{SST}=\sum_{i=1}^N \left(Y_i - \bar Y \right )^2 $$
$$\mathbf{SSR}=\sum_{i=1}^N \left(\hat Y_i - \bar Y \right )^2 $$
$$\mathbf{SSE}=\sum_{i=1}^N \left(Y_i - \hat Y_i \right )^2 $$
Where:
- $\mathbf{PC}$ is the principal component.
- $N$ is the number of non-missing observations in the sample data.
- $\bar Y$ is the empirical sample average for the dependent variable.
- $\hat Y_i$ is the regression model estimate value for the i-th observation.
- $\mathbf{SST}$ is the total sum of squares for the dependent variable.
- $\mathbf{SSR}$ is the sum of squares for the regression (i.e. $\hat y$) estimate.
- $\mathbf{SSE}$ is the sum of squared errors (aka residuals $\epsilon$) for the regression estimate (i.e. $\epsilon = y - \hat y$).
- $\mathbf{SST} = \mathbf{SSR} + \mathbf{SSE}$
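The three sums of squares can be checked numerically with a short Python sketch. This is an illustration under stated assumptions, not the NumXL implementation: it regresses the response on a single made-up principal-component score with an estimated intercept, and `fit_line` and `sums_of_squares` are hypothetical helper names.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar
    return alpha, beta


def sums_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed y and fitted y_hat."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sst, ssr, sse


# Made-up scores of one principal component and a made-up response.
pc1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1.1, 1.9, 3.2, 3.8, 5.0]

alpha, beta = fit_line(pc1, y)
y_hat = [alpha + beta * s for s in pc1]
sst, ssr, sse = sums_of_squares(y, y_hat)
```

For this data, SST is 9.5, SSR is about 9.409 and SSE is about 0.091, so SST = SSR + SSE holds up to rounding. The identity relies on a least-squares fit with an estimated intercept; it need not hold when the intercept is fixed to a value.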
$$\mathbf{MSR} = \frac{\mathbf{SSR} }{p} $$
$$\mathbf{MSE} = \frac{ \mathbf{SSE} }{N-p-1}$$
$$\textrm{F-Stat} = \frac{\mathbf{MSR} }{\mathbf{MSE} }$$
Where:
- $p$ is the number of explanatory (aka predictor) variables in the regression (here, the principal components $\mathbf{PC}_1, \dots, \mathbf{PC}_p$).
- $\mathbf{MSR}$ is the mean squares of the regression.
- $\mathbf{MSE}$ is the mean squares of the residuals.
- $\textrm{F-Stat}$ is the test score of the hypothesis.
$\textrm{F-Stat} \sim \mathbf{F}\left(p,N-p-1 \right)$
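As a sketch of the arithmetic behind the mean squares and the F statistic, the Python snippet below applies the formulas to assumed inputs. The function name `anova_mean_squares` and the numbers (SSR = 9.409, SSE = 0.091, N = 5, p = 1) are illustrative and are not taken from NumXL.

```python
def anova_mean_squares(ssr, sse, n, p):
    """Return (MSR, MSE, F-Stat) from the sums of squares.

    n is the number of non-missing observations and p the number of
    explanatory variables, so the degrees of freedom are p and n - p - 1.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    df_err = n - p - 1
    if df_err <= 0:
        raise ValueError("need more than p + 1 observations")
    msr = ssr / p
    mse = sse / df_err
    return msr, mse, msr / mse


# Illustrative inputs only.
msr, mse, f_stat = anova_mean_squares(ssr=9.409, sse=0.091, n=5, p=1)

# The P-Value is the upper-tail probability of F(p, n - p - 1) at f_stat.
# With SciPy installed it could be obtained as scipy.stats.f.sf(f_stat, p, n - p - 1);
# it is left out here to keep the sketch dependency-free.
```

A large F statistic relative to the $\mathbf{F}\left(p,N-p-1 \right)$ distribution gives a small P-Value, which is evidence against $\mathbf{H}_0$.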
- The sample data may include missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e. rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- The PCR_ANOVA function is available starting with version 1.60 APACHE.
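The missing-value handling described in the remarks above (drop every observation that has a missing value in X or Y) can be sketched in Python as follows. This is an illustration only: representing a missing cell as `None` or NaN, and the helper names `is_missing` and `drop_missing_rows`, are assumptions made for the example.

```python
import math


def is_missing(v):
    """Treat None and NaN as missing (an assumption for this sketch)."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop_missing_rows(X, y):
    """Remove observations (rows) with a missing value in X or y."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of rows")
    kept = [
        (list(row), yi)
        for row, yi in zip(X, y)
        if not is_missing(yi) and not any(is_missing(v) for v in row)
    ]
    X_clean = [row for row, _ in kept]
    y_clean = [yi for _, yi in kept]
    return X_clean, y_clean


# Made-up sample: rows 2 and 3 have a gap in X, row 4 has a gap in y.
X = [[1.0, 2.0], [None, 3.0], [4.0, float("nan")], [5.0, 6.0], [7.0, 8.0]]
y = [10.0, 20.0, 30.0, None, 50.0]

X_clean, y_clean = drop_missing_rows(X, y)
```

After the deletion, $N$ in the formulas is the number of remaining rows (2 in this example).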