MLR_STEPWISE - Regression Variables Selection Method (Stepwise)

Returns a list of the selected variables after performing the stepwise regression.

Syntax

MLR_STEPWISE (X, Mask, Y, Intercept, Method, Alpha)

X
is the independent (explanatory) variables data matrix, where each column represents one variable.
Mask
is the boolean array to choose the explanatory variables in the model. If missing, all variables in X are included.
Y
is the response or the dependent variable data array (a one-dimensional array of cells, e.g., a row or a column).
Intercept
is the constant or the intercept value to fix (e.g., zero). If missing, the intercept is not fixed and is computed as part of the regression.
Method
is a switch to select the variable's inclusion/exclusion approach:
Value   Method
1       Forward selection (default).
2       Backward elimination.
3       Bidirectional elimination.
Alpha
is the statistical significance level of the inclusion/exclusion test (i.e., alpha). If missing or omitted, an alpha value of 5% is assumed.
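
To make the roles of X, Mask, and Y concrete, here is a minimal Python sketch of the input preparation this page describes: the mask narrows the candidate columns, and observations with missing values are dropped. It is an illustration only, not the add-in's implementation. The function name `prepare_inputs` is hypothetical, and treating a missing value in any column of X (including masked-out columns) as grounds for dropping the row is an assumption.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing (an assumption about how blanks arrive)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_inputs(X, y, mask=None):
    """Hypothetical helper: apply the mask, then drop incomplete observations.

    X    -- list of rows, each row a list with one value per variable
    y    -- list of responses, one per row of X
    mask -- list of booleans, one per column of X; None keeps every column
    """
    if len(X) != len(y):
        raise ValueError("X and Y must have the same number of rows")
    n_cols = len(X[0])
    if mask is None:
        mask = [True] * n_cols  # a missing mask includes all variables
    if len(mask) != n_cols:
        raise ValueError("Mask needs one entry per column of X")

    keep = [j for j, use in enumerate(mask) if use]
    rows_X, rows_y = [], []
    for row, target in zip(X, y):
        # Assumption: a missing value anywhere in the row removes the row.
        if _is_missing(target) or any(_is_missing(v) for v in row):
            continue
        rows_X.append([row[j] for j in keep])
        rows_y.append(target)
    return keep, rows_X, rows_y
```

The returned `keep` list records which original column positions remain as candidates, so a later selection step can report its result in terms of the original columns of X.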

Remarks

  1. The underlying model is described here.
  2. Stepwise regression covers regression models in which the predictive variables are chosen by an automatic procedure. The procedure takes the form of a sequence of F-tests that select or eliminate explanatory variables.
  3. The three main approaches are:
    • Forward Selection involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until no additional variables improve the model.
    • Backward Elimination involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.
    • Bidirectional Elimination, a combination of the above tests, involves testing at each step for variables to be included or excluded.
  4. One of the main issues with stepwise regression is that it searches a large space of possible models, so it is prone to overfitting the data.
  5. The initial values in the mask array define the set of variables that MLR_STEPWISE works with. In other words, variables that are not selected will not be considered during the regression.
  6. The sample data may include data points with missing values.
  7. Each column in the input matrix corresponds to a separate variable.
  8. Each row in the input matrix corresponds to an observation.
  9. Observations (i.e., rows) with missing values in X or Y are removed.
  10. The number of rows of the response variable (Y) must equal the number of rows of the explanatory variables (X).
  11. The MLR_STEPWISE function is available starting with version 1.60 APACHE.
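
The forward selection approach described in the remarks can be sketched in Python as follows. This is a rough illustration under stated assumptions, not the code behind MLR_STEPWISE: it assumes NumPy and SciPy are available, uses a partial F-test with one numerator degree of freedom as the comparison criterion, and expects inputs that are already free of missing values. The names `forward_select` and `_rss` are hypothetical.

```python
import numpy as np
from scipy import stats


def _rss(X, y, cols, fixed_intercept=None):
    """Residual sum of squares and estimated-parameter count for an OLS fit."""
    if fixed_intercept is None:
        parts = [np.ones((len(y), 1))]   # intercept is estimated
        target = y
    else:
        parts = []                       # intercept is fixed, so subtract it
        target = y - fixed_intercept
    parts += [X[:, [c]] for c in cols]
    if not parts:                        # no free parameters at all
        return float(target @ target), 0
    A = np.hstack(parts)
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ beta
    return float(resid @ resid), A.shape[1]


def forward_select(X, y, mask=None, fixed_intercept=None, alpha=0.05):
    """Return the column indices picked by forward selection (a sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # Only variables switched on in the mask are candidates.
    candidates = [j for j in range(k) if mask is None or mask[j]]
    selected = []
    while candidates:
        rss0, _ = _rss(X, y, selected, fixed_intercept)
        best = None
        for j in candidates:
            rss1, p1 = _rss(X, y, selected + [j], fixed_intercept)
            df2 = n - p1
            if df2 <= 0:
                continue                 # not enough observations to test
            if rss1 <= 0.0:
                p_value = 0.0            # perfect fit: treat as significant
            else:
                f_stat = (rss0 - rss1) / (rss1 / df2)
                p_value = float(stats.f.sf(f_stat, 1, df2))
            if best is None or p_value < best[1]:
                best = (j, p_value)
        # Stop when no remaining variable passes the inclusion test.
        if best is None or best[1] >= alpha:
            break
        selected.append(best[0])
        candidates.remove(best[0])
    return selected
```

Backward elimination would follow the same pattern in reverse: start from all masked-in columns and repeatedly drop the variable with the largest p-value until every remaining variable is significant at the chosen alpha.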

