Returns a list of the selected variables after performing the stepwise regression.
X is the data matrix of the independent (explanatory) variables, where each column represents one variable.
Mask is the Boolean array that selects the explanatory variables to include in the model. If missing, all variables in X are included.
Y is the response (dependent) variable data array: a one-dimensional array of cells (e.g. a row or a column).
Intercept is the constant (intercept) value to fix (e.g. zero). If missing, the intercept is not fixed and is estimated normally.
Method is a switch that selects the variable inclusion/exclusion approach: 1 = forward selection (default), 2 = backward elimination, 3 = bidirectional elimination.
Alpha is the significance level of the inclusion/exclusion test. If missing or omitted, an alpha of 5% is assumed.
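The sketch below is a minimal Python (NumPy) illustration of how the Mask argument might be applied before any fitting. It also applies the rule that observations with missing values in X or Y are removed, and that X and Y must have the same number of rows. The helper name `prepare_data` is hypothetical and is not part of NumXL. The sketch also assumes that only the masked-in columns are checked for missing values; the page does not state this.

```python
import numpy as np


def prepare_data(X, y, mask=None):
    """Hypothetical helper: apply a Mask and drop incomplete observations.

    X    : 2-D array, one column per explanatory variable, one row per observation
    y    : 1-D array of responses with the same number of rows as X
    mask : optional Boolean array with one entry per column of X (None = all)
    Returns (X_clean, y_clean, selected_column_indices).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single variable given as a flat array
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and Y must have the same number of rows")
    if mask is None:
        mask = np.ones(X.shape[1], dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != (X.shape[1],):
        raise ValueError("Mask needs one entry per column of X")
    cols = np.flatnonzero(mask)  # indices of the candidate variables
    X = X[:, cols]
    # Listwise deletion (assumption: only masked-in columns are checked for NaN).
    keep = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[keep], y[keep], cols
```

For example, with `mask=[True, False, True]`, a row whose middle value is missing is still kept, because that column is never considered.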
- Stepwise regression is a family of regression methods in which the predictive variables are chosen by an automatic procedure. The procedure takes the form of a sequence of F-tests that select or eliminate explanatory variables.
- The three main approaches are:
- Forward Selection, which involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion (here, an F-test), adding the variable (if any) that improves the model the most, and repeating this process until no additional variable improves the model.
- Backward Elimination, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) whose deletion improves the model the most, and repeating this process until no further improvement is possible.
- Bidirectional Elimination, a combination of the two approaches above, which tests at each step whether variables should be included or excluded.
- A main drawback of stepwise regression is that it searches a large space of possible models, which makes it prone to overfitting the data.
- The values in the Mask array define the set of candidate variables that MLR_STEPWISE works with. Variables that the mask does not select are never considered during the regression.
- The sample data may include missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e. rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must equal the number of rows of the explanatory variables (X).
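Each inclusion or exclusion step can be read as a partial F-test between two nested least-squares models, a reduced model r and a full model f that differ by the tested variable. The form below is the textbook statement of that test. How MLR_STEPWISE counts parameters or breaks ties is not stated on this page, so treat it as a sketch:

$$
F = \frac{(\mathrm{RSS}_{r} - \mathrm{RSS}_{f})/(p_{f} - p_{r})}{\mathrm{RSS}_{f}/(n - p_{f})} \sim F_{\,p_{f}-p_{r},\; n-p_{f}}
$$

Here $\mathrm{RSS}_{r}$ and $\mathrm{RSS}_{f}$ are the residual sums of squares, $p_{r}$ and $p_{f}$ are the numbers of estimated coefficients (including the intercept unless it is fixed), and $n$ is the number of complete observations. When a single variable is tested, $p_{f} - p_{r} = 1$. Presumably a variable enters when the test's p-value is below Alpha and is removed when it is above Alpha.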
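The three approaches can be sketched in Python (NumPy and SciPy) with partial F-test p-values driving every step. Several details are assumptions made for illustration and do not describe NumXL's implementation:

- the function names `stepwise`, `_rss` and `_partial_f_pvalue`;
- the use of one Alpha for both entry and removal;
- starting bidirectional elimination from the empty model.

The `method` codes mirror the Method argument. `intercept=None` estimates the constant, while a number fixes it.

```python
import numpy as np
from scipy import stats


def _rss(X, y, cols, intercept):
    """OLS residual sum of squares of y on the columns `cols` of X.

    intercept=None estimates a constant term; a number fixes the intercept at
    that value (it is subtracted from y and no constant column is fitted).
    Returns (rss, number of estimated coefficients).
    """
    target = y if intercept is None else y - intercept
    parts = [X[:, list(cols)]] if cols else []
    if intercept is None:
        parts.append(np.ones((len(y), 1)))
    if not parts:  # empty model with a fixed intercept: nothing to estimate
        return float(target @ target), 0
    A = np.hstack(parts)
    beta, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ beta
    return float(resid @ resid), A.shape[1]


def _partial_f_pvalue(X, y, small, big, intercept):
    """p-value of the partial F-test of model `big` against nested model `small`."""
    rss_s, p_s = _rss(X, y, small, intercept)
    rss_b, p_b = _rss(X, y, big, intercept)
    df1, df2 = p_b - p_s, len(y) - p_b
    if df2 <= 0:
        return 1.0  # too few observations to test the larger model
    denom = max(rss_b / df2, np.finfo(float).tiny)  # avoid division by zero
    f = ((rss_s - rss_b) / df1) / denom
    return float(stats.f.sf(f, df1, df2))


def stepwise(X, y, method=1, alpha=0.05, intercept=None):
    """Return the sorted column indices kept by stepwise selection.

    method: 1 = forward selection, 2 = backward elimination,
            3 = bidirectional (assumed here to start from the empty model).
    """
    if method not in (1, 2, 3):
        raise ValueError("method must be 1, 2 or 3")
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    k = X.shape[1]
    selected = list(range(k)) if method == 2 else []

    def try_add():
        # Add the candidate with the smallest partial-F p-value, if below alpha.
        pvals = {j: _partial_f_pvalue(X, y, selected, selected + [j], intercept)
                 for j in range(k) if j not in selected}
        if not pvals:
            return False
        best = min(pvals, key=pvals.get)
        if pvals[best] < alpha:
            selected.append(best)
            return True
        return False

    def try_remove():
        # Drop the variable with the largest partial-F p-value, if above alpha.
        pvals = {j: _partial_f_pvalue(X, y, [c for c in selected if c != j],
                                      selected, intercept)
                 for j in selected}
        if not pvals:
            return False
        worst = max(pvals, key=pvals.get)
        if pvals[worst] > alpha:
            selected.remove(worst)
            return True
        return False

    if method == 1:
        while try_add():
            pass
    elif method == 2:
        while try_remove():
            pass
    else:
        # Alternate inclusion and exclusion steps until neither changes the
        # model; the step cap is a simple guard against cycling.
        for _ in range(4 * k + 4):
            added = try_add()
            removed = try_remove()
            if not (added or removed):
                break
    return sorted(selected)
```

The sketch compares p-values against Alpha rather than comparing raw F-statistics against fixed F-to-enter/F-to-remove thresholds. That keeps the rule tied to a single significance level, as the Alpha argument suggests. For instance, if y depends on the first and third columns of X and the second column is pure noise, all three methods would be expected to return `[0, 2]`.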
- The MLR_STEPWISE function is available starting with version 1.60 APACHE.