Returns a list of the selected variables after performing the stepwise regression.
MLR_STEPWISE(X, Mask, Y, Intercept, Method, Alpha)
- X is the independent (explanatory) variables data matrix, such that each column represents one variable.
- Mask is the boolean array to choose the explanatory variables in the model. If missing, all variables in X are included.
- Y is the response or dependent variable data array (a one-dimensional array of cells, e.g. a row or a column).
- Intercept is the constant or intercept value to fix (e.g. zero). If missing, the intercept is not fixed and is computed normally.
- Method is a switch to select the variable inclusion/exclusion approach (1 = forward selection (default), 2 = backward elimination, 3 = bi-directional elimination).

| Method | Description |
|--------|-------------|
| 1 | Forward selection (default) |
| 2 | Backward elimination |
| 3 | Bi-directional elimination |

- Alpha is the statistical significance level of the inclusion/exclusion test. If missing or omitted, an alpha value of 5% is assumed.
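
The way the mask restricts the candidate set can be illustrated with a short Python sketch. This is not NumXL code: the helper name `candidate_columns` and the row-wise list layout are assumptions made only for illustration. It keeps the columns of X whose mask entry is true and treats a missing mask as "include everything", as described for the Mask argument.

```python
def candidate_columns(x_rows, mask=None):
    """Return (kept column indices, kept columns) for a row-wise data matrix.

    Hypothetical helper: mirrors the documented Mask behavior, where a
    missing mask means every variable in X is a candidate.
    """
    n_vars = len(x_rows[0])
    if mask is None:
        mask = [True] * n_vars
    if len(mask) != n_vars:
        raise ValueError("mask needs one entry per column of X")
    keep = [j for j, flag in enumerate(mask) if flag]
    columns = [[row[j] for row in x_rows] for j in keep]
    return keep, columns


x = [[1.0, 10.0, 0.5],
     [2.0, 20.0, 0.1],
     [3.0, 30.0, 0.9]]

# The second variable is masked out, so only columns 0 and 2 remain candidates.
keep, cols = candidate_columns(x, [True, False, True])
print(keep)   # [0, 2]
```

Only the variables returned here would be offered to the stepwise procedure; a masked-out column is never tested for inclusion.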
- The underlying model is described here.
- Stepwise regression builds a regression model in which the predictive variables are chosen by an automatic procedure. The procedure takes the form of a sequence of F-tests that select or eliminate explanatory variables.
- The three main approaches are:
  - Forward selection, which starts with no variables in the model, tests the addition of each variable using a chosen model comparison criterion, adds the variable (if any) that improves the model the most, and repeats this process until no additional variable improves the model.
  - Backward elimination, which starts with all candidate variables, tests the deletion of each variable using a chosen model comparison criterion, deletes the variable (if any) whose removal improves the model the most, and repeats this process until no further improvement is possible.
  - Bidirectional elimination, a combination of the above, which tests at each step for variables to be included or excluded.
- One of the main issues with stepwise regression is that it searches a large space of possible models. Hence it is prone to overfitting the data.
- The initial values in the mask array define the set of variables that MLR_STEPWISE works with. In other words, variables that are not selected in the mask are not considered during the regression.
- The sample data may include missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e. rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variables (X).
- The MLR_STEPWISE function is available starting with version 1.60 APACHE.
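
The forward selection approach described in the remarks can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the NumXL implementation: the function names (`ols_rss`, `f_test_pvalue`, `forward_select`) are hypothetical, the intercept is always estimated rather than fixed, the inclusion test is assumed to be a partial F-test with one numerator degree of freedom, and the candidate columns are assumed not to be collinear. Observations with a missing value (represented here as `None`) in X or Y are removed before fitting.

```python
import math


def ols_rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Augmented normal equations [A'A | A'y].
    aug = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           + [sum(row[i] * yi for row, yi in zip(design, y))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting (assumes full rank).
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(k):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))


def f_test_pvalue(f_stat, df_resid):
    """P(F(1, df_resid) > f_stat), using F(1, d) = t(d)^2 and the
    finite-series form of the Student-t distribution for integer d."""
    theta = math.atan(math.sqrt(max(f_stat, 0.0) / df_resid))
    s, c = math.sin(theta), math.cos(theta)
    if df_resid % 2:  # odd degrees of freedom
        term, total = c, 0.0
        for j in range(1, (df_resid - 1) // 2 + 1):
            total += term
            term *= c * c * (2 * j) / (2 * j + 1)
        inside = 2.0 / math.pi * (theta + s * total)
    else:             # even degrees of freedom
        term, total = 1.0, 0.0
        for j in range(1, df_resid // 2 + 1):
            total += term
            term *= c * c * (2 * j - 1) / (2 * j)
        inside = s * total
    return max(0.0, 1.0 - inside)


def forward_select(x_rows, y, alpha=0.05):
    """Return the column indices chosen by forward selection."""
    # Remove observations (rows) with a missing value in X or Y.
    rows = [(xr, yi) for xr, yi in zip(x_rows, y)
            if yi is not None and all(v is not None for v in xr)]
    y = [yi for _, yi in rows]
    cols = [[xr[j] for xr, _ in rows] for j in range(len(x_rows[0]))]

    selected, remaining = [], list(range(len(cols)))
    rss_current = ols_rss([], y)          # intercept-only model
    while remaining:
        # Residual d.o.f. of the enlarged model: intercept + selected + 1.
        df_resid = len(y) - (len(selected) + 2)
        if df_resid < 1:
            break
        best = None                       # (F statistic, column, new RSS)
        for j in remaining:
            rss_new = ols_rss([cols[i] for i in selected] + [cols[j]], y)
            gain = max(rss_current - rss_new, 0.0)
            f_stat = gain / (rss_new / df_resid) if rss_new > 0 else float("inf")
            if best is None or f_stat > best[0]:
                best = (f_stat, j, rss_new)
        # Stop when even the strongest candidate is not significant.
        if f_test_pvalue(best[0], df_resid) >= alpha:
            break
        selected.append(best[1])
        remaining.remove(best[1])
        rss_current = best[2]
    return selected


# Toy data: y depends on columns 0 and 2 only; the last two rows are incomplete.
x = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1],
     [1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1],
     [None, 1, 1], [1, 1, 1]]
y = [18.2, 2.0, 12.2, 8.0, 17.8, 2.0, 11.8, 8.0, 5.0, None]
print(forward_select(x, y))   # [0, 2]
```

At each step the sketch adds the candidate with the largest partial F statistic and stops once that candidate's p-value is no longer below alpha. Backward elimination would run the same test in reverse, starting from the full model and removing the least significant variable.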
- Hamilton, J. D.; Time Series Analysis, Princeton University Press (1994), ISBN 0-691-04289-6.
- Kenney, J. F. and Keeping, E. S. (1962) "Linear Regression and Correlation." Ch. 15 in Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 252-285.