Returns a list of the selected variables after performing the stepwise regression.

## Syntax

**MLR_STEPWISE**(**X**, **Mask**, **Y**, **Intercept**, **Method**, **Alpha**)

**X** is the independent (explanatory) variables data matrix, such that each column represents one variable.

**Mask** is the Boolean array that selects which explanatory variables in X are considered for the model. If missing, all variables in X are included.

**Y** is the response (dependent) variable data array: a one-dimensional array of cells (e.g. a row or a column).

**Intercept** is the constant (intercept) value to fix (e.g. zero). If missing, the intercept is not fixed and is estimated normally.

**Method** is a switch that selects the variable inclusion/exclusion approach (1 = forward selection (default), 2 = backward elimination, 3 = bidirectional elimination).

| Method | Description |
|---|---|
| 1 | Forward selection |
| 2 | Backward elimination |
| 3 | Bidirectional elimination |

**Alpha** is the significance level of the inclusion/exclusion test. If missing or omitted, an alpha value of 5% is assumed.
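
To make the role of **Mask** concrete, here is a minimal Python sketch of how a Boolean mask could be turned into the list of candidate column indices. The function name `candidate_columns` is hypothetical and is not part of the add-in; the sketch only mirrors the documented rule that a missing mask includes every variable in X.

```python
def candidate_columns(n_vars, mask=None):
    """Return the indices of the X columns that selection may consider.

    Hypothetical helper for illustration only: a missing mask (None)
    means every variable is a candidate, as described for Mask above.
    """
    if mask is None:
        return list(range(n_vars))
    if len(mask) != n_vars:
        raise ValueError("mask must have one entry per column of X")
    # Keep only the positions flagged True (or any truthy value).
    return [j for j, keep in enumerate(mask) if keep]
```

For example, with three variables and a mask of `[True, False, True]`, only the first and third columns would be passed on to the selection procedure.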

## Remarks

- The underlying model is described here.
- Stepwise regression covers regression models in which the choice of predictive variables is carried out by an automatic procedure. The procedure takes the form of a sequence of F-tests for selecting or eliminating explanatory variables.
- The three main approaches are:
  - **Forward selection**, which involves starting with no variables in the model, testing the addition of each variable using a chosen model comparison criterion, adding the variable (if any) that improves the model the most, and repeating this process until no additional variables improve the model.
  - **Backward elimination**, which involves starting with all candidate variables, testing the deletion of each variable using a chosen model comparison criterion, deleting the variable (if any) that improves the model the most by being deleted, and repeating this process until no further improvement is possible.
  - **Bidirectional elimination**, a combination of the above, which involves testing at each step for variables to be included or excluded.

- One of the main issues with stepwise regression is that it searches a large space of possible models. Hence it is prone to overfitting the data.
- The initial values in the mask array define the set of variables that MLR_STEPWISE works with. In other words, variables that are not selected in the mask are not considered during the regression.
- The sample data may include missing values.
- Each column in the input matrix corresponds to a separate variable.
- Each row in the input matrix corresponds to an observation.
- Observations (i.e. rows) with missing values in X or Y are removed.
- The number of rows of the response variable (Y) must be equal to the number of rows of the explanatory variable (X).
- The MLR_STEPWISE function is available starting with version 1.60 APACHE.
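
The remarks above can be illustrated with a small, dependency-free Python sketch of forward selection driven by partial F-tests. It is not the add-in's implementation, and several simplifications are assumptions made here for brevity: all names (`drop_incomplete_rows`, `ols_rss`, `forward_select`) are hypothetical, the intercept is always estimated, collinear columns are not handled, and a fixed `f_to_enter` threshold on the F statistic stands in for the Alpha significance level (a p-value would require the F distribution from a statistics library). Rows with a missing value in any column of X or in Y are dropped first.

```python
import math

def is_missing(v):
    """Treat None and NaN as missing values."""
    return v is None or (isinstance(v, float) and math.isnan(v))

def drop_incomplete_rows(X_cols, y):
    """Remove observations with a missing value in any X column or in y."""
    keep = [i for i in range(len(y))
            if not is_missing(y[i])
            and not any(is_missing(col[i]) for col in X_cols)]
    return [[col[i] for i in keep] for col in X_cols], [y[i] for i in keep]

def ols_rss(X_cols, y):
    """Fit y on an intercept plus the given columns by least squares.

    Returns (residual sum of squares, number of fitted coefficients).
    Solves the normal equations by Gaussian elimination with partial
    pivoting; perfectly collinear columns are not handled.
    """
    rows = [[1.0] + [col[i] for col in X_cols] for i in range(len(y))]
    p = len(X_cols) + 1
    # Augmented matrix [X'X | X'y].
    M = [[sum(r[j] * r[k] for r in rows) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))] for j in range(p)]
    for j in range(p):
        piv = max(range(j, p), key=lambda r: abs(M[r][j]))
        M[j], M[piv] = M[piv], M[j]
        for r in range(j + 1, p):
            f = M[r][j] / M[j][j]
            for k in range(j, p + 1):
                M[r][k] -= f * M[j][k]
    # Back substitution for the coefficient vector b.
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (M[j][p] - sum(M[j][k] * b[k] for k in range(j + 1, p))) / M[j][j]
    rss = sum((yi - sum(bj * rj for bj, rj in zip(b, r))) ** 2
              for r, yi in zip(rows, y))
    return rss, p

def forward_select(X_cols, y, mask=None, f_to_enter=4.0):
    """Return indices of selected columns, in the order they entered."""
    X_cols, y = drop_incomplete_rows(X_cols, y)
    n = len(y)
    candidates = [j for j in range(len(X_cols)) if mask is None or mask[j]]
    selected = []
    rss_cur, _ = ols_rss([], y)  # intercept-only model
    while candidates:
        best = None
        for j in candidates:
            rss_new, p_new = ols_rss([X_cols[k] for k in selected + [j]], y)
            df = n - p_new  # residual degrees of freedom of the larger model
            if df <= 0:
                continue
            # Partial F statistic for adding one variable.
            f_stat = ((rss_cur - rss_new) / (rss_new / df)
                      if rss_new > 0 else float("inf"))
            if best is None or f_stat > best[0]:
                best = (f_stat, j, rss_new)
        if best is None or best[0] < f_to_enter:
            break  # no remaining candidate improves the model enough
        selected.append(best[1])
        candidates.remove(best[1])
        rss_cur = best[2]
    return selected
```

Backward elimination would run the same test in reverse (start from all candidates and drop the variable with the smallest partial F while it stays below a threshold), and the bidirectional variant alternates the two checks at each step.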


## References

- Hamilton, J. D.; Time Series Analysis, Princeton University Press (1994), ISBN 0-691-04289-6
- Kenney, J. F. and Keeping, E. S. (1962) "Linear Regression and Correlation." Ch. 15 in Mathematics of Statistics, Pt. 1, 3rd ed. Princeton, NJ: Van Nostrand, pp. 252-285
