In this module, we will demonstrate the steps to construct a regression model, examine its statistical significance, residuals diagnosis and the coefficients values.
Hello and welcome to our first session in regression analysis tutorial series. In this video we will show you how to use the NumXL program to construct a regression model.
We will use a sample data set gathered from 20 different sales persons: weekly sales (dependent variable) while two explanatory variables are intelligence (IQ) and extroversion. Note how we organize our input data; values of each variable are placed in a separate columns, and each observation (or salesman) is represented by a separate row.
Select an empty cell in your worksheet where you wish the output to be generated, then click on the regression icon in the NumXL tab (or toolbar).
The Regression Wizard pops up. By default, the output cells range is set to the current selected cell in your worksheet. Select the cells range for the response/dependent variable values (i.e. weekly sales) then select the cells range for the explanatory (independent) variables values. Note the cells range can include (optional) the column heading (Label), which would be used in the output tables. Leave the "Variable Mask" field blank for now. We will revisit this field in other videos. Finally, once we select the X and Y cells range, the "options", "Forecast" and "Missing Values" tabs will become available.
Initially, the options tab is set to the following values: First, the regression intercept/constant is left blank. This indicates that the regression intercept will be estimated by the regression. If you want to set the regression to a fixed value, enter it here.
The significance level (aka. alpha) is set to 5%.
In the output section, the most common regression analysis is selected. Lastly let's leave it unchecked. We will discuss this functionality in a later video. Now, click on the "Missing Values" tab.
In this tab, you can select the approach to handle missing values in the data set (X and Y). By default, any missing value found in X or in Y in any observation would be excluded from the analysis. This treatment is a good approach for our analysis, so let's leave it unchanged. Now, click OK to generate the output tables.
Let's now examine the different output tables more closely.
In the regression statistics table, a number of summary statistics for the goodness-of-fit of the regression model is displayed. In our example, the adjusted R square is 28%. Before we can seriously consider the regression model, we must answer the following question: "Is the regression model statistically significant or a statistical data anomaly?" The analysis of variance (ANOVA) table answers this question.
In the first row of the table (i.e. "Regression"), we compute the test-score (F-Stat) and P-Value, then compare them against the significance level (alpha). In our case, the regression model is statistically valid, and it does explain some of the variation in values of the dependent variable (weekly sales).
The remaining calculations in the table are simply to help us to get to this point. In our example, the regression is proved to be significant. Once we confirm that the regression model explains some of the variation in the values of the response variable (weekly sales), we can examine the residuals to make sure that the underlying model's assumptions are met.
Using the standardized residuals, we perform a series of statistical tests for the mean, variance, skew, excess kurtosis and the normality assumption. In this example, the standardized residuals passed all our test. Once we establish that the regression model is significant, we can look closer at the regression coefficients.
Each coefficient (including the intercept) is shown on a separate row, and we compute the following statistics: Mean Value, Standard error, Test score (T-stat) for significance test hypothesis, p-value and the upper and lower limit for the confidence interval, and accept/reject decision for the hypothesis. In our example, only the "extroversion" variable is found significant while the intercept and the "Intelligence" were not found significant.
That is it for now, thank you for watching!