Goodness of Fit Tutorial

Mohamad

January 17, 2013 16:14

In this tutorial, we'll discuss the different goodness-of-fit functions through an example of the monthly average of ozone levels in Los Angeles between 1955 and 1972 using NumXL.

Video script

Scene 1:

Welcome to the goodness-of-fit mini tutorial. In this video, I will use NumXL to help you better understand the goodness-of-fit functions and to develop some intuition behind them. We will also draw connections between the goodness of fit and the normality tests.

For sample data, I've chosen the monthly averages of the hourly measurements of the ozone level in downtown LA between January 1955 and December 1972. This is a familiar sample that was first analyzed by box, Jenkins, and Ryne cells in their time-series textbook, Time Series Forecasts, and Control from 1976.

To start I've plotted the dates and the average ozone levels in blue.

First, let's examine the summary statistics of our time series sample. For that, we need to select the NumXL tab in the toolbar, from there select the descriptive statistics icon.

Scene 2:

Using the descriptive stats wizard, first locate and select the monthly average ozone levels.

Scene 3:

The currently selected cell is chosen by default as the output cell. Let's leave this unchanged, take all the form defaults, and click OK.

Scene 4:

The descriptive stats table is displayed in your worksheet. Notice that the data has a positive skew or the distribution is not symmetric around the mean.

Let's go ahead and examine for order dependency between the observations. Click the correlogram icon in the NumXL toolbar.

Scene 5:

First, locate and select the monthly average ozone levels.

Scene 6:

Now set the max lags to 24 for ACF and PACF. Leave everything else set at the default and click OK.

Scene 7:

The ACF and PACF table and plots are displayed in your worksheet now. The correlogram looks very similar to the airline models for the period of 12 months. Assuming ozone is produced by car engines then traffic for the average number of cars in downtown LA will follow a similar pattern.

Let's reformat the plot now.

Scene 8:

Here we propose an airline model with seasonality of 12 months for our sample data. Click on or select the cell where you want the model displayed then click the airline model icon in the NumXL toolbar.

Scene 9:

Again select the input sample data and change the length of seasonality to 12. Click OK to finish.

Scene 10:

Now the airline model table is displayed. Select the cell that shows airline 12 and click on the calibration icon in the NumXL toolbar.

Scene 11:

The model Excel solver is displayed with our model input set.

Scene 12:

The objective or utility is set to the cell with the log-likelihood function value. The solver will maximize this value by changing the variable cells. The variable cells are set to the cells that possess the model's parameters, mu, sigma, beta 1, and beta 2.

Scene 13:

The constraint references a cell with a model check function output, this assures us that only valid models are considered.

Scene 14:

Click solve to start the maximization process. The optimal values are now saved in our model table. The LLF, AIC, and all residual diagnosis functions are updated since they are connected to the model's parameters and in formulas.

Scene 15:

Let's go ahead and calculate the model's standardized residuals. Using the NumXL airline residuals function type the function name and click on the Excel function wizard button to the left of the formula bar.

Scene 16:

Again locate the input data and enter the values of the model's parameter.

Scene 17:

The function returns number not applicable but this is only the first element in the array. To get the full array select the remaining cells in the column. Type F2 then press shift ctrl-enter.

The array is now available, please note the curly braces around the function.

Scene 18:

Now let's run descriptive stats on the standardized residuals. Select the standardized residuals as the input cell of the descriptive stats.

Scene 19:

The descriptive stats table is displayed. Notice that the residuals are not skewed but they do possess fat tails. Let's compute the different goodness of fit functions manually. Insert a column to hold our intermediate calculations using Excel's built-in functions. Let's compute each point's log-likelihood value.

Let's prepare the inputs for our manual calculation where n is the number of non-missing values and p is the number of free parameters of the model.

Scene 20:

Now for n calculate the number of non-missing points.

Scene 21:

For P we have four free parameters mu, sigma, theta 1, and theta 2. The LLF is simply the sum of the log-likelihood values of each point. Notice that this value is different than the ones played in the model, simply because airline AIC uses whittles approximation which is close but more efficient for a larger set.

Now let's compute the AIC and BIC using the LLF value above.

And that brings us to the end of this tutorial, thank you for watching. If you have any questions, suggestions, or comments please send them to us at support@numxl.com.

Related articles

Comments