In this tutorial, we'll use NumXL to discuss the different goodness-of-fit functions through an example: the monthly average ozone levels in Los Angeles between 1955 and 1972.
Welcome to the goodness-of-fit mini-tutorial. In this video, I will use NumXL to help you better understand the goodness-of-fit functions and develop some intuition behind them. We will also draw connections between goodness of fit and the normality tests.
For sample data, I've chosen the monthly averages of the hourly ozone measurements in downtown LA between January 1955 and December 1972. This is a familiar sample that was first analyzed by Box, Jenkins and Reinsel in their textbook, Time Series Analysis: Forecasting and Control, from 1976.
To start, I've plotted the average ozone levels against the dates in blue.
First, let's examine the summary statistics of our time series sample. To do that, select the NumXL tab in the toolbar, and from there select the Descriptive Statistics icon.
Using the Descriptive Stats wizard, first locate and select the monthly average ozone levels.
The currently selected cell is chosen by default as the output cell. Let's leave this unchanged, accept all the form defaults, and click OK.
The descriptive stats table is now displayed in your worksheet. Notice that the data has a positive skew; that is, the distribution is not symmetric around the mean, and its right tail is longer than its left.
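To build some intuition for what the table reports, here is a minimal Python sketch that computes the mean, standard deviation, skewness and excess kurtosis of a sample. The function name `describe` is hypothetical and not part of NumXL. The simple moment estimators are also an assumption: NumXL may apply small-sample corrections, so its numbers can differ slightly.

```python
import math

def describe(xs):
    """Mean, sample standard deviation, skewness and excess kurtosis.

    Uses simple moment-based estimators (an assumption); Excel/NumXL may
    apply small-sample bias corrections, so values can differ slightly.
    """
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    m2 = sum(d * d for d in dev) / n   # 2nd central moment (divide by n)
    m3 = sum(d ** 3 for d in dev) / n  # 3rd central moment
    m4 = sum(d ** 4 for d in dev) / n  # 4th central moment
    stdev = math.sqrt(sum(d * d for d in dev) / (n - 1))  # sample std dev
    skew = m3 / m2 ** 1.5              # > 0 means a longer right tail
    excess_kurt = m4 / m2 ** 2 - 3.0   # > 0 means fatter tails than normal
    return {"mean": mean, "stdev": stdev,
            "skew": skew, "excess_kurtosis": excess_kurt}
```

A positive `skew` corresponds to the longer right tail noted above. A positive `excess_kurtosis` indicates fatter tails than a normal distribution.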
Next, let's check the observations for serial (order) dependency. Click the Correlogram icon in the NumXL toolbar.
First locate and select the monthly average ozone levels.
Now set the maximum lag to 24 for both the ACF and PACF. Leave everything else at the default settings and click OK.
The ACF and PACF tables and plots are now displayed in your worksheet. The correlogram looks very similar to that of an airline model with a seasonal period of 12 months. This is plausible: assuming ozone is produced by car engines, the traffic, or the average number of cars in downtown LA, will follow a similar pattern.
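The ACF values in the correlogram can be reproduced with the standard sample estimator. The sketch below uses a hypothetical `sample_acf` helper and is not NumXL's implementation. For a series with a 12-month cycle, expect a large positive value at lag 12, which is index 11 of the returned list.

```python
def sample_acf(xs, max_lag):
    """Sample autocorrelation rho_hat(k) for k = 1..max_lag.

    Standard estimator: lag-k autocovariance (divided by n, not n - k)
    over the lag-0 autocovariance.
    """
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev) / n  # lag-0 autocovariance (variance)
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum(dev[t] * dev[t + k] for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf
```

For example, a sawtooth series `[t % 12 for t in range(120)]` repeats every 12 points. Its lag-12 autocorrelation is about 0.9, reduced from 1 only because the estimator divides by n rather than n - k.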
Let's reformat the plot now.
Here we propose an airline model with a 12-month seasonality for our sample data. Select the cell where you want the model displayed, then click the Airline Model icon in the NumXL toolbar.
Again, select the input sample data and change the length of seasonality to 12. Click OK to finish.
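For reference, the airline model is usually written in the textbook form below. Here B is the backshift operator, s = 12 is the seasonal period, and a_t are the innovations. This is a hedged sketch only: NumXL's own parametrization may differ, for example in the sign convention on theta 1 and theta 2 or in where mu enters.

```latex
(1 - B)(1 - B^{s})\, y_t = \mu + (1 - \theta_1 B)(1 - \theta_2 B^{s})\, a_t,
\qquad a_t \sim \text{i.i.d. } N(0, \sigma^2), \quad s = 12
```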
The airline model table is now displayed. Select the cell that shows airline 12 and click the Calibration icon in the NumXL toolbar.
The Excel Solver dialog is displayed with our model's inputs already set.
The objective (utility) is set to the cell holding the log-likelihood function (LLF) value. The solver will maximize this value by changing the variable cells, which are set to the cells holding the model's parameters: mu, sigma, theta 1 and theta 2.
The constraint references a cell containing the output of a model-check function, which ensures that only valid models are considered.
Click Solve to start the maximization. The optimal values are now saved in our model table. The LLF, AIC and all residual diagnostic functions are updated as well, since their formulas reference the model's parameters.
Now let's calculate the model's standardized residuals using the NumXL airline residuals function. Type the function name, then click the Excel function wizard (fx) button to the left of the formula bar.
Again, locate the input data and enter the values of the model's parameters.
The function returns #N/A, but this is only the first element of the output array. To get the full array, select the remaining cells in the column, press F2, then press Ctrl+Shift+Enter.
The array is now available; note the curly braces around the formula.
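To see what the residuals array contains, here is a minimal conditional-residual sketch in Python. It assumes the textbook airline form given in the docstring and sets pre-sample shocks to zero. `airline_std_residuals` is a hypothetical name, not NumXL's function. NumXL may initialize the recursion differently, so the early values in particular may not match.

```python
def airline_std_residuals(y, mu, sigma, theta1, theta2, s=12):
    """Conditional standardized residuals of an airline model (sketch).

    Assumed model (textbook form; NumXL's parametrization may differ):
        (1 - B)(1 - B^s) y_t = mu + (1 - theta1 B)(1 - theta2 B^s) a_t
    Pre-sample shocks are set to zero (conditional recursion). The first
    s + 1 points are consumed by differencing and returned as None,
    playing the role of the #N/A cells in Excel.
    """
    n = len(y)
    a = [0.0] * n   # innovations; zeros stand in for pre-sample shocks
    z = [None] * n  # standardized residuals a_t / sigma
    for t in range(s + 1, n):
        # Regular + seasonal difference: w_t = y_t - y_{t-1} - y_{t-s} + y_{t-s-1}
        w = y[t] - y[t - 1] - y[t - s] + y[t - s - 1]
        # Invert the MA part:
        # a_t = w_t - mu + th1*a_{t-1} + th2*a_{t-s} - th1*th2*a_{t-s-1}
        a[t] = (w - mu + theta1 * a[t - 1] + theta2 * a[t - s]
                - theta1 * theta2 * a[t - s - 1])
        z[t] = a[t] / sigma
    return z
```

With theta 1 = theta 2 = 0, each standardized residual is just the differenced value minus mu, divided by sigma. This is a quick sanity check before using real parameter values.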
Now let's run descriptive statistics on the standardized residuals. Select the standardized residuals as the input cells of the Descriptive Stats wizard.
The descriptive stats table is displayed. Notice that the residuals are not skewed, but they do have fat tails. Now let's compute the different goodness-of-fit functions manually. Insert a column to hold our intermediate calculations, which use Excel's built-in functions, and compute each point's log-likelihood value.
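Assuming Gaussian innovations, each point's log-likelihood can be written in terms of its standardized residual z_t and sigma. The sketch below uses a hypothetical `point_loglik` helper to show what a formula in the helper column might compute. The Gaussian assumption is ours. The fat tails noted above are exactly the kind of departure from normality that it ignores.

```python
import math

def point_loglik(z, sigma):
    """Gaussian log-likelihood of one observation from its standardized residual.

    Assumes normal innovations a_t = sigma * z_t:
        l_t = -0.5*ln(2*pi) - ln(sigma) - z_t**2 / 2
    Returns None for missing residuals so they can be skipped later.
    """
    if z is None:
        return None
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * z * z
```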
Next, let's prepare the inputs for our manual calculation, where n is the number of non-missing values and p is the number of free parameters in the model.
For n, count the number of non-missing points.
For p, we have four free parameters: mu, sigma, theta 1 and theta 2. The LLF is simply the sum of the log-likelihood values of all points. Notice that this value differs from the one displayed in the model table, simply because NumXL's airline functions use Whittle's approximation, which is close to the exact value but more efficient for larger samples.
Now let's compute the AIC and BIC using the LLF value above.
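These final steps can be sketched as follows: count n, sum the per-point values into the LLF, and apply the information criteria. The function name `goodness_of_fit` is hypothetical. The formulas are the standard definitions AIC = 2p - 2·LLF and BIC = p·ln(n) - 2·LLF. The values NumXL reports may use a different scaling.

```python
import math

def goodness_of_fit(point_logliks, p):
    """LLF, AIC and BIC from per-point log-likelihoods (sketch).

    point_logliks may contain None for missing points; n counts only the
    non-missing ones. Standard definitions (NumXL's scaling may differ):
        AIC = 2p - 2*LLF,   BIC = p*ln(n) - 2*LLF
    """
    values = [v for v in point_logliks if v is not None]
    n = len(values)    # number of non-missing points
    llf = sum(values)  # LLF is the sum of the per-point log-likelihoods
    aic = 2 * p - 2 * llf
    bic = p * math.log(n) - 2 * llf
    return {"n": n, "llf": llf, "aic": aic, "bic": bic}
```

For the airline model in this tutorial, p = 4 (mu, sigma, theta 1 and theta 2). Because BIC penalizes each parameter by ln(n) rather than 2, it favors smaller models than AIC once n exceeds about 7.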
And that brings us to the end of this tutorial, thank you for watching. If you have any questions, suggestions or comments please send them to us at firstname.lastname@example.org.