Starting with version 1.57, NumXL will support U.S. Census X12-ARIMA modeling including seasonal adjustment, trend filtering, and model identification and forecasting.
In this paper, we will go over the approach followed by NumXL to implement this model.
The approach of this feature is to use the US Census free program (aka x12a.exe), which provides users with a complete excel interface as well as raw input and output files for advanced users.
The US Census X12-ARIMA program possesses no user interface. To invoke it, the user needs to write a script input file (aka specification) along with the data and invoke the program from the command-line interface. Once complete, the program generates various output messages and files in designated folders.
To put a new Excel face on this widely-used legacy program, NumXL provides a wizard or dialog box that users can use to specify the cells' range of their data, select various modeling options, save those settings as part of their Excel spreadsheet and query the different outputs of their model.
Behind the scenes, NumXL transforms the user’s data and model selections into a native x12a specification file. Finally, NumXL runs the x12a program, reads the output files, and avails the results to the user in excel.
The whole process (i.e. preparing the specification file, running the program, and reading output files) is hidden from the user, but to promote transparency, the NumXL X12-ARIMA wizard allows access to different input/output files (e.g. specification file, error file, and output file).
During the NumXL installation, the installer program copies the x12A program (32/64-bit version) and all support files into your computer under the NumXL home path. The user is not required to download it from the US Census website, and in the event that the user already has this program, NumXL uses the program that comes with the installer to avoid any version mismatch issues.
Similar to what we did in our earlier tutorial, we organize our sample data by placing the dates in one column and the variable values in a separate column, with each observation in a separate row.
In the example above, we used the real quarterly GDP data for the US, France, Australia, and Canada.
- The different time series do not start on the same date. We replace the missing values with #N/A.
- The different time series may not finish on the same date.
- They are all quarterly data. They do not mix monthly and quarterly data.
Furthermore, users may add future dates to the end of the time series and fill it in with “#N/A” for their future values (see the following figure).
NumXL removes the missing values from either end of the times series and adjusts the series start date, so it is not an issue. For the user, the new row serves as a placeholder for future observations; so as new data become available (published), the user replaces the missing value (i.e. #N/A) with the actual values, triggering the models' formulas to re-evaluate, without the need to edit anything in the spreadsheet.
The X12A program has a few hard limits on the size of the time series:
- The maximum length of a time series is 600 observations.
- The minimum length of the monthly time series is 3 years (36 observations)
- The minimum length of the quarterly time series is 4 years (16 observations)
To accommodate those limits for larger time series, NumXL picks the most recent 600 observations and adjusts the start date of the series accordingly.
First, select an empty cell in your worksheet to store the unique identifier or your X12-ARIMA in Excel.
Next, Locate the X12 ARIMA icon in the toolbar (or menu in Excel 2003) and click on it.
The X12 ARIMA Wizard (dialog box) in Excel appears.
For the input time series data, select the cells range for the values, start date, and the frequency of the observations (i.e. monthly or quarterly).
- The values of the selected cells range may contain missing values (#N/A) at either end.
- The time series may not contain any intermediate missing values. If your series has one or more intermediate missing value(s), substitute a fill-in value using interpolation or any method you are comfortable with.
- The start date is a valid Excel date (e.g. 1/1/1947) even for quarterly data. Don’t use other formats similar to 1947.Q1 or 1947.3, as those are not valid dates in Excel.
- The start date must correspond to the first observation in the time series regardless of whether the observation’s value is missing or not.
Next, let’s set the prior adjustment of our input data:
This section allows you to set a special data treatment prior to the modeling process. For instance, “Transform” instructs the X12a program to model the log-values of our time series.
In the regression section, the user can adjust for special calendar effects such as trading days and holidays like Easter. For more details on calendar effects and adjustment, refer to our document online:
- Calendar Effects
IMPORTANT: The Easter holiday effect commences N days before Easter. Currently, NumXL uses 14 days prior to Easter. This will be changed in future releases to permit the user to select a value.
In the outlier type section, the user can select which types of outliers to detect and adjust for. For more details on those types of outliers, refer to our document online:
- Data preparation - Outliers
Now, let’s examine the ARIMA model section:
The X12-ARIMA methodology (regARIMA) uses a seasonal ARIMA (SARIMA) model to capture both the seasonality (deterministic) and the (stochastic) cyclicity in the data.
The user may elect for the program to find the best fit model (“Auto-Select”) or they can specify the order of the model.
For more information about time series decomposition and/or seasonal adjustment, refer to our online document.
- Patterns Unplugged
In the “Forecast” section, we can select the duration of the forecast. It is set to one (1) year by default, but users can select a higher forecast horizon up to seven (7) years (hard limit).
Now, let’s set the seasonal adjustment values:
By default, the X11 seasonal adjustment option is selected. The X11 filter is derived from Henderson's trend filters (Robert Henderson 1916).
In X11 Mode, the user can control the type of seasonal adjustment decomposition calculated (mode): multiplicative, additive, pseudo-additive, or log additives.
Using the X11 filter option, the user can control the seasonal moving average used. Currently, the trend moving average is set to 13.
- The Trading Day effect and other holiday adjustments in X11 are not yet available in NumXL.
- The Extreme value adjustment control is enabled and set to sigma limits of 1.25 and 2.75
For more information about time series decomposition and/or seasonal adjustment, refer to our online document Patterns Unplugged.
Now that we’ve finished specifying the model options, click “Apply.”
- The “Open X-12 SPC file” button becomes enabled.
- If you click on “Open X-12 SPC file,” the Windows Notepad application will launch with the x12a specification file opened.
- In the selected cell in your worksheet, the X12-ARIMA generates a unique identifier for the model.
- The “Run X12A” button is enabled now.
Finally, let’s run the x12a program. Click the “Run X-12A” button. NumXL invokes the program and passes the specification file generated earlier. Upon completion, a dialog box pops up.
Note that all command buttons on the upper right corner of the dialog are now enabled.
Let’s examine the status (i.e. warnings or errors) produced by the x12a program execution. Click “Open X-12 Error File” to view the file.
Again, the Notepad application is launched and the error file generated by the x12a program is displayed.
- The error file (x12a_34cc1761.err) has the same base file name as the specification file name (i.e. x12a_34cc1761.spc), which is the unique identifier of the x12-arima model (x12a_34cc1761).
- In the case of US real GDP, the x12a did not detect any significant seasonality, thus it throws a warning.
- The second paragraph of the warning above is not relevant to our case, as we model the gross GDP time series (rather than its GDP components (e.g. consumption, investment, government spending, and import/export)).
Optionally, to may examine the raw x12a output. Click on the “Open X12 Output file” button and, again, the Notepad application is launched and the x12 output file is displayed.
- A series of tests for seasonality is performed first. In the case of the US real GDP series, the test did not find any significant seasonality.
- The auto-modeling regARIMA procedure prints out the order (AR & MA) of the selected model. In our case, it is (1 1 1) with no seasonality (i.e. ARIMA(1,1,1)) Now, click “OK” to exit the wizard.
By now you must be wondering, “where are the model’s outputs?” NumXL offers a few worksheet functions to query the model’s different outputs.
To start, let’s query the X11 seasonally adjusted time series. Use the X12ACOMP function for this purpose.
- The first argument references the model’s unique identified in cell B1.
- The second argument references the step from the beginning of the time series, so for C3, the step is equal to one (1).
- The step value ranges between one (1) and the length of the input time series. For our US GDP quarterly data example, the step can be between one (1) and 265.
- The last argument selects the output component. For X11 seasonally adjusted (SA), select one.
- Refer to the X12ACOMP reference manual page for more details.
For forecasting, we use the X12AFORE to query the forecast value and/or confidence interval.
- The first argument references the model’s unique identified in cell B1.
- The second argument references the step from the end of the time series (last non-missing value), so for K268, the step is equal to one (1).
- The step value ranges between one (1) and the forecast horizon. For our US GDP quarterly data example, the step can be between one (1) and four (1 year). After that, the X12AFORE returns the last known forecast value (e.g. Q4).
- For forecast mean value, the 3rd argument is set to one (1).
- Refer to the X12AFORE reference manual page for more details.
In this tutorial, we demonstrated the process to model an X12-ARIMA model and derive an X11 seasonally adjusted time series in Excel using NumXL’s add-in functions.
Throughout the tutorial, we presented several elements of NumXL’s implementation of X12-ARIMA, in an attempt to help you resolve issues that may pop up during the modeling process.
Where do we go from here?
First, to answer the question of optimality, we need to introduce additional algorithms to select the optimal set options and their values (e.g. X11 filter options, X11 mode, etc.) for a given data set.
Second, the set of calendar holidays supported is relatively limited. By combining the calendar functions in NumXL, we can expand the set significantly.
- Furthermore, we are planning to add non-fixed holidays such as Chinese New Year, as well as Islamic and Jewish holidays.
- Full support for user-defined (exogenous) explanatory/regression variables.
Third, many of the economic data can be represented as a sum of their components (e.g. GDP and its components: consumption, investment, government, and import/export net), so modeling the components and their sum requires special handling.
Finally, the US Census will release the X13-ARIMA-SEATS edition of their program, so new filtering options will become available.