Backtesting in Excel

Fayez

October 21, 2020 16:34

Backtesting in modeling refers to a predictive model's testing using historical data. The article is about how to do so in Microsoft Excel, not about the theoretical background of backtesting.

How do we conduct backtesting? We rewind the time to the beginning of our time series, calibrate the subject-model parameters using available data up to that instance of time, and conduct a prediction (i.e., forecast) for the next period. Next, we advance the time, recalibrate the parameters' values and perform another projection, and so on. At the end of our exercise, we'd have a set of predictions.

Note that at each point, the only assumption we make is the general model definition (e.g., ARMA(1,1)). Still, we'd calibrate the parameters' values using only the information available up to that instance of time.

This approach is consistent with real-life practices: first, we start with an initial model and conduct a forecast for the following period. Time moves on. A new period occurred, so we append the new data-point to the current input data set, recalibrate the parameters, conduct a forecast for the following period, and repeat.

Why should I care?

This article will take you through the steps in Microsoft Excel needed to conduct backtesting We will mainly use two powerful excel built-in functions: INDEX(.) and SEQUENCE(.), and leverage Excel's "Data table" mechanism to run the different scenarios.

The backtesting generates what would-have-been forecasting errors, so you can closely examine the prediction error time series for serial correlation, distributions, outliers, and others, to better understand the model's accuracy and performance.

Let's dig in!

For this issue, we are using a synthetic stationary data set of 200 observations. The data set follows an ARMA(1,1) process, as shown next.

This figure shows the Synthetic stationary data set of 200 observations

The proposed model is ARMA(1,1)

This figure shows the Original ARMA model table

Backtesting Procedure

For every iteration, we need to do the following: (1) define the input data set (as sub-set of the original time series), (2) using the data set in (1), calibrate the parameters' values of the ARMA(1,1) model, (3) Using the model in (2) and the dataset in (1), calculate a forecast for one-period ahead.

1. Input data set

To fully describe the input data set, we require two indices: start and finish, then, using the SEQUENCE(.) function, we generate a set of indices between the start and finish. Now, we use the INDEX(.) function to return all cell-range in the original data set with row-indices in the sequence set.

Example:

The original input data set is \$A\$3:\$A\$202. To select the cells between indices 1 and 50.

=INDEX(\$A\$3:\$A\$202,SEQUENCE(50,1,1,1),1)

Note that you can define a name for your input data and reference this name in place of the input cell range.

2. Calibrate the Model

We will use the NumXL ARMA_PARAM(.) function and specify return type=2 for calibrated parameters.

This figure shows the Calibrated ARMA model

Note that ARMA_PARAM(.) returns a compact form of the model's parameters, so in the figure above, the ARMA process is: \[\begin{array}{l} {X_t} = 1.485 + 0.401{X_{t - 1}} + 0.734{a_{t - 1}} + {a_t}\\ {a_t} \sim N(0,1.14) \end{array}\]

3. Forecasting

Using the ARMA_FORE(.), data set in (1), and the model parameters calculated in 2, we can calculate the forecast values for the 1-period ahead.

Calculating the forecast values for 1-period ahead.

Note that I generated the mean forecast, forecast error, and confidence interval.

Data Table

Now, we have just completed the calculation of one step. We will use Excel's "Data table" feature to do the same math for the remaining periods up to step 200.

First, we need to prepare the output table:

Setting up a data table

Now, select the whole data table, starting with the output row and including the finish index column, as shown below:

Selecting data table cells range

Next, switch to the "Data" Toolbar and locate the "Data Table" item under "What-if Scenario."

The "Data Table" dialog pops up. Locate the "column input cell," enter a reference to the "finish" index of the data set, and click the OK button.

This figure shows the Data Table dialog box

The data table will substitute the finish index's value with the ones in our data table and store the outputs.

This figure shows the Data table output with actual errors

Note that "Std. Error" is generated by the ARMA_FORE(.) function, while the right-most column ("Error") is the error between forecast and actual realized value.

Back-Testing Analysis

First, Let's examine the backtesting forecast outputs visually to actual realized values, and then we will delve deeper into the statistical properties.

This figure shows the Data plot backtesting forecast vs actual

The shaded area in the plot corresponds to the 95% forecast confidence interval.

The plot exhibits a good model fit and, thus, forecast accuracy.

Next, let's examine the forecast error (Forecast – Actual) statistical properties using summary statistics in the NumXL toolbar.

This figure shows the Back-testing errors summary statistics

The summary statistics table indicates that forecast error is Gaussian noise with zero mean and standard deviation of 1.0

This figure shows the Back-testing KDE vs. Gaussian

In conclusion, the ARMA(1,1) is a suitable predictive model for the given data set.

What is next?

By now, you are probably wondering about the values of the model's parameters? Are they stable?

First, we construct a second "Data Table," but in the output row are the model parameters' values, and run the data table just like we did earlier.

This figure shows the Data table parameters

Now, analyze the values of every parameter as we did with the forecast error.

The values of the parameters (except theta) exhibit stability and a trend toward a constant value. The theta (MA coefficient) values are more volatile but bound between 0.6 and 0.9.

Next, we should examine the descriptive statistics and the underlying distributions of the parameters' values, but we will leave this exercise to you.

Please, refer to the attached spreadsheet for the data set and analysis.

Conclusion

This article demonstrated the steps to conduct backtesting for a predictive model with minimal or no intermediate calculation. We have used Excel's built-in functions: INDEX(.) and SEQUENCE(.) and leveraged the "Data Table" feature to run the calculation for all pre-defined indices.

Once we had the backtesting results, we turn our heads to statistical analysis and evaluate their properties and distribution to uncover any biases (e.g., serial correlation) in the output.

Attachments

Backtesting Workbook.xlsx (200 KB)