Forecast Performance Measures Survey

Mohamad

May 12, 2017 22:03

Business forecasting is the estimation or prediction of future developments in business, such as demand, sales, expenditure, and profit. Forecasting has become an invaluable tool for business people to anticipate economic trends and prepare themselves either to benefit from or to counteract them. Good business forecasts can help business owners and managers adapt to a changing economy.

Forecasting is not a magical number churned out by a black-box model for this or future months, nor is it a hit-or-miss game. Forecasting is an ongoing process worked on by several business stakeholders who improve its accuracy and reliability over time.

During consulting engagements with many of our customers, several key questions come up:

How do I know when my forecasts are accurate and reliable?
There are several techniques to estimate forecasts; which ones give better results?
I have tens of thousands of different SKUs, so how do I summarize the overall accuracy of my forecasting process? Etc.

First, for a process to be effectively monitored, we need to quantify the forecast accuracy (or error) over time. Second, the measure should be simple to understand, and usable for all concerned time series. Finally, we need a measure that operates on a single time series and on a group of (possibly) unrelated time series to quantify the overall forecast process capability.

With a consistent and reliable forecasting measure, we can compare the different available forecasting methods to arrive at a single forecast for each time series. Also, we can examine how the overall process looks across the scores of the time series, and its impact on the business's bottom line (e.g. overstock, lost sales, etc.).

In this paper, we will discuss the major accuracy metrics or measures often found in the field of forecasting, and their application to a single time series. In later papers, we’ll discuss methods to summarize the forecast accuracy of a group of time series.

We can categorize forecasting accuracy measures into three distinct groups:

Absolute (scale-dependent) measure: e.g. MSE, RMSE, MAE, etc.
Percentage (scale-independent) measure: e.g. MAPE, MAAPE
Relative measure: e.g. MRAE, MASE, etc.

We will present a few measures of each category, and discuss their statistical properties (e.g. robustness) and applications.

Absolute (Scale-dependent) Measures

MSE – Mean Squared Error

Calculates the mean squared error between two (2) time series: the forecast $\{f_t\}$ and the eventual outcomes $\{x_t\}$:
$$\mathrm{MSE}=\frac{\sum_{t=1}^m (x_t-f_t)^2}{m} = \frac{\sum_{t=1}^m e_t^2}{m}=\frac{\mathrm{SSE}}{m}$$
Where:

$x_t$ is the eventual outcome at period t
$f_t$ is the forecast (using some model/method) at period t.
$e_t = x_t - f_t$ is the forecast error at period t.
$\mathrm{SSE}$ is the sum of squared errors.
$m$ is the number of observation pairs available.

The MSE, as its name implies, has a quadratic loss function form, which gives considerably more weight to large errors than to smaller ones. The magnitude of the values that MSE can take depends on the scale of $\{x_t\}$ and can range from 0 to $\infty$, which makes comparison across different series difficult and raises the prospect that outliers unduly influence the computed value.

The MSE is often found useful when we are concerned about large errors whose negative consequences are proportionately much bigger than smaller ones, but MSE is not a proper measure for comparing accuracy across different time series.

Nevertheless, the MSE is closely related to the statistical measure of variance (for unbiased forecasts, it estimates the variance of the forecast errors), which allows us to measure the uncertainty around our most likely forecast $f_t$. As such, the MSE plays an additional, equally important role: it lets us quantify the uncertainty around the most likely prediction.

In sum, the MSE is a non-robust measure (i.e. sensitive to outliers) and can’t be compared across different time series, but it can be used to estimate the uncertainty around the most recent prediction.
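As a minimal sketch of the definition above, the MSE can be computed in a few lines of Python. The function name `mse` and the demand figures are made up for illustration; the second call shows how a single large miss dominates the result because of the quadratic loss.

```python
def mse(actual, forecast):
    """Mean squared error between paired actuals and forecasts."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("need two non-empty series of equal length")
    return sum((x - f) ** 2 for x, f in zip(actual, forecast)) / len(actual)


# Hypothetical demand data, made up for illustration.
actual = [100, 110, 120, 130]
forecast = [102, 108, 123, 129]

# Errors are -2, 2, -3, 1, so the MSE is (4 + 4 + 9 + 1) / 4 = 4.5
print(mse(actual, forecast))

# Same forecasts, except the last period misses by 30 units.
forecast_with_miss = [102, 108, 123, 160]

# (4 + 4 + 9 + 900) / 4 = 229.25: one error accounts for almost all of it
print(mse(actual, forecast_with_miss))
```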

GMSE – Geometric Mean Squared Error

The GMSE takes the geometric mean of the squared errors (the $m$-th root of their product) rather than their arithmetic mean as in MSE:
$$\mathrm{GMSE}= \sqrt[m]{\prod_{t=1}^{m}e_t^2}=\sqrt[m]{\prod_{t=1}^{m}(x_t-f_t)^2}$$

The GMSE is influenced much less by outliers than the MSE.

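Furthermore, mathematically speaking, the biggest advantage of the GMSE is that ratios of geometric means are meaningful: the ratio of two GMSEs equals the geometric mean of the period-by-period ratios of the squared errors. For instance, if one method has a GMSE of 10 and another has a GMSE of 12 over the same periods, the squared errors of the second are, in geometric-mean terms, 20% higher than those of the first, which corresponds to absolute errors that are about 9.5% higher ($\sqrt{1.2}\approx 1.095$).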
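The sketch below illustrates the outlier behavior described above. It is not a reference implementation: the function names and data are hypothetical, and the geometric mean is computed in log space (an implementation choice, not something the definition requires) to avoid overflow when many squared errors are multiplied.

```python
import math


def mse(actual, forecast):
    """Arithmetic mean of the squared errors."""
    return sum((x - f) ** 2 for x, f in zip(actual, forecast)) / len(actual)


def gmse(actual, forecast):
    """Geometric mean of the squared errors, computed in log space."""
    squared = [(x - f) ** 2 for x, f in zip(actual, forecast)]
    if any(s == 0 for s in squared):
        return 0.0  # one perfect forecast makes the whole product zero
    return math.exp(sum(math.log(s) for s in squared) / len(squared))


# Hypothetical data, made up for illustration.
actual = [100, 100, 100, 100]
forecast = [99, 98, 96, 92]          # errors 1, 2, 4, 8
forecast_outlier = [99, 98, 96, 36]  # last error becomes 64

# MSE jumps from 21.25 to 1029.25 (roughly 48 times larger) ...
print(mse(actual, forecast), mse(actual, forecast_outlier))

# ... while GMSE moves from 8 to about 22.6 (roughly 2.8 times larger).
print(gmse(actual, forecast), gmse(actual, forecast_outlier))
```

Note the zero case: because the errors are multiplied, a single exact forecast drives the GMSE to zero regardless of the other periods.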

RMSE – Root Mean Squared Error

An alternative way to express the MSE is to take its square root:
$$\mathrm{RMSE}=\sqrt{\frac{\mathrm{SSE}}{m}}=\sqrt{\frac{\sum_{t=1}^m (x_t - f_t)^2}{m}}$$

The RMSE is more intuitive than the MSE because it is expressed in the same units as the data and indicates the typical size of the forecasting errors, regardless of their sign.

Similar to MSE, the major advantage of RMSE is that it is a measure of uncertainty in forecasting. The two major disadvantages are that it is an absolute measure which makes comparison across time series highly problematic, and it is influenced by extreme values (aka outliers).
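A short sketch of the RMSE follows, with hypothetical data. The second call re-expresses the same series in a unit ten times smaller, which shows why an absolute measure cannot be compared across series of different scale: the RMSE simply scales with the data.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error: the square root of the MSE."""
    sse = sum((x - f) ** 2 for x, f in zip(actual, forecast))
    return math.sqrt(sse / len(actual))


# Hypothetical data, made up for illustration.
actual = [100, 100, 100, 100]
forecast = [99, 107, 95, 95]  # errors 1, -7, 5, 5 -> SSE = 100

print(rmse(actual, forecast))  # sqrt(100 / 4) = 5.0

# The same series expressed in a unit ten times smaller.
actual_x10 = [10 * x for x in actual]
forecast_x10 = [10 * f for f in forecast]

print(rmse(actual_x10, forecast_x10))  # sqrt(10000 / 4) = 50.0
```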

GRMSE – Geometric Root Mean Squared Error

Similar to the GMSE, the GRMSE is defined as follows: $$\mathrm{GRMSE}= \sqrt[2m]{\prod_{t=1}^{m}e_t^2}=\sqrt[2m]{\prod_{t=1}^{m}(x_t-f_t)^2}$$

Outliers influence geometric means much less than they influence means of squares, but as in the case of RMSE, the GRMSE is an absolute measure and is not suitable for making comparisons across time series.
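A small worked example may help with interpreting the GRMSE. Taking the root of the product of squared errors is the same as taking the geometric mean of the absolute errors, so the GRMSE can be read as a typical absolute error in the units of the data. With four hypothetical errors of 1, 2, 4, and 8 (made up for illustration):
$$\sqrt[8]{1^2\cdot 2^2\cdot 4^2\cdot 8^2}=\sqrt[8]{4096}=2^{12/8}=2^{1.5}\approx 2.83, \qquad \sqrt[4]{1\cdot 2\cdot 4\cdot 8}=\sqrt[4]{64}=2^{6/4}=2^{1.5}\approx 2.83$$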

MAE – Mean Absolute Error

Calculates the mean absolute error between two (2) time series: the forecast $\{f_t\}$ and the eventual outcomes $\{x_t\}$:
$$\mathrm{MAE}=\frac{\mathrm{SAE}}{m}=\frac{\sum_{t=1}^m \left | x_t - f_t \right |}{m}$$
Where:

$x_t$ is the eventual outcome at period t
$f_t$ is the forecast (using some model/method) at period t
$e_t$ is the forecast error at period t
$\mathrm{SAE}$ is the sum of absolute errors.
$m$ is the number of observation pairs available.

The MAE is also an absolute measure (like MSE), and this is its biggest disadvantage. Its value ranges from 0 to $\infty$. However, since it is not quadratic in nature like the MSE, it is influenced less by outliers. Furthermore, because of its linear nature, its meaning is more intuitive: it tells us the average size of the forecasting errors when their signs are ignored.
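To make the comparison with the quadratic measures concrete, here is a hedged sketch that computes the MAE next to the RMSE on hypothetical data, with and without one large miss. The function names and figures are illustrative only.

```python
import math


def mae(actual, forecast):
    """Mean absolute error: average error size with signs ignored."""
    return sum(abs(x - f) for x, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error, shown here only for comparison."""
    sse = sum((x - f) ** 2 for x, f in zip(actual, forecast))
    return math.sqrt(sse / len(actual))


# Hypothetical demand data, made up for illustration.
actual = [100, 110, 120, 130]
forecast = [102, 108, 123, 129]            # absolute errors 2, 2, 3, 1
forecast_with_miss = [102, 108, 123, 160]  # last absolute error becomes 30

print(mae(actual, forecast))            # (2 + 2 + 3 + 1) / 4 = 2.0
print(mae(actual, forecast_with_miss))  # (2 + 2 + 3 + 30) / 4 = 9.25

# How much each measure inflates because of the single miss.
mae_ratio = mae(actual, forecast_with_miss) / mae(actual, forecast)
rmse_ratio = rmse(actual, forecast_with_miss) / rmse(actual, forecast)
print(mae_ratio < rmse_ratio)  # True: the MAE reacts less to the outlier
```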

Percentage (Scale-independent) Measures

In this category, the measures use a scaled version of the forecast errors.

MAPE – Mean Absolute Percentage Error

Calculates the mean absolute percentage error (MAPE) between the forecast and the eventual outcomes: $$\mathrm{MAPE}=\frac{1}{m}\times \sum_{t=1}^m \mathrm{APE}_t $$ $$\mathrm{APE}_t = \left | \frac{x_t - f_t}{x_t} \right | $$
Where:

$x_t$ is the eventual outcome at period t
$f_t$ is the forecast (using some model/method) at period t.
$\mathrm{APE}_t$ is the absolute percentage error at period t.
$m$ is the number of observations available

The MAPE expresses errors as a percentage of the actual data, which is its biggest advantage as it provides an intuitive and easy way to judge the extent of the model’s errors. Percentage errors are part of our everyday language, making MAPE easily interpretable.

Next, the MAPE is a scale-independent measure, making it suitable for comparing across different time series.

MAPE is a popular measure widely used in academic spheres and by practitioners, but it has serious disadvantages:

$\mathrm{APE}_t$ can create a problem when $x_t$ is small (close to zero) and $f_t$ is big, causing $\mathrm{APE}_t$ to be extremely large (and undefined when $x_t = 0$), such that $\mathrm{MAPE}$ is meaningless.
$\mathrm{MAPE}$ can be influenced a great deal by outliers.

As a result, the range of values that $\mathrm{MAPE}$ can take is from 0 to $\infty$.
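The small-actual problem is easy to reproduce. The following sketch (hypothetical function name and data) returns the MAPE as a fraction, so 0.075 means 7.5%. Raising an error when an actual value is zero is a design choice of this sketch, since the formula is undefined in that case.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.05 == 5%)."""
    if any(x == 0 for x in actual):
        raise ValueError("APE is undefined when an actual value is zero")
    return sum(abs((x - f) / x) for x, f in zip(actual, forecast)) / len(actual)


# Hypothetical data, made up for illustration.
actual = [100, 200, 50, 400]
forecast = [110, 180, 55, 400]  # APEs: 0.1, 0.1, 0.1, 0.0

print(mape(actual, forecast))  # approximately 0.075, i.e. 7.5%

# Replace the last period with a near-zero actual and a miss of 20 units.
actual_small = [100, 200, 50, 1]
forecast_small = [110, 180, 55, 21]  # last APE = 20 / 1 = 20.0

print(mape(actual_small, forecast_small))  # approximately 5.075, i.e. 507.5%
```

A miss of 20 units on a series that otherwise moves in the hundreds turns a 7.5% MAPE into one above 500%, purely because the actual value in that period is close to zero.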

SMAPE – Symmetric Mean Absolute Percentage Error

The symmetric version of MAPE tries to correct for the possible influence of outliers by dividing the forecast error by the average of both $x_t$ and $f_t$. $$\mathrm{SMAPE}=\frac{1}{m}\times \sum_{t=1}^m \mathrm{sAPE}_t $$
$$\mathrm{sAPE}_t = 2\times \left | \frac{x_t - f_t}{x_t + f_t} \right | $$

Using this definition, it does not matter whether $x_t$ is small and $f_t$ is big or $x_t$ is large and $f_t$ is small. Therefore, for non-negative data, the range of values that SMAPE can take is from zero (0) to 200%.

The symmetric MAPE is influenced by extreme values (aka outliers), to a much lesser extent than regular MAPE.

The symmetric MAPE has a couple of disadvantages: (1) its value is less intuitive than that of MAPE, and (2) division by zero is still possible when both the actual and the forecast are zero.
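The properties above can be checked with a short sketch. It follows the sAPE formula as given in this section and assumes non-negative actuals and forecasts; the function name, the data, and the choice to raise an error when both values are zero are assumptions of this example.

```python
def smape(actual, forecast):
    """Symmetric MAPE as a fraction (2.0 == 200%), for non-negative data."""
    total = 0.0
    for x, f in zip(actual, forecast):
        if x + f == 0:
            # Both values are zero (given non-negative inputs): sAPE undefined.
            raise ValueError("sAPE is undefined when actual + forecast is zero")
        total += 2 * abs((x - f) / (x + f))
    return total / len(actual)


# Swapping a small actual and a big forecast gives the same value.
print(smape([1], [21]) == smape([21], [1]))  # True

# The worst case for non-negative data: actual is zero, forecast is not.
print(smape([0], [50]))  # 2 * 50 / 50 = 2.0, i.e. 200%

# Hypothetical data with a near-zero actual in the last period.
actual = [100, 200, 50, 1]
forecast = [110, 180, 55, 21]
print(smape(actual, forecast))  # stays below the 2.0 bound
```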

MdAPE – Median Absolute Percentage Error

The median absolute percentage error (regular or symmetric) is similar to MAPE, except that it uses the median rather than the arithmetic average. $$\mathrm{MdAPE}_{\mathrm{reg}} = \mathrm{median}(\mathrm{APE}_1,\mathrm{APE}_2,\cdots,\mathrm{APE}_m)$$
$$\mathrm{MdAPE}_\mathrm{sym} = \mathrm{median}(\mathrm{sAPE}_1,\mathrm{sAPE}_2,\cdots,\mathrm{sAPE}_m)$$

The biggest advantage of the MdAPE is that it is not influenced by outliers. Its biggest disadvantage is that its meaning is less intuitive. An MdAPE of 8% does not mean that the average absolute percentage error is 8%; it means that half of the absolute percentage errors in our sample are less than 8%.

Note:

Using the symmetric MAPE reduces the influence of outliers and, with it, the need for MdAPE.

Furthermore, it is difficult to combine MdAPE across series or update when new data become available.
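As an illustration of the regular MdAPE, the sketch below uses `statistics.median` from the Python standard library on hypothetical data that contains one extreme percentage error. The helper names are made up for this example.

```python
from statistics import mean, median


def ape(actual, forecast):
    """Absolute percentage errors as fractions; actuals must be non-zero."""
    return [abs((x - f) / x) for x, f in zip(actual, forecast)]


def mdape(actual, forecast):
    """Median absolute percentage error (regular variant)."""
    return median(ape(actual, forecast))


# Hypothetical data; the last period has a near-zero actual.
actual = [100, 200, 50, 400, 1]
forecast = [110, 180, 55, 400, 21]  # APEs: 0.1, 0.1, 0.1, 0.0, 20.0

print(mdape(actual, forecast))      # 0.1, i.e. 10%
print(mean(ape(actual, forecast)))  # MAPE: about 4.06, i.e. 406%
```

The median ignores how large the extreme value is, which is also why it cannot be updated or combined across series by simple averaging.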

Relative Measures

In this category, we express the forecast error relative to the error of some benchmark forecast, such as the latest available value (Naïve 1 forecast), or the latest available value after seasonality has been taken into account (Naïve 2 forecast).

The relative absolute error is defined as follows: $$ \mathrm{RAE}_t = \left | \frac{x_t - f_t}{x_t - f_t^b} \right | = \left | \frac{e_t}{e_t^b} \right | $$
Where:

$x_t$ is the eventual outcome at period t
$f_t$ is the forecast (using some model/method) at period t.
$f_t^b$ is the benchmark model forecast at period t.
$e_t$ is the forecast error at period t.
$e_t^b$ is the benchmark forecast error at period t.

In the definition, a division by zero occurs when the benchmark forecast error is zero (0) (i.e. benchmark forecast value is equal to the actual value).

MRAE – Mean Relative Absolute Error

The mean relative absolute error is defined by taking the arithmetic average of the relative errors in the available periods: $$\mathrm{MRAE} = \frac{1}{m}\times \sum_{t=1}^m \mathrm{RAE}_t $$

The MRAE is simple and intuitive to explain as an average error relative to some benchmark forecast. So, an MRAE of 90% indicates that our model on average yields a 10% smaller error than the benchmark model.

The MRAE is sensitive to low values and extreme values (outliers).

MdRAE – Median Relative Absolute Error

In this case, we use the median operator instead of the arithmetic average: $$\mathrm{MdRAE} = \mathrm{median}(\mathrm{RAE}_1,\mathrm{RAE}_2,\cdots,\mathrm{RAE}_m) $$

The main advantage of MdRAE is its resilience to extreme values (aka outliers) and low values, but it is less intuitive than MRAE.

GMRAE – Geometric Mean Relative Absolute Error

With GMRAE, we substitute the arithmetic average in MRAE with geometric mean: $$\mathrm{GMRAE} = \sqrt[m]{\prod_{t=1}^m \mathrm{RAE}_t }$$

The GMRAE is less sensitive to extreme values (aka outliers) than MRAE, and it remains simple to interpret.
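The three RAE-based summaries above can be sketched together. This example assumes a Naïve 1 benchmark (the previous actual value), so the first period has no benchmark and is dropped; all names and data are hypothetical, and raising an error on a zero benchmark error is a choice made for this sketch.

```python
import math
from statistics import median


def rae(actual, forecast, benchmark):
    """Relative absolute errors |e_t / e_t^b| for each period."""
    out = []
    for x, f, b in zip(actual, forecast, benchmark):
        if x == b:
            raise ValueError("benchmark error is zero, so RAE is undefined")
        out.append(abs((x - f) / (x - b)))
    return out


def mrae(values):
    return sum(values) / len(values)


def mdrae(values):
    return median(values)


def gmrae(values):
    if any(v == 0 for v in values):
        return 0.0  # one perfect forecast makes the product zero
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical history; Naive 1 forecasts period t with the actual of t-1.
history = [100, 110, 105, 120, 130]
actual = history[1:]              # periods 2..5: 110, 105, 120, 130
benchmark = history[:-1]          # Naive 1:      100, 110, 105, 120
forecast = [108, 108, 115, 128]   # model errors 2, -3, 5, 2

r = rae(actual, forecast, benchmark)  # 2/10, 3/5, 5/15, 2/10
print(mrae(r))   # about 0.333
print(mdrae(r))  # about 0.267
print(gmrae(r))  # about 0.299
```

All three values are below one, so under these assumptions the model beats the Naïve 1 benchmark whichever summary is used.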

MASE – Mean Absolute Scaled Error

In the relative absolute error (RAE) based measures, a division by zero occurs whenever the benchmark model forecast is equal to the actual outcome. In demand planning, series with lots of zero values and a few intermittent spikes are common, so neither the RAE-based measures nor the percentage measures (which divide by the actual value) can be used reliably.

A new type of measure is needed, the absolute scaled error ($q_t$):
$$q_t = \frac{\left | x_t - f_t \right |}{\frac{1}{n}\times\sum_{i=1}^n \left | x_i - f_i^b\right |} = \frac{\left|x_t-f_t\right |}{\mathrm{MAE}^b}=\frac{\left | e_t\right |}{\mathrm{MAE}^b}$$

In a nutshell, an absolute scaled error is the absolute forecast error divided by the mean absolute error (MAE) of the benchmark (or reference) model (e.g. naïve 1 forecast), where $n$ is the number of periods over which the benchmark's MAE is computed.

Mean Absolute Scaled Error is defined as follows: $$ \mathrm{MASE}=\frac{1}{m}\sum_{t=1}^m q_t = \frac{\mathrm{MAE}}{\mathrm{MAE}^b} $$

The MASE is symmetric and resilient (robust) to extreme values (outliers) and small values. Furthermore, division by zero can only occur in a trivial case where all values of the input data are equal (i.e. constant function).

Furthermore, the interpretation of MASE values is simple and intuitive; a value less than one (1) implies that the forecast model has a smaller mean absolute error than that of the benchmark model (favorable), and a value greater than one indicates that the forecast model performs worse than the benchmark model.

Finally, the MASE measure can be used for time series with lots of zero values (e.g. intermittent demands), as long as there is at least one observation with a non-zero value.
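A minimal sketch of the MASE on an intermittent-demand series follows. It assumes a Naïve 1 benchmark evaluated over the same periods as the model forecast, which is one reasonable reading of the definition above rather than the only one; the function names and data are hypothetical.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(x - f) for x, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, benchmark):
    """MAE of the model divided by the MAE of the benchmark forecast."""
    scale = mae(actual, benchmark)
    if scale == 0:
        raise ValueError("benchmark MAE is zero (e.g. a constant series)")
    return mae(actual, forecast) / scale


# Hypothetical intermittent demand: mostly zeros with a few spikes.
history = [0, 0, 5, 0, 0, 3, 0, 0]
actual = history[1:]           # periods 2..8
benchmark = history[:-1]       # Naive 1: previous period's actual
forecast = [1] * len(actual)   # a flat forecast of one unit per period

# Model MAE = 11/7 and benchmark MAE = 16/7, so MASE = 11/16 = 0.6875
print(mase(actual, forecast, benchmark))
```

Percentage errors cannot be computed for this series because most actual values are zero, and several benchmark errors are zero as well, which rules out the RAE. The MASE only needs the benchmark's average error to be non-zero.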

PB – Percentage Better

Another approach to handling division by zero and sensitivity to outliers is the Percentage Better (PB), which represents the percentage of cases where the forecast absolute error is smaller than that of the benchmark model. $$\mathrm{PB}= \frac{\sum_{t=1}^m I\{\left | e_t \right | < \left | e_t^b \right | \}}{m}$$
Where:

$I\{\cdot\}$ is an indicator operator that yields one (1) when its condition holds and zero (0) otherwise: $$ I\{\left | e_t \right | < \left | e_t^b \right |\}=\begin{cases} 1 & \text{if } \left | e_t \right | < \left | e_t^b \right | \\ 0 & \text{if } \left | e_t \right | \geq \left | e_t^b \right | \end{cases} $$

The PB shows the percentage of occurrences where our forecast beats the benchmark, regardless of the actual magnitude of the errors. PB is easily understood and intuitive. It is resilient to extreme values (outliers) and small values, but it cannot be used across different time series.
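The PB calculation reduces to counting wins, as in this sketch with hypothetical names and data. Ties count as "not better", matching the strict inequality in the indicator.

```python
def percentage_better(actual, forecast, benchmark):
    """Share of periods where the model's absolute error beats the benchmark's."""
    wins = sum(
        1
        for x, f, b in zip(actual, forecast, benchmark)
        if abs(x - f) < abs(x - b)
    )
    return wins / len(actual)


# Hypothetical data, made up for illustration.
actual = [110, 105, 120, 130]
benchmark = [100, 110, 105, 120]  # absolute errors 10, 5, 15, 10
forecast = [108, 108, 115, 100]   # absolute errors 2, 3, 5, 30

# The model wins in 3 of 4 periods; the size of the 30-unit miss is ignored.
print(percentage_better(actual, forecast, benchmark))  # 0.75
```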

Conclusion

Which forecasting measure should we use? There is no single right answer, but three. Here are our recommendations:

Absolute Measure: Use MSE and/or RMSE.
The MSE and RMSE are both sensitive to extreme values (outliers), but they serve as important measures of uncertainty around the latest prediction.
Percentage Measure: Use MAPE (symmetric variant).
MAPE is a widely used measure in academic spheres and by practitioners, and its values have an intuitive interpretation. Nevertheless, it is sensitive to extreme values (outliers) and small values, so we recommend using the symmetric variant, which reduces those shortcomings while keeping a percentage interpretation.
Relative Measure: Use MASE.
We recommend using MASE for three main reasons: (1) resilience to extreme values, (2) low to no chance of division by zero, and (3) its values have an intuitive interpretation.

References

Winkler, R.L., and Murphy, A.H., (1992) "On Seeking a Best Performance Measure or a Best Forecasting Method," International Journal of Forecasting, 8, 1, 104-107.
Chatfield, C., (1988) "What is the `best' method of forecasting?" Journal of Applied Statistics, 15, 19-38.
Carbone, R., and Armstrong, J.S., (1982) "Note: Evaluation of Extrapolative Forecasting Methods: Results of a Survey of Academicians and Practitioners," Journal of Forecasting, 1, 2, 215-217.
Makridakis, S., et al., (1982) "The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition," Journal of Forecasting, 1, 2, 111-153.
Makridakis, S., and Hibon, M., (1995) "Evaluating Accuracy (or Error) Measures," Working paper, INSEAD, Boulevard de Constance, 77305 Fontainebleau Cedex, France.