# TESMTH - Holt-Winters Triple Exponential Smoothing

Returns the (Holt-Winters) triple exponential smoothing out-of-sample forecast estimate.

## Syntax

TESMTH(X, Order, Alpha, Beta, Gamma, L, Optimize, T, Return Type)

X is the univariate time series data (a one-dimensional array of cells, e.g. a single row or column).

Order is the time order of the data series, i.e. whether the first data point corresponds to the earliest date (1, default) or the latest date (0).

| Order | Description |
|---|---|
| 1 | Ascending (the first data point corresponds to the earliest date) (default) |
| 0 | Descending (the first data point corresponds to the latest date) |

Alpha is the level smoothing factor (alpha must be strictly between zero and one). If missing or omitted, a value of 0.333 is used.

Beta is the trend smoothing factor (beta must be strictly between zero and one). If missing or omitted, a value of 0.333 is used.

Gamma is the seasonal change smoothing factor (gamma must be strictly between zero and one). If missing or omitted, a value of 0.50 is used.

L is the season length, i.e. the number of time steps in one season.

Optimize is a flag (True/False) for searching for and using the optimal values of the smoothing factors. If missing or omitted, Optimize is assumed False.

T is the forecast time (horizon) beyond the end of X. If missing, a default value of 0 (the latest point, i.e. the end of X) is assumed.

Return Type is a number that determines the type of return value:

| Return Type | Description |
|---|---|
| 0 or omitted | Forecast value |
| 1 | Level smoothing parameter (alpha) |
| 2 | Trend smoothing parameter (beta) |
| 3 | Seasonal smoothing parameter (gamma) |
| 4 | Level component (series) |
| 5 | Trend component (series) |
| 6 | Seasonal component (series) |
| 7 | One-step (in-sample) forecasts (series) |

## Remarks

1. The time series is homogeneous, i.e. equally spaced.
2. The time series may include missing values (e.g. #N/A) at either end.
3. The time series must contain only positive observations; otherwise, TESMTH returns #VALUE!
4. The multiplicative Holt-Winters exponential smoothing method is a robust forecasting method for seasonal time series with additive trend.
5. The multiplicative Holt-Winters seasonal model is appropriate for a time series in which the amplitude of the seasonal pattern is proportional to the average level of the series, i.e. a time series displaying multiplicative seasonality.
6. The recursive form of the Holt-Winters triple exponential smoothing equation is expressed as follows:

$$\begin{array}{l} \hat{F}_t(m)=(S_t + m\times b_t)\times C_{t-L+m}\\ \\ S_{t>L}=\alpha (X_t / C_{t-L})+ (1-\alpha)(S_{t-1}+b_{t-1})\\ b_{t>L}=\beta (S_t - S_{t-1})+(1-\beta)b_{t-1}\\ C_{t>L}=\gamma (X_t / S_t)+ (1-\gamma)C_{t-L} \end{array}$$
Where:
• $X_t$ is the value of the time series at time t.
• $L$ is the season length (number of time steps per season).
• $S_{t}$ is a smoothed estimate of the level component.
• $b_{t}$ is a smoothed estimate of the trend component.
• $C_{t}$ is a smoothed estimate of the seasonal indices component.
• $\alpha$ is the level smoothing coefficient.
• $\beta$ is the trend smoothing coefficient.
• $\gamma$ is the seasonal smoothing coefficient.
• $\hat{F}_t(m)$ is the m-step-ahead forecast of $X$ made at time t.
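The recursion above can be sketched in Python as follows. This is a minimal illustration, not NumXL's implementation: it assumes the start values (level, trend and the L seasonal indices) describe the state just before the first observation, and it keeps the seasonal indices in a circular buffer so that `c[t % L]` holds $C_{t-L}$ when $X_t$ arrives. The names `holt_winters_mul` and `hw_forecast` are hypothetical.

```python
def holt_winters_mul(x, alpha, beta, gamma, level0, trend0, season0):
    """Multiplicative Holt-Winters recursion (a sketch).

    Assumption: level0, trend0 and season0 (length L) describe the state just
    before x[0]; season0[i] is the seasonal index for position i.
    Returns the level, trend and seasonal series, the one-step forecasts
    (one_step[t] forecasts x[t] from time t-1) and the final seasonal buffer.
    """
    L = len(season0)
    s, b = level0, trend0
    c = list(season0)                        # c[t % L] holds C_{t-L} when x[t] arrives
    levels, trends, seasonals, one_step = [], [], [], []
    for t, xt in enumerate(x):
        c_old = c[t % L]                     # C_{t-L}
        one_step.append((s + b) * c_old)     # F_{t-1}(1)
        s_prev = s
        s = alpha * xt / c_old + (1 - alpha) * (s_prev + b)      # level S_t
        b = beta * (s - s_prev) + (1 - beta) * b                 # trend b_t
        c[t % L] = gamma * xt / s + (1 - gamma) * c_old          # seasonal C_t
        levels.append(s)
        trends.append(b)
        seasonals.append(c[t % L])
    return levels, trends, seasonals, one_step, c


def hw_forecast(level, trend, season_buf, n, m):
    """m-step-ahead forecast (S_t + m b_t) C_{t-L+m} from the last of n observations."""
    L = len(season_buf)
    return (level + m * trend) * season_buf[(n - 1 + m) % L]
```

For horizons longer than one season (m > L), the index `(n - 1 + m) % L` reuses the latest seasonal index for the same position, which is the usual convention when $C_{t-L+m}$ does not yet exist.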
7. The seasonal factors are defined so that they sum to the length of the season (L), i.e.

$$\sum_{i=1}^L C_i = L$$
8. TESMTH computes three simple but interdependent exponentially smoothed series: level, trend and seasonal indices. They are interdependent in the sense that all three components must be updated each period.
9. The smoothing coefficient $\alpha$ controls the speed of adaptation to the local level, a second smoothing constant $\beta$ controls the degree of local trend, and a third smoothing constant $\gamma$ controls the degree to which the local seasonal indices are carried through to multi-step-ahead forecast periods.
10. When $\gamma=0$ and the starting seasonal indices ($C_{1,2\cdots L}$) are all set to one (1), Holt-Winters triple exponential smoothing produces the same forecasts as Holt's double exponential smoothing (DESMTH).
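This equivalence can be checked numerically. The sketch below uses hypothetical function names and assumes the start values describe the state just before the first observation. It runs the multiplicative recursion with $\gamma=0$ and all seasonal indices equal to one next to Holt's level-plus-trend recursion; the one-step forecasts coincide because every $C$ stays exactly at one.

```python
import math


def hw_one_step(x, alpha, beta, gamma, level0, trend0, season0):
    """One-step forecasts from the multiplicative Holt-Winters recursion."""
    L = len(season0)
    s, b, c = level0, trend0, list(season0)
    out = []
    for t, xt in enumerate(x):
        c_old = c[t % L]
        out.append((s + b) * c_old)
        s_prev = s
        s = alpha * xt / c_old + (1 - alpha) * (s_prev + b)
        b = beta * (s - s_prev) + (1 - beta) * b
        c[t % L] = gamma * xt / s + (1 - gamma) * c_old   # stays 1.0 when gamma = 0
    return out


def holt_one_step(x, alpha, beta, level0, trend0):
    """One-step forecasts from Holt's double (level + trend) exponential smoothing."""
    s, b = level0, trend0
    out = []
    for xt in x:
        out.append(s + b)
        s_prev = s
        s = alpha * xt + (1 - alpha) * (s_prev + b)
        b = beta * (s - s_prev) + (1 - beta) * b
    return out


series = [3.0, 5.0, 4.0, 6.5, 7.0, 6.0, 8.5]
hw = hw_one_step(series, 0.4, 0.3, 0.0, 3.0, 1.0, [1.0, 1.0, 1.0])
holt = holt_one_step(series, 0.4, 0.3, 3.0, 1.0)
print(all(math.isclose(u, v) for u, v in zip(hw, holt)))   # True
```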
11. TESMTH calculates a point forecast. No probabilistic model is assumed for Holt-Winters exponential smoothing, so a statistical confidence interval cannot be derived for the computed values.
12. In practice, the mean squared error (MSE) of prior out-of-sample forecasts is often used as a proxy for the uncertainty (i.e. variance) of the most recent forecast value.
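As a rough illustration of the previous remark, the MSE of past one-step forecasts can be computed as below. The helper names and the `skip` parameter (for dropping warm-up periods) are hypothetical, and the RMSE is only a scale for the typical forecast error, not a confidence interval.

```python
import math


def one_step_mse(x, one_step, skip=0):
    """Mean squared error of one-step forecasts; one_step[t] forecasts x[t]."""
    errors = [xt - ft for xt, ft in zip(x[skip:], one_step[skip:])]
    return sum(e * e for e in errors) / len(errors)


def one_step_rmse(x, one_step, skip=0):
    """Square root of the MSE, on the same scale as the data."""
    return math.sqrt(one_step_mse(x, one_step, skip))
```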
13. The Holt-Winters triple exponential method requires L+2 starting values ($S_1,b_1,C_{1,2\cdots L}$) to start the recursive updating of the equations. As a rule of thumb, a minimum of two full seasons (2L periods) of historical data is needed to initialize a set of seasonal factors ($C_{1,2\cdots L}$).
14. In the academic literature, the seasonal indices are commonly estimated as the ratio of the actual observation to the average seasonally adjusted value for that season. In NumXL, we adopted better procedures, due to R. Hyndman and based on decomposition, that require fewer observations and produce better estimates.
• If the number of available observations is less than two full seasons but more than one season:
1. Fit a simple model with a linear time trend and a first-order Fourier approximation of the multiplicative seasonality (as in Holt-Winters):

$$\hat{X}_t=(a+b\times t)\times (1+k \times \cos(\frac{2\pi}{L}t+\phi))$$
Where:
• $(a+b\times t)$ is the linear trend.
• $a,b,k,\phi$ are parameters of unknown values.
2. Next, find the optimal values of the parameters by minimizing the overall SSE between $X_t$ and $\hat{X}_t$.
3. Detrend the data by dividing $X_t$ by the estimated linear trend $(a+b\times t)$.
4. Calculate the seasonal indices by averaging the detrended time series for each period (e.g. month) of the season across all available seasons:

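$$C_k = \frac{1}{m_k}\sum_{i=0}^{m_k-1} D_{k+i\times L}$$
Where $D_t$ is the detrended value of $X_t$ and $m_k$ is the number of detrended values available for the k-th period of the season.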
5. Scale the computed seasonal indices, so that their sum is equal to L.

$$C_k^{'} = \frac{C_k}{\sum_{i=1}^{L} C_i}\times L$$
6. Set $S_1 = a$, and $b_1 = b$.
• If the number of available observations is more than two full seasons, we use a procedure based on double moving-average decomposition:
1. Apply a 2xL centered moving average to the first two or three seasons of the data. This gives a good approximation of the (level + trend) component.
2. Detrend the data by dividing $X_t$ by the moving average from step 1.
3. Calculate the seasonal indices by averaging the detrended time series from step 2 for each period (e.g. month) of the season across all available seasons:

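$$C_k = \frac{1}{m_k}\sum_{i=0}^{m_k-1} D_{k+i\times L}$$
Where $D_t$ is the detrended value of $X_t$ and $m_k$ is the number of detrended values available for the k-th period of the season.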
4. Scale the computed seasonal indices so that their sum is equal to L.
5. Finally, divide $X_t$ by the seasonal indices from step 4 to get the seasonally adjusted data.
6. Fit a linear trend to the seasonally adjusted data to get initial estimates of $S_1$ and $b_1$.
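Both initialization procedures of remark 14 can be sketched with NumPy as follows. This is an illustration under stated assumptions, not NumXL's code, and all function names are hypothetical. Time is indexed t = 1..N, so the fitted intercept is the level just before the first observation (the value the steps above call $S_1$). The Fourier fit swaps in a coarse grid search over $(k,\phi)$, solving ordinary least squares for $(a,b)$ at each grid point, in place of whatever nonlinear optimizer NumXL uses. For odd L, the centered moving average falls back to a plain L-term average (the usual classical-decomposition convention), and exactly 2L observations are routed to the moving-average branch.

```python
import numpy as np


def _scale(c, L):
    """Rescale seasonal indices so that they sum to L (remark 7)."""
    c = np.asarray(c, dtype=float)
    return c * L / c.sum()


def init_fourier(x, L, n_k=20, n_phi=72):
    """Fewer than two seasons: fit (a + b t)(1 + k cos(2 pi t / L + phi))."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1, dtype=float)
    best_sse, a, b = np.inf, 0.0, 0.0
    # Grid search over (k, phi); for fixed (k, phi) the model is linear in (a, b).
    for k in np.linspace(0.0, 0.95, n_k):
        for phi in np.linspace(0.0, 2 * np.pi, n_phi, endpoint=False):
            g = 1.0 + k * np.cos(2 * np.pi * t / L + phi)
            design = np.column_stack([g, t * g])
            coef, *_ = np.linalg.lstsq(design, x, rcond=None)
            sse = float(np.sum((x - design @ coef) ** 2))
            if sse < best_sse:
                best_sse, a, b = sse, float(coef[0]), float(coef[1])
    detrended = x / (a + b * t)                       # step 3
    c = [detrended[i::L].mean() for i in range(L)]    # step 4
    return a, b, _scale(c, L)                         # steps 5-6


def init_decompose(x, L):
    """Two or more seasons: moving-average based decomposition."""
    x = np.asarray(x, dtype=float)
    xs = x[: min(len(x), 3 * L)]                      # first two or three seasons
    if L % 2 == 0:
        w = np.r_[0.5, np.ones(L - 1), 0.5] / L       # 2xL centered moving average
    else:
        w = np.ones(L) / L                            # odd L: L-term MA is already centered
    ma = np.convolve(xs, w, mode="valid")             # step 1: level + trend
    idx = np.arange(len(ma)) + len(w) // 2            # positions the MA is centered on
    d = xs[idx] / ma                                  # step 2: detrend
    c = _scale([d[idx % L == i].mean() for i in range(L)], L)   # steps 3-4
    adjusted = xs / c[np.arange(len(xs)) % L]         # step 5: deseasonalize
    t = np.arange(1, len(xs) + 1, dtype=float)
    b, a = np.polyfit(t, adjusted, 1)                 # step 6: linear trend
    return float(a), float(b), c


def initial_values(x, L):
    """Return (S_1, b_1, C), with C[i] the seasonal index for position i."""
    n = len(x)
    if n >= 2 * L:
        return init_decompose(x, L)
    if n > L:
        return init_fourier(x, L)
    raise ValueError("need more than one full season of observations")
```

The grid resolution (`n_k`, `n_phi`) trades accuracy for speed; a local refinement around the best grid point would tighten the fit if needed.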
15. Starting with NumXL version 1.63, TESMTH has a built-in optimizer that finds the values of ($\alpha, \beta, \gamma$) minimizing the SSE (the loss function $U(.)$) of the one-step forecasts calculated in-sample.

$$\begin{array}{l} U(\alpha,\beta,\gamma)=\mathrm{SSE}=\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))^2\\ \\ \min_{\alpha,\beta,\gamma \in (0,1)} U(\alpha,\beta,\gamma) \end{array}$$
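The loss $U(\alpha,\beta,\gamma)$ can be evaluated with a single pass of the recursion, as in the sketch below. It assumes the start values describe the state just before the first observation and accumulates errors for $X_2 \ldots X_N$ to match the $t=1 \ldots N-1$ range of the formula; the function name `sse_one_step` is hypothetical.

```python
def sse_one_step(x, alpha, beta, gamma, level0, trend0, season0):
    """U(alpha, beta, gamma): sum of squared in-sample one-step errors.

    Assumption: level0, trend0 and season0 describe the state just before x[0];
    season0[i] is the seasonal index for position i.
    """
    L = len(season0)
    s, b = level0, trend0
    c = list(season0)                    # c[t % L] holds C_{t-L} when x[t] arrives
    sse = 0.0
    for t, xt in enumerate(x):
        c_old = c[t % L]
        if t >= 1:
            # x[t] is X_{t+1} (1-based); (s + b) * c_old is the forecast F_t(1)
            sse += (xt - (s + b) * c_old) ** 2
        s_prev = s
        s = alpha * xt / c_old + (1 - alpha) * (s_prev + b)
        b = beta * (s - s_prev) + (1 - beta) * b
        c[t % L] = gamma * xt / s + (1 - gamma) * c_old
    return sse
```

Because the loss is cheap to evaluate, it can be passed to any box-constrained minimizer over $(0,1)^3$.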
16. For initial values, the NumXL optimizer uses the input values of ($\alpha,\beta,\gamma$) (if available) in the minimization problem, and the initial values of the level, trend and seasonal indices series ($S_1,b_1, C_{1,2\cdots L}$) are computed from the input data.
17. Starting with NumXL version 1.65, the TESMTH function returns the optimal values found for ($\alpha,\beta,\gamma$), as well as the corresponding in-sample one-step smoothed series of level, trend, seasonal indices and forecasts.
18. The time series must have at least two full seasons of observations to use the built-in optimizer.
19. NumXL implements the spectral projected gradient (SPG) method to find the minimum subject to box constraints.
• SPG requires the loss function value and its gradient ($\nabla U$). For performance, NumXL implements the exact derivative formulas (rather than a numerical approximation).

$$\begin{array}{l}
\nabla U = \frac{\partial U}{\partial \alpha} \vec{e_\alpha} + \frac{\partial U}{\partial \beta} \vec{e_\beta} + \frac{\partial U}{\partial \gamma} \vec{e_\gamma}\\ \\
\frac{\partial U}{\partial \alpha} = -2\times\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))\times \frac{\partial \hat{F}_t}{\partial \alpha}\\
\frac{\partial U}{\partial \beta} = -2\times\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))\times \frac{\partial \hat{F}_t}{\partial \beta}\\
\frac{\partial U}{\partial \gamma} = -2\times\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))\times \frac{\partial \hat{F}_t}{\partial \gamma}\\ \\
\frac{\partial \hat{F}_t}{\partial \alpha}=(\frac{\partial S_t}{\partial \alpha}+\frac{\partial b_t}{\partial \alpha})C_{t-L+1}+(S_t+b_t)\frac{\partial C_{t-L+1}}{\partial \alpha}\\
\frac{\partial \hat{F}_t}{\partial \beta}=(\frac{\partial S_t}{\partial \beta}+\frac{\partial b_t}{\partial \beta})C_{t-L+1}+(S_t+b_t)\frac{\partial C_{t-L+1}}{\partial \beta}\\
\frac{\partial \hat{F}_t}{\partial \gamma}=(\frac{\partial S_t}{\partial \gamma}+\frac{\partial b_t}{\partial \gamma})C_{t-L+1}+(S_t+b_t)\frac{\partial C_{t-L+1}}{\partial \gamma}\\ \\
\frac{\partial S_t}{\partial \alpha}=\frac{X_t}{C_{t-L}}-\frac{\alpha X_t}{C_{t-L}^2}\frac{\partial C_{t-L}}{\partial \alpha}+(1-\alpha)(\frac{\partial S_{t-1}}{\partial \alpha}+\frac{\partial b_{t-1}}{\partial \alpha})-(S_{t-1}+b_{t-1})\\
\frac{\partial S_t}{\partial \beta}=-\frac{\alpha X_t}{C_{t-L}^2}\frac{\partial C_{t-L}}{\partial \beta}+(1-\alpha)(\frac{\partial S_{t-1}}{\partial \beta}+\frac{\partial b_{t-1}}{\partial \beta})\\
\frac{\partial S_t}{\partial \gamma}=-\frac{\alpha X_t}{C_{t-L}^2}\frac{\partial C_{t-L}}{\partial \gamma}+(1-\alpha)(\frac{\partial S_{t-1}}{\partial \gamma}+\frac{\partial b_{t-1}}{\partial \gamma})\\ \\
\frac{\partial b_t}{\partial \alpha}=\beta (\frac{\partial S_t}{\partial \alpha} - \frac{\partial S_{t-1}}{\partial \alpha})+(1-\beta)\frac{\partial b_{t-1}}{\partial \alpha}\\
\frac{\partial b_t}{\partial \beta}=(S_t-S_{t-1})+\beta (\frac{\partial S_t}{\partial \beta} - \frac{\partial S_{t-1}}{\partial \beta})+(1-\beta)\frac{\partial b_{t-1}}{\partial \beta}-b_{t-1}\\
\frac{\partial b_t}{\partial \gamma}=\beta (\frac{\partial S_t}{\partial \gamma} - \frac{\partial S_{t-1}}{\partial \gamma})+(1-\beta)\frac{\partial b_{t-1}}{\partial \gamma}\\ \\
\frac{\partial C_t}{\partial \alpha}=-\frac{\gamma X_t}{S_t^2}\frac{\partial S_t}{\partial \alpha}+(1-\gamma)\frac{\partial C_{t-L}}{\partial \alpha}\\
\frac{\partial C_t}{\partial \beta}=-\frac{\gamma X_t}{S_t^2}\frac{\partial S_t}{\partial \beta}+(1-\gamma)\frac{\partial C_{t-L}}{\partial \beta}\\
\frac{\partial C_t}{\partial \gamma}=\frac{X_t}{S_t}-\frac{\gamma X_t}{S_t^2}\frac{\partial S_t}{\partial \gamma}+(1-\gamma)\frac{\partial C_{t-L}}{\partial \gamma} - C_{t-L}
\end{array}$$
• Internally, during the optimization, NumXL recursively computes the smoothed level, trend and seasonal index series together with all the in-sample derivatives, which are used for the loss function and its gradient.
• SPG is an iterative method, and it may fail to find the minimum within the allowed number of iterations and/or tolerance. In that case, NumXL does not fail; instead, it uses the best values of ($\alpha,\beta,\gamma$) found so far.
• SPG has no provision to detect or escape local minima, so there is no guarantee of finding the global minimum.
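The derivative recursions and the box-constrained search of remark 19 can be sketched as follows. This is not NumXL's code, and the function names are hypothetical. `sse_and_grad` propagates $\partial S$, $\partial b$ and $\partial C$ with respect to $(\alpha,\beta,\gamma)$ alongside the smoothing pass, under the assumption that the start values are constants (zero derivatives) describing the state just before the first observation. `spg_minimize` is a simplified SPG variant: a spectral (Barzilai-Borwein) step with a monotone Armijo backtracking line search (the nonmonotone search of full SPG is replaced by its memory-one special case), on the box $[10^{-4}, 1-10^{-4}]^3$, which is an assumed stand-in for the open interval $(0,1)$.

```python
import numpy as np


def sse_and_grad(params, x, level0, trend0, season0):
    """U(alpha, beta, gamma) and its exact gradient via the recursions of remark 19."""
    alpha, beta, gamma = params
    L = len(season0)
    s, b = float(level0), float(trend0)
    c = np.array(season0, dtype=float)      # c[t % L] holds C_{t-L} when x[t] arrives
    ds, db = np.zeros(3), np.zeros(3)        # dS/d(alpha, beta, gamma), db/d(...)
    dc = np.zeros((L, 3))                    # dC/d(...) for every buffer slot
    e_a, e_b, e_g = np.eye(3)
    u, grad = 0.0, np.zeros(3)
    for t, xt in enumerate(x):
        k = t % L
        c_old, dc_old = c[k], dc[k].copy()
        if t >= 1:                           # U sums the one-step errors of X_2..X_N
            err = xt - (s + b) * c_old
            d_f = (ds + db) * c_old + (s + b) * dc_old
            u += err ** 2
            grad += -2.0 * err * d_f
        s_prev, b_prev, ds_prev, db_prev = s, b, ds, db
        s = alpha * xt / c_old + (1 - alpha) * (s_prev + b_prev)
        ds = (-alpha * xt / c_old ** 2 * dc_old + (1 - alpha) * (ds_prev + db_prev)
              + e_a * (xt / c_old - (s_prev + b_prev)))
        b = beta * (s - s_prev) + (1 - beta) * b_prev
        db = (beta * (ds - ds_prev) + (1 - beta) * db_prev
              + e_b * ((s - s_prev) - b_prev))
        c[k] = gamma * xt / s + (1 - gamma) * c_old
        dc[k] = (-gamma * xt / s ** 2 * ds + (1 - gamma) * dc_old
                 + e_g * (xt / s - c_old))
    return float(u), grad


def spg_minimize(x, level0, trend0, season0, p0=(0.333, 0.333, 0.5),
                 lo=1e-4, hi=1 - 1e-4, max_iter=200, tol=1e-8):
    """Simplified spectral projected gradient on the box [lo, hi]^3."""
    p = np.clip(np.asarray(p0, dtype=float), lo, hi)
    u, g = sse_and_grad(p, x, level0, trend0, season0)
    lam = 1.0
    for _ in range(max_iter):
        d = np.clip(p - lam * g, lo, hi) - p          # projected-gradient direction
        if np.max(np.abs(d)) < tol:
            break
        step = 1.0
        while True:                                   # monotone Armijo backtracking
            p_new = p + step * d
            u_new, g_new = sse_and_grad(p_new, x, level0, trend0, season0)
            if np.isfinite(u_new) and u_new <= u + 1e-4 * step * (g @ d):
                break
            step *= 0.5
            if step < 1e-10:
                break
        if not np.isfinite(u_new) or u_new > u:
            break                                     # keep the best point found so far
        sv, yv = p_new - p, g_new - g
        sy = sv @ yv
        lam = min(max((sv @ sv) / sy, 1e-10), 1e10) if sy > 0 else 1e10   # spectral step
        p, u, g = p_new, u_new, g_new
    return p, u
```

Comparing `sse_and_grad` against central finite differences is a quick way to validate the derivative recursions before trusting the optimizer.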
20. In general, the SSE in TESMTH is a continuous, smooth, non-monotone function of the smoothing parameters, and the SPG minimizer almost always finds an optimal solution in very few iterations.