DESMTH - (Holt's) Double Exponential Smoothing

Returns the out-of-sample forecast estimate of Holt’s double exponential smoothing.



DESMTH(X, Order, Alpha, Beta, Optimize, T, Return Type)

X is the univariate time series data (a one-dimensional array of cells, e.g., a row or a column).

Order is the time order of the data series, i.e., whether the first data point corresponds to the earliest date (1, default) or the latest date (0).

Order Description
1 ascending (the first data point corresponds to the earliest date) (default)
0 descending (the first data point corresponds to the latest date)

Alpha is the level smoothing factor (alpha should be between zero (0) and one (1), exclusive). If missing or omitted, a value of 0.333 is used.

Beta is the trend smoothing factor (beta should be between zero (0) and one (1), exclusive). If missing or omitted, a value of 0.333 is used.

Optimize is a flag (True/False) for searching for and using the optimal values of the smoothing factors. If missing or omitted, Optimize is assumed to be False.

T is the forecast time/horizon beyond the end of X. If missing, a default value of 0 (i.e., the latest observation, at the end of X) is assumed.

Return Type is a number that determines the type of return value: 0 (or missing) = forecast, 1 = alpha, 2 = beta, 3 = level component (series), 4 = trend component (series), 5 = one-step forecasts (series).

Return Type Description
0 or omitted Forecast value
1 Level smoothing parameter (alpha)
2 Trend smoothing parameter (beta)
3 Level component (series)
4 Trend component (series)
5 One-step forecasts (series)
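To make the arguments and return types concrete, here is a minimal Python sketch that mirrors them. The function name `desmth`, its keyword arguments, and the helpers `_trim_missing` and `holt_components` are hypothetical (this is not NumXL's API or source code). The recursion and starting values follow numbered notes 4 and 11 below. The Optimize flag is not implemented, so return types 1 and 2 simply echo the inputs. Treating return type 5 as $S_t + b_t$ (the forecast of $X_{t+1}$ made at time t) is an assumption about how the series is aligned.

```python
import math
from typing import List, Sequence, Tuple, Union

Number = Union[float, int, None]


def _trim_missing(x: Sequence[Number]) -> List[float]:
    """Drop missing values (None or NaN) at either end of the series."""
    vals = [math.nan if v is None else float(v) for v in x]
    start, end = 0, len(vals)
    while start < end and math.isnan(vals[start]):
        start += 1
    while end > start and math.isnan(vals[end - 1]):
        end -= 1
    return vals[start:end]


def holt_components(x: Sequence[float], alpha: float,
                    beta: float) -> Tuple[List[float], List[float]]:
    """Return the level (S_t) and trend (b_t) series of Holt's recursion."""
    n = len(x)
    if n > 4:
        # S_1 = in-sample mean, b_1 = OLS slope of X on the time index 1..N
        s1 = sum(x) / n
        t_mean = (n + 1) / 2
        num = sum((i + 1 - t_mean) * (xi - s1) for i, xi in enumerate(x))
        den = sum((i + 1 - t_mean) ** 2 for i in range(n))
        b1 = num / den
    else:
        # Very short series: S_1 = X_1, b_1 = 0
        s1, b1 = x[0], 0.0
    levels, trends = [s1], [b1]
    for t in range(1, n):
        s_prev, b_prev = levels[-1], trends[-1]
        s = alpha * x[t] + (1 - alpha) * (s_prev + b_prev)
        b = beta * (s - s_prev) + (1 - beta) * b_prev
        levels.append(s)
        trends.append(b)
    return levels, trends


def desmth(x, order=1, alpha=0.333, beta=0.333, t=0, return_type=0):
    """Hypothetical Python mirror of DESMTH (no Optimize support)."""
    data = _trim_missing(x)
    if not data:
        raise ValueError("x has no non-missing values")
    if order == 0:
        data = data[::-1]  # descending input: reverse into chronological order
    levels, trends = holt_components(data, alpha, beta)
    if return_type == 0:
        return levels[-1] + t * trends[-1]  # F_N(T) = S_N + T * b_N
    if return_type == 1:
        return alpha
    if return_type == 2:
        return beta
    if return_type == 3:
        return levels
    if return_type == 4:
        return trends
    if return_type == 5:
        return [s + b for s, b in zip(levels, trends)]  # F_t(1) = S_t + b_t
    raise ValueError("return_type must be 0..5")
```

For example, `desmth([3, 5, 4], alpha=0.5, beta=0.5, t=2)` uses the short-series start ($S_1 = 3$, $b_1 = 0$) and extrapolates the final level by two trend steps. Passing the same data in descending order with `order=0` gives the same result.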


  1. The time series is homogeneous or equally spaced.
  2. The time series may include missing values (e.g. #N/A) at either end.
  3. Double exponential smoothing is best applied to time series that exhibit a prevalent additive (non-exponential) trend but no seasonality.
  4. The recursive form of Holt’s double exponential smoothing equations is as follows:

    $$ \begin{array}{l} \hat{F}_t(m)=S_t+m\times b_t\\ \\ S_{t>1}=\alpha \times X_t + (1-\alpha)(S_{t-1} + b_{t-1})\\ b_{t>1}=\beta \times (S_t - S_{t-1})+(1-\beta)b_{t-1} \end{array} $$
    • $X_t$ is the value of the time series at time t.
    • $S_{t}$ is a smoothed estimate of the value of the time series X at the end of period t.
    • $b_{t}$ is a smoothed estimate of average growth at the end of period t.
    • $\alpha$ is the level smoothing coefficient.
    • $\beta$ is the trend smoothing coefficient.
    • $\hat{F}_t(m)$ is the m-step-ahead forecast of $X$ made at time t.
  5. DESMTH computes two simple but interdependent exponentially smoothed series: level and trend. They are interdependent in the sense that both components must be updated each period, and each update uses the other component.
  6. As in simple exponential smoothing, the smoothing coefficient $\alpha$ controls the speed of adaptation to the local level, while a second smoothing constant $\beta$ controls how much of the local trend is carried through to multi-step-ahead forecasts.
  7. If $\alpha = \beta$, Holt’s double exponential smoothing is equivalent to Brown’s linear exponential smoothing method.
  8. If $\beta = 0$ and the starting value for the trend ($b_1$) is also set to zero (0), Holt’s double exponential smoothing produces the same forecasts as Brown’s simple exponential smoothing.
  9. DESMTH calculates a point forecast. No probabilistic model is assumed for double exponential smoothing, so a statistical confidence interval cannot be derived for the computed values.
  10. In practice, the mean squared error (MSE) of prior out-of-sample forecasts is often used as a proxy for the uncertainty (i.e., variance) of the most recent forecast value.
  11. This method requires two starting values ($S_1, b_1$) to initialize the recursive updating of the equations. In NumXL, these values are set as follows:
    • $S_1$ is set to the in-sample mean; for a very short time series ($N \leq 4$), it is set to the value of the first observation.

      $$ S_{1}=\left\{\begin{array}{l} X_1\\ \frac{\sum_{t=1}^N X_t}{N} \end{array}\right. \begin{array}{r} N \leq 4\\ N \gt 4 \end{array} $$
    • $b_1$ is set to the slope of the regression trend line. If not enough observations are available ($N \leq 4$), $b_1$ is set to zero (0).

      $$ b_{1}=\left\{\begin{array}{l} 0\\ \mathrm{Reg.\,Slope} \end{array}\right. \begin{array}{r} N \leq 4\\ N \gt 4 \end{array} $$
  12. Starting with NumXL version 1.63, DESMTH has a built-in optimizer that finds the values of ($\alpha,\beta$) minimizing the sum of squared errors (SSE), i.e., the loss function $U(.)$, of the one-step forecasts calculated in-sample.

    $$ \begin{array}{l} U(\alpha,\beta)=\mathrm{SSE}=\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))^2\\ \min_{\alpha,\beta \in (0,1)} U(\alpha,\beta) \end{array} $$
  13. As starting guesses, the NumXL optimizer uses the input values of (alpha, beta), if available. The initial values of the two smoothing series ($S_1, b_1$) are computed from the input data.
  14. Starting with NumXL version 1.65, DESMTH can return the optimal values found for (alpha, beta), as well as the corresponding in-sample level, trend, and one-step forecast series.
  15. The time series must have at least four (4) observations with non-missing values to use the built-in optimizer.
  16. NumXL implements the spectral projected gradient (SPG) method to find the minimum within box constraints (a lower and an upper bound on each parameter).
    • SPG requires the loss function value and its gradient ($\nabla$). For performance, NumXL implements exact derivative formulas rather than numerical approximations.

      $$ \begin{array}{l} \nabla U = \frac{\partial U}{\partial \alpha} \vec{e_\alpha} + \frac{\partial U}{\partial \beta} \vec{e_\beta}\\ \\ \frac{\partial U}{\partial \alpha} = -2\times\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))\times \frac{\partial \hat{F}_t}{\partial \alpha}\\ \frac{\partial U}{\partial \beta} = -2\times\sum_{t=1}^{N-1}(X_{t+1}-\hat{F}_t(1))\times \frac{\partial \hat{F}_t}{\partial \beta}\\ \\ \frac{\partial \hat{F}_t}{\partial \alpha}=\frac{\partial S_t}{\partial \alpha}+\frac{\partial b_t}{\partial \alpha}\\ \frac{\partial \hat{F}_t}{\partial \beta}=\frac{\partial S_t}{\partial \beta}+\frac{\partial b_t}{\partial \beta}\\ \\ \frac{\partial S_t}{\partial \alpha}=X_t + (1-\alpha)(\frac{\partial S_{t-1}}{\partial \alpha}+ \frac{\partial b_{t-1}}{\partial \alpha})-(S_{t-1}+b_{t-1})\\ \frac{\partial S_t}{\partial \beta}=(1-\alpha)(\frac{\partial S_{t-1}}{\partial \beta}+ \frac{\partial b_{t-1}}{\partial \beta})\\ \\ \frac{\partial b_t}{\partial \alpha}= \beta \times (\frac{\partial S_t}{\partial \alpha} -\frac{\partial S_{t-1}}{\partial \alpha}) + (1-\beta) \frac{\partial b_{t-1}}{\partial \alpha}\\ \frac{\partial b_t}{\partial \beta} = (S_t-S_{t-1})+\beta(\frac{\partial S_t}{\partial \beta} -\frac{\partial S_{t-1}}{\partial \beta})+(1-\beta)\frac{\partial b_{t-1}}{\partial \beta} -b_{t-1} \\ \end{array} $$
    • During the optimization, NumXL recursively computes the smoothed series (levels and trends) together with their in-sample derivatives. These are then used to evaluate the loss function and its gradient.
    • SPG is an iterative method, and it may not reach the minimum within the allowed number of iterations and/or tolerance. In that case, NumXL does not fail; instead, it uses the best (alpha, beta) found so far.
    • SPG has no provision to detect or escape local minima, so there is no guarantee that the global minimum is found.
  17. In general, the SSE function in DESMTH is continuous, smooth, and typically convex, so the SPG minimizer almost always finds an optimal solution in very few iterations.
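The loss function from note 12, its exact gradient from note 16, and a box-constrained minimizer can be sketched together in Python. This is a simplified, monotone variant of SPG, not NumXL's implementation. It takes a Barzilai-Borwein (spectral) step, projects it onto the box $[lo, hi]^2$, and uses Armijo backtracking. The function names `initial_values`, `sse_and_grad` and `spg_minimize`, the bounds `1e-6` and `1 - 1e-6`, the tolerance, and the iteration limit are all hypothetical choices.

```python
from typing import Sequence, Tuple


def initial_values(x: Sequence[float]) -> Tuple[float, float]:
    """S_1 and b_1 as in note 11: mean/OLS slope for N > 4, else X_1 and 0."""
    n = len(x)
    if n <= 4:
        return float(x[0]), 0.0
    mean = sum(x) / n
    t_mean = (n + 1) / 2
    num = sum((i + 1 - t_mean) * (xi - mean) for i, xi in enumerate(x))
    den = sum((i + 1 - t_mean) ** 2 for i in range(n))
    return mean, num / den


def sse_and_grad(x, alpha, beta, s1, b1):
    """Return (SSE, dSSE/dalpha, dSSE/dbeta) of the in-sample one-step forecasts."""
    s, b = s1, b1
    ds_da = ds_db = db_da = db_db = 0.0  # S_1, b_1 do not depend on (alpha, beta)
    sse = g_a = g_b = 0.0
    for t in range(1, len(x)):
        err = x[t] - (s + b)  # X_{t+1} - F_t(1), using the state before the update
        sse += err * err
        g_a += -2.0 * err * (ds_da + db_da)
        g_b += -2.0 * err * (ds_db + db_db)
        # Level update and its derivatives
        s_new = alpha * x[t] + (1 - alpha) * (s + b)
        ds_da_new = x[t] - (s + b) + (1 - alpha) * (ds_da + db_da)
        ds_db_new = (1 - alpha) * (ds_db + db_db)
        # Trend update and its derivatives
        b_new = beta * (s_new - s) + (1 - beta) * b
        db_da_new = beta * (ds_da_new - ds_da) + (1 - beta) * db_da
        db_db_new = (s_new - s) + beta * (ds_db_new - ds_db) + (1 - beta) * db_db - b
        s, b = s_new, b_new
        ds_da, ds_db, db_da, db_db = ds_da_new, ds_db_new, db_da_new, db_db_new
    return sse, g_a, g_b


def spg_minimize(x, alpha0=0.333, beta0=0.333, lo=1e-6, hi=1 - 1e-6,
                 max_iter=200, tol=1e-9):
    """Simplified SPG: projected spectral step plus Armijo backtracking."""
    if len(x) < 4:
        raise ValueError("the optimizer needs at least four observations")
    s1, b1 = initial_values(x)

    def clip(v: float) -> float:
        return min(max(v, lo), hi)

    p = [clip(alpha0), clip(beta0)]
    f, ga, gb = sse_and_grad(x, p[0], p[1], s1, b1)
    g = [ga, gb]
    best = (f, p[0], p[1])
    lam = 1.0  # spectral (Barzilai-Borwein) step length
    for _ in range(max_iter):
        # Search direction: project a spectral gradient step back onto the box
        d = [clip(p[i] - lam * g[i]) - p[i] for i in range(2)]
        if max(abs(d[0]), abs(d[1])) < tol:
            break  # projected-gradient stationarity reached
        gtd = g[0] * d[0] + g[1] * d[1]
        step = 1.0
        while True:  # halve the step until the Armijo condition holds
            q = [p[i] + step * d[i] for i in range(2)]
            fq, qa, qb = sse_and_grad(x, q[0], q[1], s1, b1)
            if fq <= f + 1e-4 * step * gtd or step < 1e-10:
                break
            step *= 0.5
        s_vec = [q[i] - p[i] for i in range(2)]
        y_vec = [qa - g[0], qb - g[1]]
        sty = s_vec[0] * y_vec[0] + s_vec[1] * y_vec[1]
        sts = s_vec[0] ** 2 + s_vec[1] ** 2
        lam = min(max(sts / sty, 1e-10), 1e10) if sty > 0 else 1e10
        p, f, g = q, fq, [qa, qb]
        if f < best[0]:
            best = (f, p[0], p[1])  # keep the best point found so far
    return best[1], best[2]
```

Because the gradient is exact, a quick sanity check is to compare `sse_and_grad` against central finite differences of the SSE. The two should agree to several digits. The optimizer returns the best pair seen, so its SSE is never worse than at the starting guess.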


