Finding Optimal Initial Values for Smoothing

In practice, Brown’s simple exponential smoothing function is best suited for a time series that exhibits neither an upward or downward trend nor seasonality. The input time series is assumed to have a mean (level) that changes slowly over time rather than staying fixed. The recursive formula for the simple exponential smoothing function is expressed as:

$$S_{t} = \alpha\times X_t +(1-\alpha)\times S_{t-1}, \quad t>1$$ $$\textrm{OR}$$ $$S_{t}=\alpha X_t + \alpha(1-\alpha)X_{t-1}+\cdots+\alpha(1-\alpha)^{t-2}X_2+(1-\alpha)^{t-1} S_1, \quad t>1$$


  • Xt is the value of the time series at time t.
  • St is the smoothed level at time t.
  • α is the smoothing factor/coefficient for the level (0 ≤ α ≤ 1).


To illustrate how this formula works, let’s iterate a few steps:


$$S_{1} = ?$$ $$S_{2} = \alpha\times X_{2} + (1-\alpha) \times S_{1}$$ $$S_{3} = \alpha\times X_{3} + (1-\alpha) \times S_{2}$$ $$S_{4} = \alpha\times X_{4} + (1-\alpha) \times S_{3}$$ $$\cdots$$ $$S_{n} = \alpha\times X_{n} + (1-\alpha) \times S_{n-1}$$
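
The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not NumXL's implementation; the function name `ses_levels` and its arguments are hypothetical, and S1 is passed in by the caller because its choice is the open question here.

```python
def ses_levels(x, alpha, s1):
    """Return the smoothed levels S_1..S_n for observations x = [X_1, ..., X_n]."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    levels = [s1]                      # S_1 is supplied by the caller
    for x_t in x[1:]:                  # X_2, X_3, ..., X_n
        levels.append(alpha * x_t + (1.0 - alpha) * levels[-1])
    return levels


series = [10.0, 12.0, 11.0, 13.0]      # made-up values, for illustration only
print(ses_levels(series, alpha=0.5, s1=series[0]))
# [10.0, 11.0, 11.0, 12.0]
```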

The smoothing coefficient (α) controls how quickly the smoothed level adapts, but we must choose the initial value S1 carefully: after a poor choice, the recursive smoothing formula needs more time to adapt and dissipate its effect. The impact is most tangible for short time series and for small values of α.

Why should I care?
For small and medium-sized time series, the initial level value has a disproportionately large effect on early forecast values, which can lead to poor forecasts. As a best practice, all NumXL users should find the optimal value of the smoothing coefficient and set an optimal initial value to expedite optimization convergence.

Additionally, for those who wish to track and verify the intermediate calculations, knowing the initial value selection criteria is vital for achieving identical values. There are four different methods to estimate the best initial value for exponential smoothing.


Method 1: First Observation Value

Simply set S1 equal to X1. This method is found in almost every textbook, so why shouldn’t it be used? Because the smoothed level series starts at the first value of the input time series, that single observation has an outsized impact on the forecast value. The impact is most pronounced for small values of the smoothing coefficient (α), and it can lead to a poor fit and poor forecast values.
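
To see why small α values make the starting value so influential, note that each recursive update multiplies the contribution of S1 by (1 − α). The short Python sketch below tabulates the weight S1 still carries after a given number of updates; the helper name `initial_value_weight` is hypothetical.

```python
def initial_value_weight(alpha, updates):
    """Weight that S_1 still carries after the given number of recursive updates."""
    return (1.0 - alpha) ** updates


for alpha in (0.1, 0.3, 0.7):
    # Weight of S_1 after 1, 5, and 10 updates of the smoothing recursion.
    weights = [round(initial_value_weight(alpha, k), 3) for k in (1, 5, 10)]
    print(alpha, weights)
```

With α = 0.1, S1 still carries about 35% of the weight after ten updates, whereas with α = 0.7 its weight is negligible by then.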


Method 2: Average Value of the First Few Observations

If your time series has 5 or more observations, set S1 to the average of the first 4 observations.

$$S_1=\left\{\begin{matrix} X_1 & N\leqslant 4 \\ \frac{\sum_{i=1}^4 X_i}{4} & N > 4 \end{matrix}\right.$$

In this method, the impact of one observation value on the forecast value is mitigated. However, we are not sure if the initial value of the smoothed level is optimal.

Why does this method take only the first 4 observations and not the average of the whole data sample? Because the mean is assumed to change over time, observations far from the start say little about the starting level. There is no hard rule for how many observations to include, but in practice, it is best to use fewer than ten (10) observations to estimate the initial value (S1).
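
The piecewise rule of Method 2 can be sketched as follows. The function name `initial_level_mean` and the `window` parameter are hypothetical, introduced only so the number of averaged observations is easy to change.

```python
def initial_level_mean(x, window=4):
    """Method 2: X_1 for short series, otherwise the mean of the first `window` values."""
    if not x:
        raise ValueError("series is empty")
    if len(x) <= window:               # N <= 4 with the default window
        return x[0]
    return sum(x[:window]) / window    # N > 4: average of X_1..X_4


print(initial_level_mean([3.0, 5.0, 4.0, 8.0, 100.0]))   # 5.0
print(initial_level_mean([3.0, 5.0, 4.0, 8.0]))          # 3.0
```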


Method 3: Backcasting

Backcasting is simply reversing the time series so that we forecast into the past instead of the future. In this method, we employ the smoothing algorithm to estimate the initial value by going backward in the series. Many practitioners and academics strongly advocate using the backcasting method.
The backcasting method calculates the initial value by:

  • Reversing the chronological order of the time series.
  • Estimating the initial value of the reversed time series using method 2 (see above).
  • Applying the exponential smoothing function to the reversed time series and calculating the smoothed level for all observations.
  • Finding an optimal value for the smoothing coefficient.
  • Using the optimal smoothing coefficient value to calculate the level of the last observation. This serves as the initial value estimate for the original time series.
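
The backcasting steps above can be sketched in Python. Two details are assumptions on our part, since the steps do not fix them: the "optimal" α is taken to be the one that minimizes the sum of squared one-step-ahead errors, and it is located with a plain grid search. All function names are hypothetical.

```python
def ses_levels(x, alpha, s1):
    """Smoothed levels S_1..S_n, starting from the supplied S_1."""
    levels = [s1]
    for x_t in x[1:]:
        levels.append(alpha * x_t + (1.0 - alpha) * levels[-1])
    return levels


def sse_one_step(x, alpha, s1):
    """Sum of squared errors, using S_{t-1} as the forecast of X_t (an assumption)."""
    levels = ses_levels(x, alpha, s1)
    return sum((x_t - s_prev) ** 2 for x_t, s_prev in zip(x[1:], levels[:-1]))


def initial_level_mean(x, window=4):
    """Method 2: X_1 for short series, else the mean of the first `window` values."""
    return x[0] if len(x) <= window else sum(x[:window]) / window


def backcast_initial_level(x, grid_size=99):
    """Estimate S_1 of the original series by smoothing the reversed series."""
    reversed_x = x[::-1]                                   # step 1: reverse
    s1_rev = initial_level_mean(reversed_x)                # step 2: method 2
    alphas = [i / (grid_size + 1) for i in range(1, grid_size + 1)]
    # steps 3-4: smooth the reversed series for each candidate alpha, keep the best
    best_alpha = min(alphas, key=lambda a: sse_one_step(reversed_x, a, s1_rev))
    # step 5: the last level of the reversed series sits at the original start
    return ses_levels(reversed_x, best_alpha, s1_rev)[-1]


series = [12.0, 11.0, 13.0, 12.0, 14.0, 15.0, 14.0, 16.0]  # made-up values
print(backcast_initial_level(series))
```

A finer grid or a bounded scalar optimizer could replace the grid search without changing the structure of the procedure.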


Method 4: Optimization

This method searches for the values of the starting level (S1) and the smoothing coefficient (α) that jointly minimize the sum of squared errors between the smoothed level, used as the one-step-ahead forecast, and the original values of the time series.

Unlike the smoothing coefficient (α), the start level (S1) is unbounded, which can create a challenge for many non-linear optimizers. However, this issue can be easily addressed by scaling down the value.

Method 4 should yield the best results, but it takes longer to converge than optimizing for only the smoothing parameter. However, having good initial values will speed up the optimization process significantly.
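
As a rough illustration of a joint search, the Python sketch below swaps the general non-linear optimizer for a simpler technique: a grid search over α combined with a closed-form least-squares solution for S1 at each α. The closed form exists because every one-step-ahead error is linear in S1. The error measure (previous level used as the forecast) and all function names are assumptions made for this example.

```python
def one_step_sse(x, alpha, s1):
    """Sum of squared one-step-ahead errors for a given alpha and S_1."""
    level, sse = s1, 0.0
    for x_t in x[1:]:
        sse += (x_t - level) ** 2          # forecast of X_t is the previous level
        level = alpha * x_t + (1.0 - alpha) * level
    return sse


def best_s1_for_alpha(x, alpha):
    """Least-squares S_1 for a fixed alpha, writing each level as base + coef * S_1."""
    base, coef = 0.0, 1.0                  # the first level is S_1 itself
    num = den = 0.0
    for x_t in x[1:]:
        num += coef * (x_t - base)         # error is (x_t - base) - coef * S_1
        den += coef * coef
        base = alpha * x_t + (1.0 - alpha) * base
        coef *= 1.0 - alpha
    return num / den


def fit_ses(x, grid_size=99):
    """Return (alpha, s1, sse) with the smallest SSE over the alpha grid."""
    if len(x) < 2:
        raise ValueError("need at least two observations")
    best = None
    for i in range(1, grid_size + 1):
        alpha = i / (grid_size + 1)        # 0.01, 0.02, ..., 0.99 by default
        s1 = best_s1_for_alpha(x, alpha)
        sse = one_step_sse(x, alpha, s1)
        if best is None or sse < best[2]:
            best = (alpha, s1, sse)
    return best


series = [12.0, 11.0, 13.0, 12.0, 14.0, 15.0, 14.0, 16.0]  # made-up values
alpha, s1, sse = fit_ses(series)
print(alpha, s1, sse)
```

Solving for S1 analytically sidesteps the unbounded-parameter issue in this simple case; a general-purpose optimizer would instead need the scaling mentioned above.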


Does the anticipated accuracy improvement warrant the additional effort?

Let’s get a couple of things out of the way. First, the added complexity of estimating a better initial value is handled primarily by computer programs, so it enhances the smoothing algorithm with no extra effort on your part. Second, you will still need to tap into your intuition to make sure that the simple exponential smoothing model is the right fit for your data set.

Not all smoothing functions are created or implemented equally, and the process of estimating the initial value has a direct impact on your final forecast accuracy.

In conclusion, we employ the backcasting method to find the initial level value (S1), then run the optimizer to find the optimal value for the level smoothing parameter (i.e., alpha).

We’ll continue discussing how to estimate initial values for all exponential smoothing functions, so stay tuned.
