Moving Average (MA) Model

Occasionally, we receive technical questions about ARMA modeling that go beyond regular NumXL support and delve into the mathematical formulation of ARMA. We are always happy to help our users with any question they may have, so we decided to share our internal technical notes with you.

These notes were originally composed when we sat in on a time series analysis class. Over the years, we’ve maintained these notes with new insights, empirical observations and intuitions acquired. We often go back to these notes for resolving development issues and/or to properly address a product support matter.

In this paper, we’ll go over a simple, yet fundamental, econometric model: moving average. This model serves as a cornerstone for all serious discussion on ARMA/ARIMA models.


A moving average model of order q (i.e. MA(q)) is defined as follows: $$x_t=\mu + a_t + \theta_1a_{t-1}+\theta_2a_{t-2}+\cdots+\theta_qa_{t-q}$$ $$a_t=\epsilon_t\times\sigma$$ $$\epsilon_t\sim\textrm{i.i.d}\sim N(0,1)$$


  • $a_t$ is the innovation (shock) of the process at time $t$
  • $\sigma$ is the conditional standard deviation (aka volatility)

The output value ($x_t$) is solely determined by a long-run average ($\mu$) and a weighted sum of the current and past shocks or innovations ($\{a_t\}$).
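The definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than NumXL's implementation; the function name `simulate_ma`, the seed and the parameter values are assumptions chosen for the example.

```python
import numpy as np

def simulate_ma(mu, thetas, sigma, n, seed=0):
    """Simulate n values of x_t = mu + a_t + theta_1 a_{t-1} + ... + theta_q a_{t-q}."""
    rng = np.random.default_rng(seed)
    q = len(thetas)
    # Draw n + q shocks so the first observation already has q past shocks.
    a = sigma * rng.standard_normal(n + q)
    weights = np.concatenate(([1.0], thetas))  # (1, theta_1, ..., theta_q)
    # 'valid' convolution returns sum_j weights[j] * a[t - j] for every t >= q.
    x = mu + np.convolve(a, weights, mode="valid")
    return x, a[q:]

# Illustrative MA(2); the parameter values are hypothetical.
x, a = simulate_ma(mu=2.0, thetas=[0.6, -0.3], sigma=1.5, n=5)
```

The function returns the simulated series together with the shocks aligned to it, so that `x[i]` and `a[i]` refer to the same time step.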


By definition, the MA process is stable and has a finite long-run mean ($\mu$) and variance:

  1. The unconditional (i.e. long-run) mean is simply $\mu$ $$E[x_t]=E[\mu + a_t + \theta_1a_{t-1}+\theta_2a_{t-2}+\cdots+\theta_qa_{t-q}]=\mu$$
  2. The unconditional (i.e. long-run) variance is defined as follows: $$Var[x_t]=E[(x_t-\mu)(x_t-\mu)]=E[(a_t + \theta_1a_{t-1}+\theta_2a_{t-2}+\cdots+\theta_qa_{t-q})^2]$$ $$Var[x_t]=(1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2)\sigma^2$$
    1. For a finite order q, the process is guaranteed to be stable (i.e. its variance is finite and it does not diverge).
    2. For an infinite order (i.e. $\textrm{MA}(\infty)$), the process is stable only if the long-run variance is finite: $$Var[x_t]=(1+\theta_1^2+\theta_2^2+\cdots)\sigma^2=(1+\sum_{i=1}^\infty \theta_i^2)\sigma^2$$

      In other words, the sum of the squared values of the MA coefficients is finite.

      $$\sum_{i=1}^\infty \theta_i^2 < \infty $$
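As a worked illustration of this condition, consider geometrically decaying coefficients $\theta_i=r^i$ with $\|r\|<1$ (an assumed example, not one taken from the notes):

$$\sum_{i=1}^\infty \theta_i^2=\sum_{i=1}^\infty r^{2i}=\frac{r^2}{1-r^2}<\infty$$ $$Var[x_t]=(1+\frac{r^2}{1-r^2})\sigma^2=\frac{\sigma^2}{1-r^2}$$

By contrast, setting $\theta_i=1$ for all $i$ makes the sum diverge, so that $\textrm{MA}(\infty)$ has no finite long-run variance.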


Given a data sample $\{x_1,x_2,\cdots,x_T\}$, we can calculate the expected values of the moving average process for future (i.e. out-of-sample) periods as follows:

$$x_T=\mu + a_T + \theta_1a_{T-1}+\theta_2a_{T-2}+\cdots+\theta_qa_{T-q}$$ $$E[x_{T+1}|a_1,a_2,\cdots,a_T]=\mu +\theta_1a_{T}+\theta_2a_{T-1}+\cdots+\theta_qa_{T+1-q}$$ $$E[x_{T+2}|a_1,a_2,\cdots,a_T]=\mu +\theta_2a_{T}+\theta_3a_{T-1}+\cdots+\theta_qa_{T+2-q}$$ $$\cdots$$ $$E[x_{T+q}|a_1,a_2,\cdots,a_T]=\mu+\theta_qa_{T}$$ $$E[x_{T+q+k}|a_1,a_2,\cdots,a_T]=\mu \quad (k\geq 1)$$

The variance (squared standard error) of the out-of-sample values is expressed as follows:

$$Var[x_{T+1}|a_1,a_2,\cdots,a_T]=Var[a_{T+1}+\theta_1a_{T}+\theta_2a_{T-1}+\cdots+\theta_qa_{T+1-q}]=\sigma^2$$ $$Var[x_{T+2}|a_1,a_2,\cdots,a_T]=Var[a_{T+2}+\theta_1a_{T+1}+\theta_2a_{T}+\theta_3a_{T-1}+\cdots+\theta_qa_{T+2-q}]=(1+\theta_1^2)\sigma^2$$ $$Var[x_{T+3}|a_1,a_2,\cdots,a_T]=Var[a_{T+3}+\theta_1a_{T+2}+\theta_2a_{T+1}+\theta_3a_{T}+\cdots+\theta_qa_{T+3-q}]=(1+\theta_1^2+\theta_2^2)\sigma^2$$ $$\cdots$$ $$Var[x_{T+q}|a_1,a_2,\cdots,a_T]=Var[a_{T+q}+\theta_1a_{T+q-1}+\cdots+\theta_{q-1}a_{T+1}+\theta_qa_{T}]=(1+\theta_1^2+\theta_2^2+\cdots+\theta_{q-1}^2)\sigma^2$$ $$Var[x_{T+q+k}|a_1,a_2,\cdots,a_T]=(1+\theta_1^2+\theta_2^2+\cdots+\theta_{q}^2)\sigma^2 \quad (k\geq 1)$$

Note: The conditional variance grows with each step and reaches its long-run (unconditional) value once the forecast horizon exceeds q steps.
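The forecast mean and variance formulas can be sketched as follows. This is a minimal illustration assuming the shocks up to time T are known; the function name `ma_forecast` and the numbers in the usage example are hypothetical.

```python
import numpy as np

def ma_forecast(mu, thetas, sigma, past_shocks, horizon):
    """Conditional mean and variance of x_{T+1}, ..., x_{T+horizon} for an MA(q).

    past_shocks holds (..., a_{T-1}, a_T); at least q values are assumed available.
    """
    thetas = np.asarray(thetas, dtype=float)
    past = np.asarray(past_shocks, dtype=float)
    q = len(thetas)
    weights = np.concatenate(([1.0], thetas))  # (1, theta_1, ..., theta_q)
    means, variances = [], []
    for k in range(1, horizon + 1):
        # Known shocks a_T, a_{T-1}, ... enter with theta_k, theta_{k+1}, ..., theta_q.
        m = float(mu)
        for j in range(k, q + 1):
            m += thetas[j - 1] * past[-1 - (j - k)]
        means.append(m)
        # Future shocks a_{T+k}, ..., a_{T+1} carry weights 1, theta_1, ..., theta_{k-1}.
        variances.append(sigma**2 * np.sum(weights[:min(k, q + 1)] ** 2))
    return np.array(means), np.array(variances)

# Illustrative MA(2) with a_{T-1} = 0.3 and a_T = -1.0.
means, variances = ma_forecast(1.0, [0.5, 0.2], 2.0, [0.3, -1.0], horizon=4)
```

In this example the mean forecast reverts to $\mu$ from the third step on, and the variance stops growing at the same point.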


By definition, a weak-sense stationary (weakly stationary) process requires that the first moment (i.e. mean) and the auto-covariance do not vary with respect to time.

For the first moment, the unconditional mean is obviously time-invariant:

$$E[x_t]=\mu$$


For the second moment (i.e. variance and auto-covariance), let’s examine this assumption. By definition, the auto-covariance for lag order j is expressed as follows:

$$\gamma_j=E[(x_t-\mu)(x_{t-j}-\mu)]$$


The auto-covariance of lag order zero ($\gamma_o$) is the same as the unconditional variance:

$$\gamma_o=(1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2)\sigma^2$$ $$\gamma_1=(\theta_1+\theta_1\theta_2+\theta_2\theta_3+\theta_3\theta_4+\cdots+\theta_{q-1}\theta_q)\sigma^2$$ $$\gamma_2=(\theta_2+\theta_1\theta_3+\theta_2\theta_4+\theta_3\theta_5+\cdots+\theta_{q-2}\theta_q)\sigma^2$$ $$\gamma_3=(\theta_3+\theta_1\theta_4+\theta_2\theta_5+\theta_3\theta_6+\cdots+\theta_{q-3}\theta_q)\sigma^2$$ $$\cdots$$ $$\gamma_q=\theta_q\sigma^2$$ $$\gamma_{k>q}=0$$

To examine for variability with respect to time, it is sufficient to examine the following:

$$\gamma_j=E[(x_t-\mu)(x_{t-j}-\mu)]=\gamma_{-j}=E[(x_t-\mu)(x_{t+j}-\mu)]$$ $$\gamma_j =\gamma_{-j}=F(j)$$

Using either definition, we can easily show that the auto-covariance function does not vary with respect to time (t), but is rather determined solely by the lag order j. In sum, the moving average process is a weak-sense stationary process.
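The auto-covariance values listed earlier can be computed directly from the coefficients. The sketch below is illustrative only (the function name `ma_autocovariance` is an assumption); it makes explicit that the result depends on the lag j and the model parameters, never on t.

```python
import numpy as np

def ma_autocovariance(thetas, sigma, max_lag):
    """Theoretical auto-covariances gamma_0, ..., gamma_max_lag of an MA(q)."""
    w = np.concatenate(([1.0], np.asarray(thetas, dtype=float)))  # (1, theta_1, ..., theta_q)
    q = len(w) - 1
    gamma = np.zeros(max_lag + 1)
    for j in range(min(q, max_lag) + 1):
        # gamma_j = sigma^2 * sum_i w_i * w_{i+j}; lags beyond q stay at zero.
        gamma[j] = sigma**2 * np.dot(w[: q + 1 - j], w[j:])
    return gamma

gamma = ma_autocovariance([0.5, 0.2], sigma=1.0, max_lag=4)
```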


What do the moving average correlogram plots look like? Can we identify a moving average process (and its order) from correlogram plots (i.e. ACF and PACF)?

The auto-correlation function (ACF) is defined as the ratio of auto-covariance and unconditional variance:

$$\rho_j=\frac{\gamma_j}{\gamma_o}$$


By definition, $\rho_o=1$, and the next q ACF values are:

$$\rho_1=\frac{\theta_1+\theta_1\theta_2+\theta_2\theta_3+\theta_3\theta_4+\cdots+\theta_{q-1}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2}$$ $$\rho_2=\frac{\theta_2+\theta_1\theta_3+\theta_2\theta_4+\theta_3\theta_5+\cdots+\theta_{q-2}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2}$$ $$\rho_3=\frac{\theta_3+\theta_1\theta_4+\theta_2\theta_5+\theta_3\theta_6+\cdots+\theta_{q-3}\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2}$$ $$\cdots$$ $$\rho_q=\frac{\theta_q}{1+\theta_1^2+\theta_2^2+\cdots+\theta_q^2}$$ $$\rho_{k>q}=0$$

What does this mean? If the ACF plot exhibits significant values for the first j-lags and then drops to zero, we can probably model the process with a moving average model of order j.
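To see the cut-off in practice, one can simulate an MA(2) and inspect its sample ACF. This is a rough sketch: the helper `sample_acf`, the coefficients, the seed and the simple $1.96/\sqrt{n}$ white-noise band are all assumptions made for illustration.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample auto-correlations r_0, ..., r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[j:], d[:len(d) - j]) / denom for j in range(max_lag + 1)])

# Simulate a zero-mean MA(2) with illustrative coefficients.
rng = np.random.default_rng(42)
n, thetas = 20000, np.array([0.6, 0.3])
a = rng.standard_normal(n + 2)
x = a[2:] + thetas[0] * a[1:-1] + thetas[1] * a[:-2]

r = sample_acf(x, max_lag=5)
band = 1.96 / np.sqrt(n)            # rough significance band
significant = np.abs(r[1:]) > band  # lags 1 and 2 should stand out clearly
```

With a finite sample, lags beyond q are only approximately zero, so an occasional value just outside the band is not by itself evidence of a higher order.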

Using only the ACF plot, I should be able to construct an MA model for any process, right? Yes.

Can two or more different MA processes have the same ACF function values? Yes.

Is there a special case (restricted) of MA processes that have a unique ACF function? Yes, they are called invertible MA processes.

In theory, no two invertible MA processes have the same ACF function.
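A small numerical check makes the non-uniqueness concrete for the MA(1) case, where $\rho_1=\theta/(1+\theta^2)$. The values 0.5 and 2.0 are chosen for illustration, and the function name is hypothetical.

```python
def ma1_rho1(theta):
    """Lag-1 auto-correlation of an MA(1): theta / (1 + theta^2)."""
    return theta / (1.0 + theta**2)

rho_a = ma1_rho1(0.5)  # |theta| < 1: the invertible choice
rho_b = ma1_rho1(2.0)  # theta replaced by 1/theta: non-invertible, same ACF
```

Both coefficients give $\rho_1=0.4$, so the ACF alone cannot tell them apart; restricting attention to $\|\theta\|<1$ removes the ambiguity.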

MA Invertibility

An invertible MA model $\textrm{MA}(q)$ is one that can be represented as a converging infinite order auto-regressive $\textrm{AR}(\infty)$ model. By converging, we mean that the AR coefficient decreases to zero as we go back in time.

Using the lag-operator ($L$) notation, the moving average process can be represented as follows:

$$x_t-\mu=y_t=(1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q)a_t$$ $$\frac{y_t}{1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q}=a_t$$

Using partial-fraction decomposition:

$$\frac{y_t}{(1+\lambda_1L)(1+\lambda_2L)\cdots(1+\lambda_qL)}=a_t$$ $$[\frac{c_1}{(1+\lambda_1L)}+\frac{c_2}{(1+\lambda_2L)}+\cdots+\frac{c_q}{(1+\lambda_qL)}]y_t=a_t$$

Where $\{\lambda_i\}$ is the set of characteristic roots for the MA process. $\lambda_i$ can take real or complex values.

Now, assuming that every $\lambda_i$ falls inside the unit circle (i.e. $\|\lambda_i\|<1$ for all $i\leq q$), each fraction can be expressed as a converging geometric series:

$$\frac{c_i}{1+\lambda_iL}=c_i(1-\lambda_iL+\lambda_i^2L^2-\lambda_i^3L^3+\cdots)$$


Applying this transformation to all partial fractions, you can express the MA process as an infinite order AR process as follows:

$$[(c_1+c_2+\cdots+c_q)-(c_1\lambda_1+c_2\lambda_2+\cdots+c_q\lambda_q)L+(c_1\lambda_1^2+c_2\lambda_2^2+\cdots+c_q\lambda_q^2)L^2-\cdots]y_t=a_t$$ $$[1-\phi_1L+\phi_2L^2-\cdots+(-1)^k\phi_kL^k+\cdots]y_t=a_t$$


  • $\phi_k=c_1\lambda_1^k+c_2\lambda_2^k+\cdots+c_q\lambda_q^k$
  • $\lim_{k\rightarrow \infty}\phi_k=0$

To check for invertibility, it is sufficient to find the characteristic roots $\lambda_i$ and verify that their values fall inside the unit circle ($\|\lambda_i\|<1$). Going forward, we will only consider the invertible MA process.
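The invertibility check can be sketched numerically with `numpy.roots`. The function names below are illustrative. Substituting $L=1/z$ turns the factored polynomial into $z^q+\theta_1z^{q-1}+\cdots+\theta_q=\prod(z+\lambda_i)$, so the characteristic roots are the negated roots of that polynomial.

```python
import numpy as np

def ma_characteristic_roots(thetas):
    """Return lambda_i such that 1 + theta_1 L + ... + theta_q L^q = prod(1 + lambda_i L)."""
    # np.roots expects coefficients from the highest power down: z^q, z^(q-1), ..., 1.
    return -np.roots(np.concatenate(([1.0], thetas)))

def is_invertible(thetas):
    """True when every characteristic root lies strictly inside the unit circle."""
    return bool(np.all(np.abs(ma_characteristic_roots(thetas)) < 1.0))

# (1 + 0.2 L)(1 + 0.3 L) = 1 + 0.5 L + 0.06 L^2, so the roots are 0.2 and 0.3.
roots = np.sort(ma_characteristic_roots([0.5, 0.06]).real)
```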

Example: MA(1)

$$x_t-\mu=(1+\theta L)a_t$$

The characteristic root for the MA(1) process is $\theta$. Assuming $\|\theta\|<1$ , the algebraic AR representation of the MA(1) is expressed as follows:

$$\frac{x_t-\mu}{1+\theta L}=(1-\theta L+\theta^2L^2-\theta^3L^3+\cdots)(x_t-\mu)=a_t$$

The AR representation converges because $\lim_{k\rightarrow \infty}\theta^k=0$ for $\|\theta \|<1$.
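As a numerical illustration of the MA(1) inversion, a truncated version of the AR series can be applied to simulated data to recover the latest shock. The truncation length of 30 terms, the seed and the parameter values are assumptions for the sketch.

```python
import numpy as np

def ma1_ar_weights(theta, n_terms):
    """Coefficients of (1 - theta L + theta^2 L^2 - ...) truncated to n_terms."""
    return (-theta) ** np.arange(n_terms)

rng = np.random.default_rng(1)
theta, mu = 0.5, 0.0
a = rng.standard_normal(200)
x = mu + a.copy()
x[1:] += theta * a[:-1]  # x_t = mu + a_t + theta * a_{t-1}

w = ma1_ar_weights(theta, 30)
# Apply the truncated filter to the 30 most recent observations (newest first).
a_hat = np.dot(w, (x[::-1] - mu)[:30])
```

The truncation error is of order $\theta^{30}$, so `a_hat` agrees with the true last shock `a[-1]` to many decimal places. With $\|\theta\|\geq 1$ the weights would not decay and the same procedure would fail.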

We are not quite done yet. There are a couple of pivotal mathematical tricks that we still need to cover: (1) Impulse response function (IRF) and (2) Integrated moving average process.

1. Impulse Response Function (IRF)

The impulse response function describes the model output triggered by a single unit shock at one point in time (here, $t=1$), with all other shocks set to zero:

$$a_t=\left\{\begin{matrix} 1 & {t=1}\\ 0 & {t\neq 1} \end{matrix}\right.$$

Applying it to a moving average model of order q (taking $\mu=0$), the process values are as follows:

$$x_1=1$$ $$x_2=\theta_1$$ $$x_3=\theta_2$$ $$x_4=\theta_3$$ $$\cdots$$ $$x_q=\theta_{q-1}$$ $$x_{q+1}=\theta_q$$

Note: the IRF for an MA process has finite length (q+1 non-zero values), and its values are equal to the MA coefficients, preceded by a leading 1.

If we have an IRF of some unknown process, can we model it as an MA process? You bet!

Finding the values of the MA coefficients algebraically can be tedious and complex, but using the IRF approach can vastly simplify the task.

Example: ARMA(p,q)

Consider the general ARMA(p,q) process:

$$x_t=\phi_1 x_{t-1}+\phi_2x_{t-2}+\cdots+\phi_px_{t-p}+a_t+\theta_1 a_{t-1}+ \theta_2 a_{t-2}+\cdots+\theta_q a_{t-q}$$ $$\textrm{OR}$$ $$(1-\phi_1L-\phi_2L^2-\phi_3L^3-\cdots-\phi_pL^p)x_t=(1+\theta_1L+\theta_2L^2+\theta_3L^3+\cdots+\theta_qL^q)a_t$$

We wish to derive the MA representation of the process. We can divide the MA polynomial by the AR polynomial:

$$x_t=\frac{1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q}{1-\phi_1L-\phi_2L^2-\cdots-\phi_pL^p}a_t$$


Or we can use the IRF of the ARMA process to derive the values of the MA coefficients:

$$x_t=a_t=1$$ $$x_{t+1}=\phi_1x_t+\theta_1a_t=\phi_1+\theta_1$$ $$x_{t+2}=\phi_1x_{t+1}+\phi_2x_{t}+\theta_2a_t$$ $$x_{t+3}=\phi_1x_{t+2}+\phi_2x_{t+1}+\phi_3x_{t}+\theta_3a_t$$ $$\cdots$$

Deriving the MA coefficient values is an iterative and straightforward process that will save us from carrying out complex polynomial division.
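The iteration can be sketched as a short recursion. The function name `arma_impulse_response` and the use of `psi` for the resulting MA coefficients are notational assumptions introduced here; the recursion itself is the unit-shock substitution described above.

```python
import numpy as np

def arma_impulse_response(phis, thetas, n_steps):
    """First n_steps impulse-response values of an ARMA(p,q), i.e. its MA coefficients."""
    phis = np.asarray(phis, dtype=float)
    thetas = np.asarray(thetas, dtype=float)
    psi = np.zeros(n_steps)
    for j in range(n_steps):
        # Direct effect of the unit shock through the MA side.
        if j == 0:
            value = 1.0
        elif j <= len(thetas):
            value = thetas[j - 1]
        else:
            value = 0.0
        # Feedback from earlier responses through the AR side.
        for i in range(1, min(j, len(phis)) + 1):
            value += phis[i - 1] * psi[j - i]
        psi[j] = value
    return psi

# ARMA(1,1) with illustrative coefficients phi_1 = 0.5 and theta_1 = 0.3.
psi = arma_impulse_response([0.5], [0.3], n_steps=4)
```

With no AR terms the recursion returns 1 followed by the MA coefficients and then zeros, matching the finite IRF of a pure MA process.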

By now, you may be wondering why we would wish to convert a finite-order ARMA process to an infinite-order MA representation. For starters, forecasting (mean and error) using an MA representation is much easier than using the original higher order ARMA representation.

2. Integration

Integration (i.e. unit root) often arises in time series (e.g. random walk, ARIMA, etc.). In these situations, we model the differenced time series with an ARMA class model:

$$(1-L^s)^D(1-L)^d x_t=y_t$$ $$(1-\phi_1L-\phi_2L^2-\phi_3L^3-\cdots-\phi_pL^p)y_t=(1+\theta_1L+\theta_2L^2+\theta_3L^3+\cdots+\theta_qL^q)a_t$$

But how do we take the ARMA outputs back to the un-differenced scale?

Example 1: Consider a first order integration of the MA(q) process:

$$(1-L)x_t=(1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q)a_t$$


To calculate the out-of-sample (i.e. forecast) values:

$$x_{T+1}-x_T=(1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q)a_{T+1}$$ $$x_{T+1}=x_T+a_{T+1}+\theta_1a_T+\theta_2a_{T-1}+\cdots+\theta_qa_{T+1-q}$$ $$x_{T+2}=x_{T+1}+a_{T+2}+\theta_1a_{T+1}+\theta_2a_T+\cdots+\theta_qa_{T+2-q}$$ $$x_{T+2}=x_T+a_{T+2}+(1+\theta_1)a_{T+1}+(\theta_1+\theta_2)a_T+(\theta_2+\theta_3)a_{T-1}+\cdots+(\theta_{q-1}+\theta_q)a_{T+2-q}+\theta_q a_{T+1-q}$$ $$x_{T+3}=x_{T+2}+a_{T+3}+\theta_1 a_{T+2}+\theta_2 a_{T+1}+\cdots +\theta_q a_{T+3-q}$$ $$x_{T+3}=x_T+a_{T+3}+(1+\theta_1)a_{T+2}+(1+\theta_1+\theta_2)a_{T+1}+(\theta_1+\theta_2+\theta_3)a_T+\cdots+(\theta_{q-2}+\theta_{q-1}+\theta_q)a_{T+3-q}+(\theta_{q-1}+\theta_q)a_{T+2-q}+\theta_qa_{T+1-q}$$ $$\cdots$$



For horizons $k\geq q$, the pattern generalizes to:

$$x_{T+k}=\sum_{j=0}^{k-1}\psi_j a_{T+k-j}+M$$ $$\psi_j=1+\theta_1+\theta_2+\cdots+\theta_{\min(j,q)}$$ $$M=x_T+(\theta_1+\theta_2+\cdots+\theta_q)a_T+(\theta_2+\cdots+\theta_q)a_{T-1}+(\theta_3+\cdots+\theta_q)a_{T-2}+\cdots+\theta_qa_{T+1-q}$$

Furthermore, the variance of the forecast value follows from the weights on the future shocks. Each step beyond q adds the constant amount $(1+\theta_1+\cdots+\theta_q)^2\sigma^2$, so the variance keeps growing with the horizon rather than leveling off:

$$Var[x_{T+k}|a_1,a_2,\cdots,a_T]=(\psi_0^2+\psi_1^2+\cdots+\psi_{k-1}^2)\sigma^2$$

In sum, the integrated zero-mean MA process yields a weighted sum of the future shocks (an MA form whose order grows with the horizon) around a level $M$ that is known at time $T$.

What if the differenced time series has a non-zero mean?

$$(1-L)x_t=\mu+(1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q)a_t$$

In this case, we have a time trend of rate equal to $\mu$:

$$x_{T+k}=k\times\mu+\sum_{j=0}^{k-1}\psi_j a_{T+k-j}+M$$

with $\psi_j$ and $M$ as defined above.
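The forecast for a first-order integrated MA(q) can be sketched by forecasting the differences and cumulating them from the last observation. The function name and the numbers are illustrative assumptions. The weight on each future shock is the running sum $1+\theta_1+\cdots+\theta_{\min(j,q)}$, and the forecast variance is $\sigma^2$ times the cumulative sum of the squared weights.

```python
import numpy as np

def integrated_ma_forecast(x_last, mu, thetas, sigma, past_shocks, horizon):
    """Forecast mean and variance of x_{T+1..T+horizon} when (1-L)x_t = mu + MA(q)."""
    thetas = np.asarray(thetas, dtype=float)
    past = np.asarray(past_shocks, dtype=float)  # (..., a_{T-1}, a_T), at least q values
    q = len(thetas)
    # Expected differences: E[y_{T+i}] = mu + sum_{j=i..q} theta_j * a_{T+i-j}.
    y_mean = np.full(horizon, float(mu))
    for i in range(1, min(horizon, q) + 1):
        for j in range(i, q + 1):
            y_mean[i - 1] += thetas[j - 1] * past[-1 - (j - i)]
    x_mean = x_last + np.cumsum(y_mean)
    # Weight on the shock j steps before the target date: running sum of (1, theta_1, ..., theta_q).
    step = np.concatenate(([1.0], thetas, np.zeros(max(horizon - q - 1, 0))))[:horizon]
    weights = np.cumsum(step)
    x_var = sigma**2 * np.cumsum(weights**2)
    return x_mean, x_var

# Illustrative integrated MA(1): x_T = 10, mu = 0.5, theta_1 = 0.4, a_T = 2.
x_mean, x_var = integrated_ma_forecast(10.0, 0.5, [0.4], 1.0, [2.0], horizon=3)
```

In this example the mean forecast rises by $\mu$ per step after the first, and the variance increases by a fixed amount at every step beyond the first.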

For higher order integration d ($d > 1$), we repeat the procedure above d times, but we need more than just the last realized observation. For example:

$$(1-L)^2x_t=(1-L)y_t=\mu+(1+\theta_1L+\theta_2L^2+\cdots+\theta_qL^q)a_t$$ $$(1-L)x_t=y_t$$

To solve for $y_t$, we’d need $y_T$, which is calculated by $x_T-x_{T-1}$. Next, to solve for $x_t$ using the process, we’d need $x_T$. In sum, $(x_T,x_{T-1})$ are required as initial conditions for the integration.
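The two-step integration can be sketched with cumulative sums. This is a minimal illustration assuming the twice-differenced values for the future periods are already available (e.g. as forecasts); the function name is hypothetical.

```python
import numpy as np

def undifference_twice(w_future, x_prev, x_last):
    """Rebuild x_{T+1}, x_{T+2}, ... from w_t = (1-L)^2 x_t given x_{T-1} and x_T."""
    y_last = x_last - x_prev                  # y_T = x_T - x_{T-1}
    y_future = y_last + np.cumsum(w_future)   # y_{T+k} = y_{T+k-1} + w_{T+k}
    x_future = x_last + np.cumsum(y_future)   # x_{T+k} = x_{T+k-1} + y_{T+k}
    return x_future

# Check on a known series: the squares 1, 4, 9, ... have constant second differences.
x = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])
w = np.diff(x, n=2)
rebuilt = undifference_twice(w[-3:], x_prev=x[1], x_last=x[2])
```

Here `rebuilt` reproduces the last three values of `x` from the two initial conditions and the second differences alone.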


The moving average process, despite its simplicity, is a rather useful model to work with, especially when it comes to forecasting.

Armed with a couple of mathematical tricks (IRF and Integration), we can tackle many more complex processes by representing them first by an MA.

In future technical notes, we will discuss advanced models, but often refer to the MA process and material presented here.
