Autoregressive (AR) Model

We originally composed these technical notes after sitting in on a time series analysis class. Over the years, we’ve maintained these notes and added new insights, empirical observations, and intuitions as we acquired them. We often go back to these notes to resolve development issues or to properly address a product support matter.

In this paper, we’ll go over another simple, yet fundamental, econometric model: the auto-regressive model. Make sure you have looked over our prior paper on the moving average model, as we build on many of the concepts presented in that paper.

This model serves as a cornerstone for any serious application of ARMA/ARIMA models.


The auto-regressive model of order p (i.e. AR(p)) is defined as follows:

$$x_t=\phi_o+\phi_1 x_{t-1}+\phi_2 x_{t-2}+ \cdots + \phi_p x_{t-p} + a_t$$ $$a_t=\epsilon_t \times \sigma $$ $$\epsilon_t \overset{\textrm{i.i.d.}}{\sim} N(0,1)$$
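To make the definition concrete, the following minimal Python sketch simulates a path exactly as defined above, with $a_t=\sigma\epsilon_t$ and standard normal $\epsilon_t$. It assumes NumPy; the function name `simulate_ar`, the zero pre-sample values, and the `burn_in` parameter are our own illustrative choices, not part of any product API.

```python
import numpy as np

def simulate_ar(phi0, phi, sigma, n, burn_in=500, seed=None):
    """Simulate n values of x_t = phi0 + sum_i phi_i x_{t-i} + a_t, a_t = sigma * eps_t.

    The recursion starts from p zeros; the first `burn_in` simulated values are
    discarded so that the start-up transient has (approximately) died out.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    total = n + burn_in
    a = sigma * rng.standard_normal(total)   # innovations a_t = sigma * eps_t
    x = np.zeros(total + p)                  # first p entries are the zero pre-sample values
    for t in range(p, total + p):
        # x[t-p:t][::-1] = (x_{t-1}, x_{t-2}, ..., x_{t-p})
        x[t] = phi0 + phi @ x[t - p:t][::-1] + a[t - p]
    return x[p + burn_in:]

path = simulate_ar(2.0, [0.5, -0.3], 1.0, 50_000, seed=0)
print(path.mean())   # should be close to the long-run mean 2.0 / (1 - 0.5 + 0.3) = 2.5
```

With `sigma=0.0` the shocks vanish and the function simply iterates the deterministic part of the recursion, which is handy for sanity checks.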


  • $a_t$ is the innovation (shock) of our process
  • $\sigma$ is the conditional standard deviation (aka volatility)

Essentially, the AR(p) is merely a multiple linear regression model where the independent (explanatory) variables are the lagged values of the output (i.e. $x_{t-1},x_{t-2},\cdots,x_{t-p}$). Keep in mind that these lagged regressors may be highly correlated with each other.
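Because the AR(p) is a regression on its own lags, its coefficients can be estimated with ordinary least squares. The sketch below illustrates this under our own assumptions: plain OLS via `numpy.linalg.lstsq`, a hypothetical helper named `fit_ar_ols`, and simulated AR(2) data with $\phi_o=1$ and $(\phi_1,\phi_2)=(0.5,-0.3)$. It is not necessarily the estimator any particular software package uses; many rely on maximum likelihood instead.

```python
import numpy as np

def fit_ar_ols(x, p):
    """Regress x_t on [1, x_{t-1}, ..., x_{t-p}]; return (phi0_hat, phi_hat)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    # Lag column i: entry r holds x_{t-i} for t = p + r
    lags = [x[p - i:T - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lags)   # constant + p lagged regressors
    coef, *_ = np.linalg.lstsq(X, x[p:], rcond=None)
    return coef[0], coef[1:]

# Illustrative data: AR(2) with phi0 = 1, phi = (0.5, -0.3), sigma = 1
rng = np.random.default_rng(42)
T, true_phi0, true_phi = 20_000, 1.0, np.array([0.5, -0.3])
x = np.zeros(T)
for t in range(2, T):
    x[t] = true_phi0 + true_phi @ x[t - 2:t][::-1] + rng.standard_normal()

phi0_hat, phi_hat = fit_ar_ols(x, p=2)
print(phi0_hat, phi_hat)   # estimates should land near 1.0 and (0.5, -0.3)
```

When the lags are strongly correlated, the individual coefficient estimates become noisier, which is the practical consequence of the caveat above.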

Why do we need another model?

First, we can think of an AR model as a special (i.e. restricted) representation of an $MA(\infty)$ process. Let’s consider the following stationary AR(1) process:

$$x_t=\phi_o+\phi_1 x_{t-1} + a_t$$ $$x_t - \mu = \phi_o - \mu + \phi_1 x_{t-1} - \phi_1 \mu + \phi_1 \mu + a_t $$ $$(1-\phi_1 L)(x_t - \mu)= \phi_o -\mu +\phi_1 \mu + a_t$$

By subtracting the long-run mean ($\mu$) from the response variable ($x_t$), we obtain a process with a zero long-run (unconditional/marginal) mean, so the constant term on the right-hand side must vanish:

$$\phi_o -\mu +\phi_1 \mu =0 $$ $$\Rightarrow \mu = \frac{\phi_o}{1-\phi_1}$$ $$\Rightarrow \phi_1 \neq 1$$

Next, the process can be further simplified as follows:

$$(1-\phi_1 L)(x_t - \mu)= (1-\phi_1 L) z_t = a_t$$ $$z_t = \frac{a_t}{1-\phi_1 L}$$

For a stationary process, $\left \| \phi_1 \right \| <1$, so we can expand $1/(1-\phi_1 L)$ as a geometric series:

$$z_t=\frac{a_t}{1-\phi_1 L} = (1+\phi_1 L + \phi_1^2 L^2 + \cdots + \phi_1^K L^K + \cdots ) a_t$$

In sum, the AR(1) model represents this $MA(\infty)$ process with a single coefficient ($\phi_1$) in place of infinitely many MA coefficients, which is a much smaller storage requirement.
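A quick numerical check of this equivalence: the sketch below runs the AR(1) recursion and, separately, the MA filter with weights $1,\phi_1,\phi_1^2,\dots$ truncated after $K$ terms. The helper names and the assumption that all pre-sample values and shocks are zero are ours.

```python
import numpy as np

def ar1_path(phi, a):
    """Run z_t = phi * z_{t-1} + a_t, starting from z_{-1} = 0."""
    z = np.zeros(len(a))
    prev = 0.0
    for t, shock in enumerate(a):
        prev = phi * prev + shock
        z[t] = prev
    return z

def ma_truncated(phi, a, K):
    """Evaluate z_t ~= sum_{k=0}^{K} phi^k a_{t-k}, with pre-sample shocks set to 0."""
    psi = phi ** np.arange(K + 1)             # MA weights 1, phi, phi^2, ..., phi^K
    return np.convolve(a, psi)[:len(a)]       # causal filter: sum_k psi_k a_{t-k}

rng = np.random.default_rng(7)
a = rng.standard_normal(200)
gap = np.max(np.abs(ar1_path(0.8, a) - ma_truncated(0.8, a, 60)))
print(gap)   # tiny: the omitted weights are all below 0.8**60
```

The AR(1) form stores one coefficient, while the truncated MA form needs $K+1$ weights to reach comparable accuracy.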

We can generalize this procedure to a stationary AR(p) model. Assuming an $MA(\infty)$ representation exists, the MA coefficients’ values are solely determined by the AR coefficient values:

$$x_t = \phi_o + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \cdots + \phi_p x_{t-p} + a_t$$ $$x_t - \mu = \phi_o -\mu + \phi_1 x_{t-1} -\phi_1 \mu + \phi_1 \mu + \phi_2 x_{t-2} -\phi_2 \mu + \phi_2 \mu + \cdots + \phi_p x_{t-p} -\phi_p \mu + \phi_p \mu + a_t$$ $$(1-\phi_1 L - \phi_2 L^2 -\cdots - \phi_p L^p)(x_t-\mu)=\phi_o -\mu + \mu (\phi_1 + \phi_2 +\cdots+\phi_p) + a_t$$

Once again, by design, the long-run mean of the revised model is zero.

$$\phi_o -\mu +\mu (\phi_1+\phi_2+\cdots+\phi_p) = 0$$ $$\Rightarrow \mu = \frac{\phi_o}{1-\sum_{i=1}^p \phi_i}$$ $$\Rightarrow \sum_{i=1}^p \phi_i \neq 1$$

Hence, the process can be represented as follows:

$$(1-\phi_1 L - \phi_2 L^2 -\cdots - \phi_p L^p)z_t = a_t $$ $$(x_t - \mu) = z_t = \frac{a_t}{1-\phi_1L-\phi_2 L^2 - \cdots - \phi_p L^p}=\frac{a_t}{(1-\lambda_1 L)(1-\lambda_2 L)\cdots (1-\lambda_p L)}$$

If $\left \| \lambda_i \right \| <1, \forall i\in \{1,2,\cdots,p\} $ (and the $\lambda_i$ are distinct), we can apply partial-fraction decomposition and expand each fraction as a geometric series, which yields the algebraically equivalent $MA(\infty)$ representation.
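The $\lambda_i$ can be computed numerically. Substituting $L=1/z$ and multiplying through by $z^p$ shows that they are the roots of $z^p-\phi_1 z^{p-1}-\cdots-\phi_p$. A minimal NumPy sketch follows; the helper name `inverse_char_roots` is our own, and the example uses $\phi=(0.5,0.24)$, which factors as $(1-0.8L)(1+0.3L)$.

```python
import numpy as np

def inverse_char_roots(phi):
    """lambda_i such that 1 - phi_1 L - ... - phi_p L^p = prod_i (1 - lambda_i L)."""
    phi = np.asarray(phi, dtype=float)
    # Roots of z^p - phi_1 z^(p-1) - ... - phi_p (coefficients listed highest power first)
    return np.roots(np.concatenate(([1.0], -phi)))

lam = inverse_char_roots([0.5, 0.24])
print(np.sort(lam.real))          # the two lambdas, 0.8 and -0.3 (up to rounding)
print(np.all(np.abs(lam) < 1))    # True: every factor has a convergent geometric series
# np.poly rebuilds the monic polynomial from its roots: [1, -phi_1, ..., -phi_p]
print(np.poly(lam))
```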

Hint: This formulation should look familiar. In the MA technical note, we inverted a finite-order MA process into an equivalent $AR(\infty)$ representation.

The key point is that a stationary, finite-order AR process can be converted into an algebraically equivalent $MA(\infty)$ representation. This property is referred to as causality.


Definition: A linear process $\{X_t\}$ is causal (strictly, a causal function of $\{a_t\}$) if there is an equivalent $MA(\infty)$ representation.

$$x_t=\Psi(L)a_t=\sum_{i=0}^\infty \psi_i L^i a_t$$

where the MA coefficients are absolutely summable:

$$\sum_{i=0}^\infty \left \| \psi_i \right \| < \infty $$

Causality is a property of both $\{X_t\}$ and $\{a_t\}$.

In plain words, the value of $x_t$ depends solely on the current and past values of $\{a_t\}$.


An AR(p) process is causal (with respect to $\{a_t\}$) if and only if the characteristic roots (i.e. $1/\lambda_i$) fall outside the unit circle (i.e. $\|1/\lambda_i\| > 1 \Leftrightarrow \|\lambda_i\| < 1$).
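In code, the condition can be checked directly on the characteristic polynomial $\Phi(z)=1-\phi_1 z-\cdots-\phi_p z^p$, whose roots are the $1/\lambda_i$. This is a sketch with a hypothetical helper name; roots lying numerically on the unit circle (e.g. a unit root) are a borderline case that a strict floating-point comparison cannot resolve reliably.

```python
import numpy as np

def is_causal(phi):
    """True if every root of 1 - phi_1 z - ... - phi_p z^p lies outside the unit circle."""
    phi = np.asarray(phi, dtype=float)
    # np.roots expects the highest power first: [-phi_p, ..., -phi_1, 1]
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    roots = np.roots(coeffs)              # the characteristic roots 1/lambda_i
    return bool(np.all(np.abs(roots) > 1.0))

print(is_causal([0.5, 0.24]))   # roots 1.25 and -3.33...: causal
print(is_causal([1.5]))         # root 2/3 lies inside the unit circle: not causal
```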

Let’s consider the following example:

$$(1-\phi L)(x_t - \mu) = (1-\phi L)z_t = a_t$$ $$\|\phi\| > 1 $$ $$z_t = \phi z_{t-1} + a_t $$

Now, let’s re-organize the terms in this model:

$$ z_{t-1}=\frac{1}{\phi}(z_t - a_t)$$ $$z_{t-1} = \psi z_t + {a_t}', \quad \psi = \frac{1}{\phi}, \quad {a_t}' = -\frac{a_t}{\phi}$$ $$\|\psi\| < 1 $$

Next, convert the new AR process into an MA representation by repeated forward substitution:

$$z_t = \psi z_{t+1} + {a_{t+1}}' $$ $$z_t = \psi(\psi z_{t+2} + {a_{t+2}}') + {a_{t+1}}'$$ $$z_t = \psi ( \psi (\psi z_{t+3} + {a_{t+3}}') + {a_{t+2}}') + {a_{t+1}}'$$ $$\cdots $$ $$z_t = \psi^K z_{t+K} + {a_{t+1}}' + \psi {a_{t+2}}' + \cdots + \psi^{K-1} {a_{t+K}}' $$ $$\Rightarrow z_t = \sum_{j=0}^\infty \psi^j {a_{t+j+1}}' \quad (K\rightarrow \infty)$$

The process above is non-causal, as its values depend on future values of $\{{a_t}'\}$. However, it is still stationary.
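The non-causal solution can be checked numerically. Writing $\psi=1/\phi$ and ${a_t}'=-a_t/\phi$ in terms of the original shocks gives $z_t=-\sum_{k=1}^\infty \phi^{-k}a_{t+k}$. The sketch below is our own illustration (hypothetical helper name, sum truncated after $K$ future shocks, $\phi=2$). It confirms that this forward-looking series satisfies $z_t=\phi z_{t-1}+a_t$ even though $\|\phi\|>1$.

```python
import numpy as np

def noncausal_ar1(phi, a, K):
    """Forward solution z_t = -sum_{k=1}^{K} phi^{-k} a_{t+k}, intended for |phi| > 1.

    Returns z_0, ..., z_{len(a)-K-1}; each value uses only future shocks.
    """
    weights = -(float(phi) ** -np.arange(1, K + 1))   # -phi^-1, -phi^-2, ..., -phi^-K
    n = len(a) - K
    return np.array([weights @ a[t + 1:t + K + 1] for t in range(n)])

rng = np.random.default_rng(0)
phi, K = 2.0, 60
a = rng.standard_normal(111)
z = noncausal_ar1(phi, a, K)                     # 51 values: z_0 .. z_50
residual = z[1:] - phi * z[:-1] - a[1:len(z)]    # should equal -phi^-K a_{t+K}
print(np.max(np.abs(residual)))                  # negligible truncation error
```

The path stays bounded because the weights $\phi^{-k}$ decay; iterating the same recursion forward in time from a fixed starting value would instead explode.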

Going forward, for an AR (and ARMA) process, stationarity is not sufficient by itself; the process must be causal as well. For all our future discussions and applications, we shall only consider stationary causal processes.


Similar to what we did in the moving average model paper, we will now examine the long-run marginal (unconditional) mean and variance.

(1) Let’s assume the long-run mean ($\mu$) exists, and:

$$E[x_t]=E[x_{t-1}]=\cdots = E[x_{t-p}]=\mu < \infty $$

Now, subtract the long-run mean from all output variables:

$$x_t-\mu + \mu = \phi_o +\phi_1 x_{t-1} - \phi_1 \mu + \phi_1 \mu +\phi_2 x_{t-2} - \phi_2 \mu + \phi_2 \mu + \cdots + \phi_p x_{t-p} - \phi_p \mu + \phi_p \mu + a_t $$ $$(x_t-\mu) = \phi_o + \phi_1 (x_{t-1}-\mu) + \phi_2 (x_{t-2}-\mu) + \cdots + \phi_p (x_{t-p}-\mu) + a_ t + \mu(\phi_1+\phi_2+\cdots + \phi_p -1)$$

Take the expectation from both sides:

$$ E[(x_t-\mu)] = \phi_o + E[\phi_1(x_{t-1}-\mu)]+E[\phi_2 (x_{t-2}-\mu)]+\cdots+ E[\phi_p (x_{t-p}-\mu)] + E[a_t] + \mu(\phi_1+\phi_2+\cdots + \phi_p -1)$$ $$ 0 = \phi_o + \mu (\phi_1 + \phi_2 + \cdots + \phi_p -1) $$ $$ \mu = \frac{\phi_o}{1-\phi_1-\phi_2-\cdots-\phi_p}$$ $$\Rightarrow \sum_{i=1}^p \phi_i \neq 1$$

In sum, for the long-run mean to exist, the AR coefficients must not sum to one.
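A small sketch of this result: with the shocks switched off ($a_t=0$), iterating a stationary AR(p) recursion from zero converges to $\mu=\phi_o/(1-\sum\phi_i)$. The helper names are hypothetical, and the convergence part assumes the process is stationary.

```python
import numpy as np

def long_run_mean(phi0, phi):
    """mu = phi0 / (1 - sum(phi)); undefined when the AR coefficients sum to one."""
    s = float(np.sum(phi))
    if np.isclose(s, 1.0):
        raise ValueError("AR coefficients sum to one: the long-run mean does not exist")
    return phi0 / (1.0 - s)

def noiseless_path(phi0, phi, n):
    """Iterate x_t = phi0 + sum_i phi_i x_{t-i} with a_t = 0, starting from p zeros."""
    p = len(phi)
    x = [0.0] * p
    for _ in range(n):
        # x[-i] is x_{t-i}, the i-th most recent value
        x.append(phi0 + sum(c * x[-i] for i, c in enumerate(phi, start=1)))
    return x[p:]

print(long_run_mean(2.0, [0.5, -0.3]))            # 2.0 / 0.8
print(noiseless_path(2.0, [0.5, -0.3], 200)[-1])  # approaches the same value
```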

(2) To examine the long-run variance of an AR process, we’ll use the equivalent $MA(\infty)$ representation and examine its long-run variance. $$(x_t-\mu) = y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + a_t$$ $$(1-\phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p)y_t = a_t$$ $$y_t= \frac {a_t}{1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p}$$

Using partial-fraction decomposition:

$$y_t = \left [ \frac{c_1}{1-\lambda_1 L}+\frac{c_2}{1-\lambda_2 L}+\cdots+\frac{c_p}{1-\lambda_p L} \right ]a_t$$

For the MA representation to be stable (convergent), all characteristic roots (i.e. $1/\lambda_i$) must fall outside the unit circle (i.e. $\|\lambda_i\| < 1 $). Expanding each fraction as a geometric series: $$y_t=[(c_1+c_2+\cdots+c_p) + (c_1\lambda_1+c_2\lambda_2+\cdots+c_p\lambda_p)L+ (c_1\lambda_1^2+c_2\lambda_2^2+\cdots+c_p\lambda_p^2)L^2+\cdots]a_t$$


It can be easily shown (e.g., by setting $L=0$ in the partial-fraction identity above) that:

$$c_1 + c_2+ \cdots + c_p = 1$$
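For distinct, non-zero $\lambda_i$, the $c_i$ follow from the cover-up rule: $c_i=1/\prod_{j\neq i}(1-\lambda_j/\lambda_i)$. The sketch below (hypothetical helper name, NumPy assumed) computes them and confirms that they sum to one.

```python
import numpy as np

def partial_fraction_weights(phi):
    """Return (lam, c) with 1/prod_i(1 - lam_i L) = sum_i c_i / (1 - lam_i L).

    Assumes the lam_i are distinct and non-zero (i.e. phi_p != 0).
    """
    phi = np.asarray(phi, dtype=float)
    lam = np.roots(np.concatenate(([1.0], -phi))).astype(complex)
    p = len(lam)
    # Cover-up rule: multiply by (1 - lam_i L), then evaluate at L = 1/lam_i
    c = np.array([1.0 / np.prod([1.0 - lam[j] / lam[i] for j in range(p) if j != i])
                  for i in range(p)])
    return lam, c

lam, c = partial_fraction_weights([0.5, 0.24])   # lambda = 0.8, -0.3
print(c.sum())                                   # 1 up to rounding (complex dtype)
```

Complex conjugate $\lambda_i$ are handled the same way; their $c_i$ come in conjugate pairs, so the weighted sums stay real.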

Next, let’s examine the convergence property of the MA representation. Since $\|\lambda_i\|<1$ for all $i$, each term decays geometrically:

$$\lim_{k\rightarrow \infty}(c_1\lambda_1^k+c_2\lambda_2^k+\cdots+c_p\lambda_p^k)=0$$

Finally, the long-run variance of an infinite MA process exists if the sum of its squared coefficients is finite.

$$Var[x_{t+k\rightarrow \infty}]=((c_1+c_2+\cdots+c_p)^2+(c_1\lambda_1+c_2\lambda_2+\cdots+c_p\lambda_p)^2+\cdots+(c_1\lambda_1^k+c_2\lambda_2^k+\cdots+c_p\lambda_p^k)^2+\cdots)\sigma^2 < \infty $$ $$\Rightarrow \sum_{i=0}^\infty (c_1\lambda_1^i+c_2\lambda_2^i+\cdots+c_p\lambda_p^i)^2 =\sum_{i=0}^\infty (\sum_{j=1}^p c_j\lambda_j^i)^2< \infty$$
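The variance formula can be evaluated by truncating the infinite sum. The sketch below is our own illustration (hypothetical helper name, distinct non-zero $\lambda_j$ assumed, fixed truncation length). It computes $\psi_i=\sum_j c_j\lambda_j^i$ and returns $\sigma^2\sum_i\psi_i^2$; for an AR(1) it should reproduce $\sigma^2/(1-\phi^2)$.

```python
import numpy as np

def long_run_variance(phi, sigma, n_terms=2000):
    """Approximate sigma^2 * sum_i (sum_j c_j lam_j^i)^2, truncated after n_terms.

    Assumes distinct, non-zero lam_j (needed for the partial-fraction weights c_j).
    """
    phi = np.asarray(phi, dtype=float)
    lam = np.roots(np.concatenate(([1.0], -phi))).astype(complex)
    if np.any(np.abs(lam) >= 1.0):
        raise ValueError("some |lambda_j| >= 1: the long-run variance is not finite")
    p = len(lam)
    c = np.array([1.0 / np.prod([1.0 - lam[j] / lam[i] for j in range(p) if j != i])
                  for i in range(p)])
    k = np.arange(n_terms)[:, None]
    psi = (c * lam ** k).sum(axis=1).real        # psi_i = sum_j c_j lam_j^i
    return sigma ** 2 * float(np.sum(psi ** 2))

print(long_run_variance([0.6], 2.0))   # approximately 4 / (1 - 0.36) = 6.25
```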

Furthermore, for the AR(p) process to be causal, the sum of the absolute coefficient values must be finite as well.

$$\sum_{i=0}^\infty \|\psi_i\| = \sum_{i=0}^\infty\|\sum_{j=1}^p c_j\lambda_j^i\|<\infty$$

Example: AR(1)

$$(1-\phi L)y_t= a_t $$ $$y_t = \frac{a_t}{1-\phi L}=(1+\phi L + \phi^2 L^2 + \cdots ) a_t$$ $$Var[x_{t+k \rightarrow \infty}]=(1+\phi^2+\phi^4+\cdots)\sigma^2 = \frac{\sigma^2}{1-\phi^2}$$

Assuming all characteristic roots ($1/\lambda_i$) fall outside the unit circle, the AR(p) process can be viewed as a weighted sum of p stable $MA(\infty)$ processes, so a finite long-run variance must exist.

Impulse Response Function

Earlier, we used AR(p) characteristic roots and partial-fraction decomposition to derive the equivalent infinite-order moving average representation. Alternatively, we can compute the impulse response function (IRF) to find the MA coefficients’ values.

The impulse response function describes the model output triggered by a single unit shock at time $t=1$ (with $y_t=0$ for $t\le 0$).

$$a_t=\left\{\begin{matrix} 1 & {t=1}\\ 0 & {t\neq 1} \end{matrix}\right.$$ $$y_1=a_1=1$$ $$y_2=\phi_1 y_1 + a_2 =\phi_1$$ $$y_3=\phi_1 y_2 + \phi_2 y_1 + a_3=\phi_1^2 + \phi_2$$ $$y_4=\phi_1 y_3 + \phi_2 y_2 + \phi_3 y_1 $$ $$\cdots$$ $$y_{p+1}=\phi_1 y_p + \phi_2 y_{p-1} + \phi_3 y_{p-2} + \cdots + \phi_p y_1$$ $$y_{p+2}=\phi_1 y_{p+1} + \phi_2 y_{p} + \phi_3 y_{p-1} + \cdots + \phi_p y_2$$ $$\cdots$$ $$y_{p+k} = \phi_1 y_{p+k-1} + \phi_2 y_{p+k-2} + \cdots + \phi_p y_k$$

The procedure above is computationally simple and can be carried out to any arbitrary order (i.e. k).
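The recursion translates directly into a few lines of Python. This is a sketch with a hypothetical helper name; it follows the convention above of a unit shock at $t=1$ and zero values before it (index 0 in the array corresponds to $y_1$).

```python
import numpy as np

def impulse_response(phi, n):
    """y_1..y_n for a unit shock a_1 = 1 (a_t = 0 otherwise), with y_t = 0 for t <= 0."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    y = np.zeros(n)
    for t in range(n):
        shock = 1.0 if t == 0 else 0.0
        # phi_1 y_{t-1} + ... + phi_p y_{t-p}; lags before the shock are zero and skipped
        y[t] = shock + sum(phi[i - 1] * y[t - i] for i in range(1, p + 1) if t - i >= 0)
    return y

print(impulse_response([0.5, 0.24], 4))   # 1, phi_1, phi_1^2 + phi_2, ...
```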


Recall the partial fraction decomposition we did earlier:

$$y_t = \left [ \frac{c_1}{1-\lambda_1 L}+\frac{c_2}{1-\lambda_2 L}+\cdots+\frac{c_p}{1-\lambda_p L} \right ]a_t$$

We derived the values for the MA coefficients as follows: $$y_t=[(c_1+c_2+\cdots+c_p) + (c_1\lambda_1+c_2\lambda_2+\cdots+c_p\lambda_p)L+ (c_1\lambda_1^2+c_2\lambda_2^2+\cdots+c_p\lambda_p^2)L^2+\cdots]a_t$$

In principle, the IRF values must match the MA coefficients’ values, so we can conclude:

  1. The sum of the partial-fraction numerators (i.e. $c_i$) equals one (i.e. $\sum_{i=1}^p c_i = 1$).
  2. The weighted sum of the $\lambda_i$ (the reciprocals of the characteristic roots) equals $\phi_1$ (i.e. $\sum_{i=1}^p c_i \lambda_i = \phi_1$).
  3. The weighted sum of the squared $\lambda_i$ equals $\phi_1^2+\phi_2$ (i.e. $\sum_{i=1}^p c_i \lambda_i^2 = \phi_1^2+\phi_2$).
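These identities can be confirmed numerically by computing the MA weights both ways and comparing them. The sketch below is self-contained. The helper names are hypothetical, the partial-fraction route assumes distinct, non-zero $\lambda_i$, and the AR(3) coefficients are arbitrary example values.

```python
import numpy as np

def irf(phi, n):
    """IRF recursion: psi_0 = 1, psi_k = sum_{i=1}^{min(p,k)} phi_i psi_{k-i}."""
    psi = np.zeros(n)
    psi[0] = 1.0
    for k in range(1, n):
        psi[k] = sum(phi[i - 1] * psi[k - i] for i in range(1, min(len(phi), k) + 1))
    return psi

def pf_weights(phi, n):
    """psi_k = sum_i c_i lam_i^k from the partial-fraction decomposition."""
    lam = np.roots(np.concatenate(([1.0], -np.asarray(phi, dtype=float)))).astype(complex)
    p = len(lam)
    c = np.array([1.0 / np.prod([1.0 - lam[j] / lam[i] for j in range(p) if j != i])
                  for i in range(p)])
    k = np.arange(n)[:, None]
    return (c * lam ** k).sum(axis=1).real

phi = [0.5, -0.3, 0.1]
print(np.max(np.abs(irf(phi, 30) - pf_weights(phi, 30))))   # agreement up to rounding
print(irf(phi, 3))   # psi_0 = 1, psi_1 = phi_1, psi_2 = phi_1^2 + phi_2
```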


Given an input data sample $\{x_1,x_2,\cdots , x_T\}$ (with $y_t = x_t - \mu$), we can calculate the expected future (i.e. out-of-sample) values of the process, conditional on the observations up to time $T$, as follows:

$$y_T = \phi_1 y_{T-1} + \phi_2 y_{T-2} + \cdots + \phi_p y_{T-p} + a_T$$ $$E[y_{T+1}] = \phi_1 y_T + \phi_2 y_{T-1} + \cdots + \phi_p y_{T+1-p}$$ $$E[y_{T+2}] = \phi_1 E[y_{T+1}] + \phi_2 y_{T} + \cdots + \phi_p y_{T+2-p}$$ $$E[y_{T+2}] = (\phi_1^2 + \phi_2) y_T + (\phi_1\phi_2+\phi_3)y_{T-1}+\cdots +(\phi_1\phi_{p-1}+\phi_p)y_{T+2-p} + \phi_1\phi_p y_{T+1-p}$$

We can carry this calculation to any number of steps we wish.
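The recursion is straightforward to code. Each step plugs the previous forecasts in place of the unknown future values and sets future shocks to their mean of zero. This sketch assumes a mean-adjusted series $y_t=x_t-\mu$ (add $\mu$ back to forecast $x_t$), and the helper name is ours.

```python
import numpy as np

def ar_forecast(phi, y_hist, h):
    """Iterated forecasts E[y_{T+1}], ..., E[y_{T+h}] of a mean-adjusted AR(p).

    y_hist holds y_1..y_T (at least p values); future shocks are set to their mean, 0.
    """
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    path = list(np.asarray(y_hist, dtype=float)[-p:])   # last p observations, oldest first
    out = []
    for _ in range(h):
        # phi_1 * (latest value) + ... + phi_p * (p-th latest value)
        nxt = float(phi @ np.array(path[-p:][::-1]))
        path.append(nxt)
        out.append(nxt)
    return np.array(out)

print(ar_forecast([0.5, 0.2, 0.1], [1.0, 2.0, 3.0], 2))   # one- and two-step forecasts
```

For this AR(3) example with last observations $(1,2,3)$, the one- and two-step forecasts are 2.0 and 1.8.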

Next, for the forecast error:


