Volatility 3: Autoregressive Conditional Heteroscedasticity (ARCH)

This is the third entry in our ongoing series on volatility modeling. In an earlier issue, we introduced the broad concept of volatility in financial time series, defined its general characteristics (e.g. clustering, mean-reversion) and identified important volatility terms in financial time series. We explored holding periods, volatility scaling, the serial correlation assumption, multi-period volatility (i.e. term structure) and discussed a few non-parametric methods to estimate volatility using historical data.

In this issue, we’ll take the prior discussion further and develop an understanding of autoregressive conditional heteroscedasticity (ARCH) volatility modeling.

Why should you care?

Once again, the concepts discussed here are pivotal to a solid understanding of financial time series volatility.


Let’s consider the asset’s log return time series $\{r_t\}$. First, let’s model the return as the sum of two components:

$$r_t=\mu_t+a_t$$

Where:

  • $\mu_t$ is the conditional mean (non-stochastic) component
  • $a_t$ is the innovation, error-term or shock (stochastic) component
  • $\mathrm{E}[a_t]=0$
  • $\mathrm{Var}[r_t]=\mathrm{Var}[a_t]=\sigma_t^2$
  • $\mathrm{Cov}[a_t,a_{t+j}]=0$ for $j\neq 0$

To compute the multi-period volatility, i.e. the variance of the $k$-period return $\sum_{i=1}^k r_{t+i}$:

$$\mathrm{Var}\left[\sum_{i=1}^k r_{t+i}\right]=\sum_{i=1}^k \mathrm{Var}[r_{t+i}]+2\times \sum_{i=1}^{k-1}\sum_{j=i+1}^k \mathrm{Cov}[r_{t+i},r_{t+j}]$$


$$\mathrm{Cov}[r_{t+i},r_{t+j}]=E[(r_{t+i}-\mu)(r_{t+j}-\mu)]=E[r_{t+i}\times r_{t+j}]- \mu (\mu_{t+i}+\mu_{t+j})+\mu^2 $$


  • $\mu$ is the unconditional (long-run) mean of the time series


$$E[r_{t+i}\times r_{t+j}]=E[(\mu_{t+i}+a_{t+i})(\mu_{t+j}+a_{t+j})]=\mu_{t+i}\times\mu_{t+j}$$


$$\mathrm{Cov}[r_{t+i},r_{t+j}]=\mu_{t+i}\times\mu_{t+j} -\mu (\mu_{t+i}+\mu_{t+j})+\mu^2$$ $$\mathrm{Cov}[r_{t+i},r_{t+j}]=(\mu_{t+i}-\mu)(\mu_{t+j}-\mu)$$

In sum, the covariance is a function of the conditional mean and unconditional mean.

The multi-period volatility (term structure) depends not only on the conditional volatility of each period, but on the conditional mean as well.

Assumption 1:

Let's assume that $\mu_t=\mu_{t+1}=\mu_{t+2}=\cdots=\mu$

$$r_t=\mu + a_t$$ $$r_t-\mu=\hat r_t=a_t$$
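As a short worked consequence of Assumption 1 (a sketch that reuses only the covariance expression derived above): once every conditional mean equals $\mu$, each covariance term vanishes, and the multi-period variance reduces to the sum of the single-period conditional variances.

$$\mathrm{Cov}[r_{t+i},r_{t+j}]=(\mu-\mu)(\mu-\mu)=0$$ $$\mathrm{Var}\left[\sum_{i=1}^k r_{t+i}\right]=\sum_{i=1}^k \sigma_{t+i}^2$$

In the special case of a constant variance $\sigma^2$ in every period, this sum is $k\sigma^2$, so the $k$-period volatility is $\sigma\sqrt{k}$, the familiar square-root-of-time scaling.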



Note: This assumption is consistent with what we see in financial time series: the majority of them possess neither a significant mean nor any significant serial correlation, yet they all exhibit time-varying volatility (heteroskedasticity).

Next, let’s assume the innovation, error term or shock term can be represented as follows:

$$a_t=\sigma_t \times \epsilon_t $$


  • $\sigma_t$ is the conditional volatility (scalar) at time t
  • $\epsilon_t$ is the random variable with zero mean ($E[\epsilon_t]=0$), and unit variance ($\mathrm{Var}[\epsilon_t]=E[\epsilon_t^2]=1$)
  • $\epsilon_t$ is serially uncorrelated, but still dependent (higher-order)

Assumption 2:

Let’s assume the random variable ($\epsilon_t$) is identically distributed over time (not necessarily Gaussian).


Let’s define $Z_t$ as the square of the mean-adjusted asset’s returns:

$$Z_t=(r_t-\mu)^2=a_t^2=\sigma_t^2\times\epsilon_t^2$$

$$ E[Z_t]=\sigma_t^2$$

Let’s now examine the methods used in earlier issues to estimate volatility:

1. Equal weighted moving standard deviation

$$\sigma_t^2=\frac{\sum_{i=1}^m (r_{t-i}-\mu)^2}{m-1}=\frac{1}{m-1}\sum_{i=1}^m a_{t-i}^2=\frac{1}{m-1}\sum_{i=1}^m Z_{t-i}$$

2. Exponentially weighted moving average (EWMA)

$$\sigma_{t+1}^2=\lambda \sigma_t^2+(1-\lambda)a_t^2=(1-\lambda)\sum_{i=0}^\infty \lambda^i a_{t-i}^2=(1-\lambda)\sum_{i=0}^\infty \lambda^i Z_{t-i}$$
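To make the two estimators above concrete, here is a minimal Python sketch using only the standard library. The function names, the default $\lambda=0.94$ and the way the EWMA recursion is seeded are our own assumptions for illustration, not part of the earlier issues.

```python
def moving_variance(returns, mu, m):
    """Equal-weighted variance: sum of the last m squared shocks / (m - 1)."""
    if m < 2 or len(returns) < m:
        raise ValueError("need m >= 2 and at least m observations")
    window = returns[-m:]
    return sum((r - mu) ** 2 for r in window) / (m - 1)


def ewma_variance(returns, mu, lam=0.94, initial_variance=None):
    """Run sigma2_{t+1} = lam * sigma2_t + (1 - lam) * a_t^2 over the sample."""
    squared_shocks = [(r - mu) ** 2 for r in returns]
    # Assumption: seed the recursion with the first squared shock
    # when no starting variance is supplied.
    sigma2 = squared_shocks[0] if initial_variance is None else initial_variance
    for z in squared_shocks:
        sigma2 = lam * sigma2 + (1.0 - lam) * z
    return sigma2


sample = [1.0, -2.0, 3.0]
print(moving_variance(sample, mu=0.0, m=3))
print(ewma_variance(sample, mu=0.0, lam=0.5, initial_variance=0.0))
```

Both functions consume the same squared shocks $Z_t$; they differ only in the weights they attach to them.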

Probability Distribution of $Z_t$

So far, we have not assumed any functional form for the probability distribution of $Z_t$, but here are a few observations about the candidate distribution function:

  1. $P(Z\lt 0)=0$ (i.e. $Z_t \ge 0$)
  2. $P(Z)$ is asymmetric
  3. $P(Z)$ is positively (aka right) skewed
  4. $P(Z)$ is the distribution of the squared innovations or shocks


$$a_t \sim \eta (.)$$

Let $\eta (.)$ be the probability density function with zero mean and $\sigma_t^2$ variance.

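Let $H(.)$ be the cumulative distribution function that corresponds to $\eta (.)$:

$$P(Z\le z)=P(-\sqrt{z}\le a_t\le \sqrt{z})=H(\sqrt{z})-H(-\sqrt{z})$$ $$p(z)=\frac{\partial P}{\partial z}=\frac{\eta(\sqrt{z})+\eta(-\sqrt{z})}{2\sqrt{z}}$$

Assume $\eta (.)$ is a symmetrical distribution, i.e. $\eta(-x)=\eta(x)$; the density then simplifies to:

$$p(z)=\frac{\eta(\sqrt{z})}{\sqrt{z}}$$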


Case 1

Let’s use the standardized residuals of a Gaussian distribution.

$$\epsilon_t \sim \Phi(0,1)=\frac{1}{\sqrt{2\pi}}e^{-\epsilon_t^2/2}$$ $$Z_t =\epsilon_t^2\sim p(Z_t)=\frac{1}{\sqrt{2\pi Z_t}}e^{-Z_t/2}=\chi_{\nu=1}^2(Z_t)$$

The distribution of the squared values of a Gaussian distributed random variable is Chi-square with one degree of freedom.

$$a_t=\sigma_t\times\epsilon_t$$ $$\mathrm{E}[a_t^2]=\sigma_t^2\times\mathrm{E}[\epsilon_t^2]=\sigma_t^2\times\nu=\sigma_t^2$$ $$\textrm{Var}[a_t^2]=\sigma_t^4\times \textrm{Var}[\epsilon_t^2]=\sigma_t^4\times 2\nu=2\sigma_t^4$$
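The two moments above are easy to sanity-check by simulation. The following is a rough Monte Carlo sketch under the Gaussian assumption of Case 1; the function name, sample size and seed are arbitrary choices of ours.

```python
import random


def squared_shock_moments(sigma, n=100_000, seed=12345):
    """Simulate a_t = sigma * eps_t with Gaussian eps_t; return mean and variance of a_t^2."""
    rng = random.Random(seed)
    z = [(sigma * rng.gauss(0.0, 1.0)) ** 2 for _ in range(n)]
    mean_z = sum(z) / n
    var_z = sum((x - mean_z) ** 2 for x in z) / n
    return mean_z, var_z


mean_z, var_z = squared_shock_moments(sigma=2.0)
# Theory for sigma = 2: E[a^2] = sigma^2 = 4 and Var[a^2] = 2 * sigma^4 = 32.
print(mean_z, var_z)
```

The simulated values should land close to the theoretical ones, with the variance estimate being the noisier of the two.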

Case 2

Let’s use the standardized residuals of the Student’s t distribution (zero mean and unit variance, $\nu > 2$).

$$p(\epsilon_t)=\frac{\Gamma {(\frac{\nu+1}{2})}}{\sqrt{(\nu-2)\pi}\times\Gamma{(\nu/2)}}\left (1+\frac{\epsilon_t^2}{\nu-2}\right )^{-\frac{\nu+1}{2}}$$ $$p(Z_t=\epsilon_t^2)=\frac{\Gamma {(\frac{\nu+1}{2})}}{\sqrt{(\nu-2)\pi Z_t}\times\Gamma{(\nu/2)}}\left( 1+\frac{Z_t}{\nu-2} \right )^{-\frac{\nu+1}{2}}$$

The equations are getting a bit complicated here, but the same general principle applies: the conditional probability distribution of the squared time series follows from the conditional probability distribution of the original series through the density transformation shown earlier.

$\{a_t^2\}$ Modeling

So far, we have examined the properties of the squared innovations at a single time instance (i.e. t), but how do we describe the evolution of $\{ Z_t=a_t^2\}$ over time?

Similar to ARMA/ARIMA modeling, we examine the ACF/PACF correlogram for the squared time series in an attempt to identify a dependency between lagged time series, and propose a model.
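As an illustration of this identification step, the sketch below computes the sample autocorrelation function (ACF) of the squared mean-adjusted returns, together with the usual approximate 95% band of $\pm 1.96/\sqrt{n}$. The function names are hypothetical, and the band is the standard large-sample white-noise approximation rather than anything specific to this series.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations of a series for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0.0:
        raise ValueError("series has zero variance")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def squared_series_acf(returns, mu, max_lag):
    """ACF of Z_t = (r_t - mu)^2 plus an approximate 95% confidence band."""
    z = [(r - mu) ** 2 for r in returns]
    band = 1.96 / len(z) ** 0.5
    return sample_acf(z, max_lag), band


acf, band = squared_series_acf([1.0, -1.0, 2.0, -2.0], mu=0.0, max_lag=1)
print(acf, band)
```

Lags whose autocorrelation falls outside the band are candidates for inclusion in the volatility model.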

There are two main categories of statistical volatility models:

  • Deterministic form - an exact function governs the evolution of volatility (e.g. ARCH, GARCH, EGARCH, etc.)
    $$\sigma_t^2=f(F_{t-1})$$
  • Stochastic form - a stochastic equation governs the evolution of volatility, i.e. an innovation/shock term is allowed in the volatility equation (e.g. stochastic volatility models)
    $$\sigma_t^2=f(F_{t-1}) +\eta_t$$

Here $F_{t-1}$ denotes the information available at time $t-1$.

Note: The conditional volatility values are computed indirectly, not directly observed, which further complicates the process.

Autoregressive Conditional Heteroskedasticity (ARCH) Model

The first model that provides a systematic framework for volatility modeling is Engle’s autoregressive conditional heteroskedasticity (ARCH) model (1982). This is a good model to start with due to its simplicity and relevance to other models.

$$r_t-\mu=a_t=\sigma_t\times\epsilon_t$$ $$\sigma_t^2=\alpha_o+\alpha_1 a_{t-1}^2+\alpha_2 a_{t-2}^2 + \cdots +\alpha_m a_{t-m}^2=\alpha_o+\sum_{i=1}^m \alpha_i a_{t-i}^2$$ $$\epsilon_t \sim \mathrm{i.i.d}\sim \Phi(0,1)$$

The autoregressive conditional heteroskedasticity (ARCH) model is an AR(m) model for the $\{Z_t=a_t^2\}$ time series, but without the error terms or shocks. Alternatively, we can view the ARCH model as a weighted moving average (WMA) of the squared time series, plus a constant.

The coefficients’ values must satisfy some regularity conditions to ensure that (1) the conditional variance is always positive ($\alpha_o > 0$ and $\alpha_i \ge 0$), and (2) the unconditional variance is finite and positive ($\sum_{i=1}^m \alpha_i < 1$).

$$\sigma_t^2-\alpha_o=\sum_{i=1}^m \alpha_i a_{t-i}^2$$

Volatility clustering: the ARCH model captures the volatility clustering observed in asset returns: large past squared shocks $\{a_{t-i}^2\}_{i=1}^m$ imply a large conditional variance ($\sigma_t^2$) for the mean-corrected return $a_t$. Consequently, $a_t$ tends to assume a large value (in absolute terms), so a large shock tends to be followed by another large shock, and vice versa for smaller shocks.
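To see the clustering mechanism at work, here is a small simulation sketch of a Gaussian ARCH(m) process using only the Python standard library. The function name, the burn-in length and the choice to start the pre-sample shocks at the long-run volatility level are assumptions made for this illustration.

```python
import random


def simulate_arch(alpha0, alphas, n, seed=0, burn_in=500):
    """Simulate n shocks a_t = sigma_t * eps_t from a Gaussian ARCH(m) model."""
    if alpha0 <= 0 or any(a < 0 for a in alphas) or sum(alphas) >= 1:
        raise ValueError("need alpha0 > 0, alphas >= 0 and sum(alphas) < 1")
    m = len(alphas)
    rng = random.Random(seed)
    long_run_var = alpha0 / (1.0 - sum(alphas))
    # Assumption: pre-sample shocks are set to the long-run volatility level.
    shocks = [long_run_var ** 0.5] * m
    variances = []
    for _ in range(n + burn_in):
        # sigma_t^2 = alpha0 + sum_i alpha_i * a_{t-i}^2
        sigma2 = alpha0 + sum(
            a * shocks[-i] ** 2 for i, a in enumerate(alphas, start=1)
        )
        variances.append(sigma2)
        shocks.append(sigma2 ** 0.5 * rng.gauss(0.0, 1.0))
    # Drop the burn-in so the start-up values have little influence.
    return shocks[-n:], variances[-n:]


shocks, variances = simulate_arch(alpha0=0.2, alphas=[0.3], n=50_000, seed=1)
sample_var = sum(a * a for a in shocks) / len(shocks)
print(sample_var, 0.2 / (1 - 0.3))  # sample variance vs. long-run variance
```

Plotting `variances` against time would show bursts of high conditional variance right after large shocks, while the sample variance of the whole path stays near $\alpha_o/(1-\alpha_1)$.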

ARCH(1) Model

$$\sigma_t^2=\alpha_o+\alpha_1 a_{t-1}^2$$ $$\sigma_{t+1}^2=\alpha_o+\alpha_1 \mathrm{E}[a_t^2]=\alpha_o+\alpha_1 \sigma_t^2$$ $$\sigma_{t+2}^2=\alpha_o+\alpha_1 \mathrm{E}[a_{t+1}^2]=\alpha_o+\alpha_1\times (\alpha_o+\alpha_1 \sigma_t^2)=\alpha_o+\alpha_1\alpha_o+\alpha_1^2\sigma_t^2$$ $$\sigma_{t+3}^2=\alpha_o+\alpha_1\alpha_o+\alpha_1^2\alpha_o+\alpha_1^3 \sigma_t^2$$ $$ \sigma_{t+k}^2=\alpha_o+\alpha_1\alpha_o+\alpha_1^2\alpha_o+\cdots+\alpha_o\alpha_1^{k-1}+\alpha_1^k \sigma_t^2 $$ $$\cdots$$ $$\sigma_{t+k\to\infty}^2=\frac{\alpha_o}{1-\alpha_1}$$

Thus, for a positive conditional variance and a finite unconditional variance, we need $\alpha_o > 0$ and $0 \le \alpha_1 < 1$.
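As a worked step (our own rearrangement of the recursion above, with $\bar\sigma^2$ introduced here as shorthand for the long-run variance), the geometric sum can be written in closed form:

$$\sigma_{t+k}^2=\alpha_o\sum_{i=0}^{k-1}\alpha_1^i+\alpha_1^k\sigma_t^2=\frac{\alpha_o(1-\alpha_1^k)}{1-\alpha_1}+\alpha_1^k\sigma_t^2=\bar\sigma^2+\alpha_1^k(\sigma_t^2-\bar\sigma^2), \qquad \bar\sigma^2=\frac{\alpha_o}{1-\alpha_1}$$

The forecast therefore reverts to $\bar\sigma^2$ geometrically at rate $\alpha_1$. For example, with $\alpha_o=0.2$, $\alpha_1=0.5$ and $\sigma_t^2=1$, we get $\bar\sigma^2=0.4$ and the forecasts $0.7$, $0.55$ and $0.475$ for $k=1,2,3$: the gap to the long-run variance halves at each step.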

Model's Parameters

In the earlier issue (volatility 101), we did not assume any distribution for the time series and thus used the root mean square error as our utility function, searching for a set of parameters’ values that minimize the RMSE.

In the ARCH model, the mean-corrected returns ($a_t$) are inter-dependent (e.g. clustering) and are not identically distributed, so how do we go about estimating an efficient set of values for its parameters?

$$\epsilon_t=\frac{a_t}{\sigma_t}\sim N(0,1)$$ $$\epsilon_t\sim \mathrm{i.i.d}$$


$$LF(a_T,a_{T-1},\cdots,a_1 | \alpha_o , \alpha_1 ,\cdots,\alpha_m)=\prod_{t=m+1}^T \frac{1}{\sqrt{2\pi\sigma_t^2}}e^{-a_t^2/{2\sigma_t^2}}$$ $$LLF(a_T,a_{T-1},\cdots,a_1 | \alpha_o , \alpha_1 ,\cdots,\alpha_m)=\sum_{t=m+1}^T \left(-\frac{1}{2}\ln{2\pi}-\frac{1}{2}\ln{\sigma_t^2}-\frac{a_t^2}{2\sigma_t^2}\right)=-\frac{1}{2}\sum_{t=m+1}^T \left(\ln{2\pi}+\ln{\sigma_t^2}+\frac{a_t^2}{\sigma_t^2}\right )$$

For an initial set of $\{\alpha_o,\alpha_1,\cdots,\alpha_m\}$, we recursively compute the conditional volatility values and revise the alpha values in an effort to maximize the overall likelihood.
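The procedure can be sketched in a few lines of Python. The log-likelihood function below follows the Gaussian LLF above; the grid search is a deliberately crude stand-in (our assumption) for the numerical optimizer a real implementation would use, and all function names are hypothetical.

```python
import math


def arch_log_likelihood(shocks, alpha0, alphas):
    """Gaussian ARCH(m) log-likelihood, conditional on the first m shocks."""
    m = len(alphas)
    llf = 0.0
    for t in range(m, len(shocks)):
        # Recursively compute the conditional variance from the past m shocks.
        sigma2 = alpha0 + sum(
            a * shocks[t - i] ** 2 for i, a in enumerate(alphas, start=1)
        )
        llf += -0.5 * (
            math.log(2.0 * math.pi) + math.log(sigma2) + shocks[t] ** 2 / sigma2
        )
    return llf


def grid_search_arch1(shocks, alpha0_grid, alpha1_grid):
    """Pick the (alpha0, alpha1) pair on the grid with the highest LLF."""
    best = None
    for alpha0 in alpha0_grid:
        for alpha1 in alpha1_grid:
            llf = arch_log_likelihood(shocks, alpha0, [alpha1])
            if best is None or llf > best[0]:
                best = (llf, alpha0, alpha1)
    return best


sample = [1.0, -1.0, 2.0]
print(arch_log_likelihood(sample, alpha0=1.0, alphas=[0.5]))
print(grid_search_arch1(sample, [0.5, 1.0, 2.0], [0.0, 0.25, 0.5]))
```

The grids must respect the parameter constraints ($\alpha_o > 0$, $0 \le \alpha_1 < 1$); otherwise the logarithm of the conditional variance may be undefined.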

Model Checking

The autoregressive conditional heteroskedasticity (ARCH) model does not assume that the mean-corrected returns $\{a_t\}$ are i.i.d., but it does assume that the standardized residuals $\{\epsilon_t\}$ are.

$$\epsilon_t=\frac{a_t}{\sigma_t}\sim N(0,1)$$ $$\epsilon_t\sim\mathrm{i.i.d}$$

In short, we need to examine the standardized residuals $\{\epsilon_t\}$ for independence (e.g. the white-noise test and the ARCH effect test) and for the normality assumption.

Model Extension

In some applications, it is more appropriate to assume that the standardized residuals $\{\epsilon_t\}$ follow a heavy-tailed distribution such as the Student’s t distribution or the generalized error distribution (GED).

This extension affects the computation of the log-likelihood function (LLF) (using the alternative probability density function), and the interpretation of conditional volatility.

To illustrate, let’s take the student’s t-distribution for $\{\epsilon_t\}$

$$\epsilon_t\sim\mathrm{i.i.d}\sim t_\nu (0,1)$$


  • $t_\nu (0,1)$ is the standardized student’s t distribution (zero mean and unit variance)
  • $\nu$ is the degrees of freedom of the Student’s t distribution ($\nu > 2$)

To yield a standardized t-distribution with zero skew and finite excess kurtosis:

  1. $\nu > 4$
  2. $\epsilon = t \times \sqrt{\frac{\nu-2}{\nu}}$

The standardized t-distribution exhibits fat tails (leptokurtosis), with excess kurtosis = $6/(\nu-4)$.
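A quick way to check the standardization is to simulate it. The sketch below draws Student’s t variates from a standard normal and a chi-square variate, rescales them by $\sqrt{(\nu-2)/\nu}$ (because the variance of an unscaled t variate is $\nu/(\nu-2)$), and compares the sample variance with one. The function names and the seed are hypothetical choices for this example.

```python
import math
import random


def standardized_t_draw(nu, rng):
    """Draw one Student's t variate rescaled to unit variance (requires nu > 2)."""
    if nu <= 2:
        raise ValueError("unit-variance scaling requires nu > 2")
    normal = rng.gauss(0.0, 1.0)
    # Chi-square with nu degrees of freedom is Gamma(shape=nu/2, scale=2).
    chi_square = rng.gammavariate(nu / 2.0, 2.0)
    t_value = normal / math.sqrt(chi_square / nu)
    return t_value * math.sqrt((nu - 2.0) / nu)


def excess_kurtosis(nu):
    """Theoretical excess kurtosis 6 / (nu - 4), finite only for nu > 4."""
    if nu <= 4:
        raise ValueError("finite kurtosis requires nu > 4")
    return 6.0 / (nu - 4.0)


rng = random.Random(7)
draws = [standardized_t_draw(10, rng) for _ in range(50_000)]
mean = sum(draws) / len(draws)
variance = sum((x - mean) ** 2 for x in draws) / len(draws)
print(variance, excess_kurtosis(10))
```

The sample variance should be close to one, while the excess kurtosis grows without bound as $\nu$ approaches 4 from above.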


For the first p steps out-of-sample (where p denotes the order of the ARCH model), the forecast formula includes a mix of squared residuals $\{a_t^2\}$ and estimated variances $\hat\sigma_t^2$.

$$\sigma_{t+1}^2=\alpha_o+\alpha_1 a_{t}^2+\cdots+\alpha_p a_{t-p+1}^2$$ $$\sigma_{t+2}^2=\alpha_o+\alpha_1 \times E[a_{t+1}^2]+\alpha_2 a_t^2+\cdots+\alpha_p a_{t-p+2}^2$$ $$\sigma_{t+2}^2=\alpha_o+\alpha_1 \times \sigma_{t+1}^2+\alpha_2 a_t^2+\cdots+\alpha_p a_{t-p+2}^2$$ $$\sigma_{t+3}^2=\alpha_o+\alpha_1 \times \sigma_{t+2}^2+\alpha_2 \sigma_{t+1}^2+ \alpha_3 a_t^2+\cdots+\alpha_p a_{t-p+3}^2$$ $$ \cdots $$ $$\sigma_{t+p}^2=\alpha_o+\alpha_1 \sigma_{t+p-1}^2+\alpha_2\sigma_{t+p-2}^2 + \cdots +\alpha_{p-1}\sigma_{t+1}^2+\alpha_p a_t^2$$ $$\sigma_{t+p+1}^2=\alpha_o+\alpha_1 \sigma_{t+p}^2+\alpha_2\sigma_{t+p-1}^2 + \cdots +\alpha_p\sigma_{t+1}^2 $$
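The recursion above translates almost line by line into code. In the sketch below (the function name and argument layout are our own), the history of squared values starts with the observed squared shocks, and each forecast is appended to it so that later steps use forecasted variances in place of the unknown future squared shocks.

```python
def arch_forecast(alpha0, alphas, recent_shocks, horizon):
    """Multi-step ARCH(p) variance forecasts sigma2_{t+1}, ..., sigma2_{t+horizon}.

    recent_shocks holds observed shocks ordered oldest to newest, ending at a_t.
    """
    p = len(alphas)
    if len(recent_shocks) < p:
        raise ValueError("need at least p observed shocks")
    # Observed squared shocks a_{t-p+1}^2, ..., a_t^2.
    history = [a * a for a in recent_shocks[-p:]] if p > 0 else []
    forecasts = []
    for _ in range(horizon):
        sigma2 = alpha0 + sum(alphas[i] * history[-1 - i] for i in range(p))
        forecasts.append(sigma2)
        # E[a_{t+k}^2] = sigma2_{t+k}: the forecast replaces the unknown shock.
        history.append(sigma2)
    return forecasts


# ARCH(2) example with a_{t-1} = 2 and a_t = 1.
print(arch_forecast(0.1, [0.3, 0.2], [2.0, 1.0], horizon=3))
```

After p steps the history consists entirely of forecasts, which matches the last equation in the sequence above.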

For a longer forecast horizon, the estimated conditional volatility converges to a long-term value determined by the model parameters; for instance, ARCH(1) has the following long-run variance:

$$\sigma_{t+k\to\infty}^2=\frac{\alpha_o}{1-\alpha_1}$$

ARCH Effect

An ARCH effect is a characteristic used to describe whether a given time series exhibits serial correlation among its squared values.

The original test proposed by Engle (1982) uses the Lagrange multiplier (LM) principle and an ordinary least squares regression.

Alternatively, we can use the Ljung-Box test on the squared (mean-adjusted) time series, compute modified Q(m) and test whether the data exhibits a significant serial correlation or not.
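As a sketch of the Ljung-Box alternative, the code below computes the statistic $Q(m)=n(n+2)\sum_{k=1}^m \hat\rho_k^2/(n-k)$ on the squared mean-adjusted series. The function names are hypothetical. Under the null hypothesis of no serial correlation, $Q(m)$ is asymptotically chi-square distributed with $m$ degrees of freedom, so the value is compared against a chi-square critical value (about 11.07 at the 5% level for $m=5$).

```python
def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic for lags 1..max_lag."""
    n = len(series)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sample size")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    if c0 == 0.0:
        raise ValueError("series has zero variance")
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


def arch_effect_q(returns, mu, max_lag):
    """Apply the Ljung-Box statistic to the squared mean-adjusted returns."""
    return ljung_box_q([(r - mu) ** 2 for r in returns], max_lag)


# A calm regime followed by a volatile one: strong correlation in the squares.
clustered = [0.1] * 50 + [1.0] * 50
print(arch_effect_q(clustered, mu=0.0, max_lag=5))
```

A statistic well above the critical value, as in this artificial two-regime example, points to an ARCH effect.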


In this paper, we built upon several bedrock lessons from earlier issues and constructed a general framework for volatility modeling. In the beginning, we looked into correlated returns and derived a relationship between term structure (i.e. multi-period) volatility and conditional means.

Next, assuming a mean-adjusted asset’s returns time series, we turned our analysis to volatility. In practice, the models describing the evolution of volatility over time are categorized into two groups: (1) models based on a deterministic functional form (e.g. ARCH, GARCH, etc.), and (2) stochastic models, which permit the volatility equation to include a shock/innovation term.

Finally, we examined in depth a rather important model, the autoregressive conditional heteroskedasticity (ARCH) model (Engle 1982), a building-block model for many others (e.g. GARCH, EGARCH, etc.), which we will cover in future issues. The ARCH model can be thought of as a weighted moving average of the squared time series, with the weights constrained to yield a positive conditional variance and a finite long-run variance. Nevertheless, it does not provide insight into the volatility process and treats positive and negative shocks identically, which is contrary to what has been observed and documented in financial time series.


The PDF version of this issue along with the excel spreadsheet can be found below:
