GARCH simulation and estimation from scratch

2019-11-01

In this post I introduce a class of discrete stochastic volatility models using a nice notation and go over some special cases including GARCH and ARCH models. I show how to simulate these processes and how parameter estimation performs. The Python code I wrote for these experiments is referenced at the end of the post.

Discrete stochastic volatility models

If you are not that familiar with probability theory, do yourself a favor and just read over the mathematical intricacies, or maybe skip right ahead to the “Special cases” section and pick up details from here only if they seem relevant to you.


Let $(\Omega, \mathcal A, \mathcal F, P)$ be a stochastic basis with a complete $\sigma$-algebra $\mathcal A$ of measurable subsets of $\Omega$, a probability measure $P$, and a filtration $\mathcal F = (\mathcal F_t)_{t=0,1,\ldots}$. On this basis, a discrete stochastic volatility (DSV) process $X$ follows the dynamics

$$
\begin{aligned}
X_t &= \mu_t + \sigma_t Z_t \\
\mu_t &= \phi_0 + \sum_{i=1}^{p} \phi_i\, f_i\bigl((X_s)_{s<t}\bigr) \\
\sigma_t^2 &= \gamma_0 + \sum_{i=1}^{q} \gamma_i\, g_i\bigl((X_s)_{s<t}\bigr) + \sum_{i=1}^{r} \lambda_i\, h_i\bigl((\sigma_s^2)_{s<t}\bigr).
\end{aligned}
$$

Ingredients:

- The driving noise $Z = (Z_t)$: an i.i.d. sequence with mean $0$ and variance $1$, where $Z_t$ is independent of $\mathcal F_{t-1}$.
- The conditional mean $\mu_t$ and the conditional volatility $\sigma_t$, both built from the past of the process and therefore $\mathcal F_{t-1}$-measurable.
- Functions $f_i$, $g_i$, $h_i$ that extract features from the past, together with parameters $\phi_i$, $\gamma_i$, $\lambda_i$ (with the $\gamma_i$ and $\lambda_i$ non-negative in the GARCH-type special cases below, so that $\sigma_t^2$ stays positive).

Special cases

To formulate the specializations of the general DSV model, the following notation comes in handy.

The backward shift operator $B^j$, for $j \geq 0$, produces a delayed version of its argument process, i.e. $B^j(X)_t = X_{t-j}$ if $t - j \geq 0$, and $B^j(X)_t = 0$ if $t - j < 0$. For example,

$$
B^1(X) = B(X) = (0, X_1, X_2, \ldots).
$$

For convenience, I set $B^1 = B$ and $B^j_t(X) = B^j(X)_t = X_{t-j}$.
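Here is a small helper, assumed for illustration only (it is not taken from the referenced code), that applies $B^j$ to a finite sample path stored in a NumPy array:

import numpy as np

def backshift(x, j):
    # Apply the backward shift operator B^j to a sample path x:
    # B^j(x)_t = x_{t-j} for t - j >= 0 and 0 otherwise.
    if j == 0:
        return x.copy()
    shifted = np.zeros_like(x)
    shifted[j:] = x[:-j]
    return shifted

For example, backshift(np.array([1.0, 2.0, 3.0]), 1) produces the delayed path [0., 1., 2.].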

For all the special cases below, I assume that the functions $f_i$, $g_i$, and $h_i$ pick a single element from the history of the argument process, i.e. $f_i\bigl((X_s)_{s<t}\bigr) = B^i_t(X)$, $h_i\bigl((\sigma_s^2)_{s<t}\bigr) = B^i_t(\sigma^2)$, and $g_i\bigl((X_s)_{s<t}\bigr) = B^i_t(X^2)$.

GARCH process definitions found in textbooks additionally set $\mu \equiv 0$.

The GARCH(1, 1) processes are quite popular, so let’s state the dynamics explicitly:

$$
\begin{aligned}
X_t &= \sigma_t Z_t \\
\sigma_t^2 &= \gamma_0 + \gamma_1 B_t(X^2) + \lambda_1 B_t(\sigma^2).
\end{aligned}
$$

In an ARCH process, the volatility has the simplified form with $\lambda_i = 0$ for all $i$, and $\mu \equiv 0$:

$$
\begin{aligned}
X_t &= \sigma_t Z_t \\
\sigma_t^2 &= \gamma_0 + \sum_{i=1}^{q} \gamma_i B^i_t(X^2).
\end{aligned}
$$

An ARCH(1) process additionally satisfies $\gamma_i = 0$ for all $i \geq 2$:

$$
\begin{aligned}
X_t &= \sigma_t Z_t \\
\sigma_t^2 &= \gamma_0 + \gamma_1 B_t(X^2).
\end{aligned}
$$

Simulation

Discrete stochastic volatility models are typically used to model the log-returns of an observed time series. Therefore, in order to simulate a path of the original time series, we need to simulate its log-returns $X$ and calculate

$$
Y_t = Y_0 \prod_{s=0}^{t} \exp(X_s).
$$

A sample path of a GARCH(1, 1) process driven by Gaussian noise with parameters $(\gamma_0, \gamma_1, \lambda_1) = (0.001, 0.2, 0.25)$:

Note that, for $t > 0$, the $\sigma$ process cannot go below the level $\sqrt{\gamma_0 (1 + \lambda_1)} \approx 0.0353$, because $\sigma_{t-1}^2 \geq \gamma_0$ and the remaining terms in the volatility recursion are non-negative.
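The simulation code itself is only referenced at the end of the post; the following is a minimal sketch of how such a path can be generated, assuming Gaussian noise, the parameters above, and the initialization $\sigma_0^2 = \gamma_0$ (the initialization is my choice, the model itself does not prescribe one):

import numpy as np

def simulate_garch11(gamma0, gamma1, lambda1, n, rng=None):
    # Simulate n steps of X_t = sigma_t * Z_t with
    # sigma_t^2 = gamma0 + gamma1 * X_{t-1}^2 + lambda1 * sigma_{t-1}^2.
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n)           # driving noise Z
    x = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = gamma0                   # assumed initialization
    x[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = gamma0 + gamma1 * x[t - 1] ** 2 + lambda1 * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * z[t]
    return x, np.sqrt(sigma2)

# Log-returns and the corresponding level process Y_t = Y_0 * prod_s exp(X_s)
x, sigma = simulate_garch11(0.001, 0.2, 0.25, n=1000)
y = 1.0 * np.cumprod(np.exp(x))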

Maximum-likelihood estimation

Maximum-likelihood (ML) parameter estimation is the method of choice for all the discussed models, since the transition density, i.e. the density of $X_t$ given the past information $\mathcal F_{t-1}$, is known explicitly. The log-likelihood function of a sample path $x$ of the process is thus given by

$$
\ln L(\theta; x) = \sum_{s=1}^{t} \left[ \ln \eta_Z\!\left( \frac{x_s - \mu_s(\theta)}{\sigma_s(\theta)} \right) - \ln \sigma_s(\theta) \right],
$$

where $\theta = (\phi_i, \gamma_i, \lambda_i)$, and $\eta_Z$ is the density of $Z$. Maximizing the above log-likelihood function yields the maximum-likelihood estimate $\hat\theta$ of $\theta$:

$$
\hat\theta(x) = \operatorname{argmax}_{\theta} \ln L(\theta; x).
$$
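For the Gaussian GARCH(1, 1) case, the estimation step can be sketched as follows; filtering $\sigma_t^2$ from the observed path (with the same initialization as in the simulation sketch) and handing the negative log-likelihood to scipy.optimize.minimize are my assumptions about one reasonable implementation, not necessarily what the referenced code does:

import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(theta, x):
    # Negative Gaussian log-likelihood of a GARCH(1, 1) path x,
    # up to an additive constant.
    gamma0, gamma1, lambda1 = theta
    n = len(x)
    sigma2 = np.empty(n)
    sigma2[0] = gamma0                    # assumed initialization
    for t in range(1, n):
        sigma2[t] = gamma0 + gamma1 * x[t - 1] ** 2 + lambda1 * sigma2[t - 1]
    return 0.5 * np.sum(x ** 2 / sigma2 + np.log(sigma2))

def fit_garch11(x, theta0=(0.001, 0.1, 0.1)):
    # Box constraints keep sigma_t^2 positive during the search.
    bounds = [(1e-8, None), (0.0, 1.0), (0.0, 1.0)]
    result = minimize(neg_log_likelihood, theta0, args=(x,), bounds=bounds)
    return result.x

theta_hat = fit_garch11(x)    # x: a simulated path, e.g. from the sketch above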

Monte-Carlo study

In order to test the ML parameter estimation procedure, I perform the following Monte-Carlo experiment.
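In outline, and reusing the simulate_garch11 and fit_garch11 sketches from above, it boils down to the following loop; the number of replications and the path length are my assumptions, not necessarily the values used for the figures:

import numpy as np

true_theta = (0.001, 0.2, 0.25)
estimates = []
for _ in range(500):                                    # assumed number of replications
    x_sim, _ = simulate_garch11(*true_theta, n=1000)    # assumed path length
    estimates.append(fit_garch11(x_sim))
estimates = np.array(estimates)
# The columns of `estimates` are then compared with the true parameters,
# e.g. via the histograms shown below.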

As expected, the estimator $\hat\theta$ is very inaccurate and, in most cases, doesn’t even come close to the true vector $\theta$. In particular, the estimated $\gamma_1$ and $\lambda_1$ are often set to zero (see the histograms below).

On the other hand, the process means and standard deviations coming from the estimated $\hat\theta$ are much more accurate. This is a good thing, because we usually care more about recovering the characteristics of the unknown data generating process and not so much about the true parameter values of the model.

Crazy idea: The Cauchy driving noise

The noise process $Z$ doesn’t have to be normalized to mean 0 and variance 1. In fact, we just need to make sure that the distribution of the random variable $Z_t$ admits a density. If this is the case, both process simulation and ML estimation work as described.

So how about replacing the Gaussian noise with noise sampled from the Cauchy distribution? In many probability theory books, the Cauchy distribution is used as a counterexample, because it has many “pathological” properties. For example, it has no mean and, consequently, no variance. (“No mean” means that the integral used to define the first moment diverges, i.e. has no finite value.)

This is the theory I learned a few years ago. What I didn’t know is how erratic samples from a Cauchy distribution can look. Just take a look at a sample path of the GARCH(1, 1) process with the parameter vector $(\gamma_0, \gamma_1, \lambda_1) = (0.0001, 0.001, 0.01)$:
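A path like this can be produced with the same recursion as in the simulation sketch above, only with the Gaussian samples replaced by standard Cauchy samples; as before, the parameter values come from the figure and the initialization is assumed:

import numpy as np

rng = np.random.default_rng()
n = 1000
gamma0, gamma1, lambda1 = 0.0001, 0.001, 0.01
z = rng.standard_cauchy(n)                # heavy-tailed driving noise
x = np.zeros(n)
sigma2 = np.zeros(n)
sigma2[0] = gamma0                        # assumed initialization
x[0] = np.sqrt(sigma2[0]) * z[0]
for t in range(1, n):
    sigma2[t] = gamma0 + gamma1 * x[t - 1] ** 2 + lambda1 * sigma2[t - 1]
    x[t] = np.sqrt(sigma2[t]) * z[t]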

If you play around with the path-generating function long enough, you may even generate a floating-point overflow exception. The Python function I used to generate the histograms shown above fails because of that. To see what is going on under the hood, let’s generate some histograms using samples from the Cauchy distribution:

The Cauchy distribution has the cumulative distribution function $F(x) = \tfrac{1}{2} + \tfrac{1}{\pi}\arctan(x)$ and hence the quantile function

$$
Q(p) = \tan\bigl(\pi (p - 0.5)\bigr).
$$

Evaluating $Q(1-p)$ for $p = 0.01, 0.001, 0.0001$ gives

Q(0.99) ~= 31.82
Q(0.999) ~= 318.31
Q(0.9999) ~= 3183.10
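
The same numbers can be cross-checked with scipy, whose scipy.stats.cauchy.ppf evaluates this quantile function:

> from scipy.stats import cauchy
> print(round(cauchy.ppf(0.99), 2))
31.82
> print(round(cauchy.ppf(0.999), 2))
318.31
> print(round(cauchy.ppf(0.9999), 2))
3183.1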

This means that, for example, with probability 0.0001 a sampled value is greater than 3183.10. For comparison, let’s calculate the corresponding quantiles of the standard normal distribution:

> from scipy.stats import norm
> print(norm.ppf(0.99))
2.32
> print(norm.ppf(0.999))
3.09
> print(norm.ppf(0.9999))
3.71

References and Links