In this post I introduce a class of discrete stochastic volatility
models using a nice notation and go over some special cases including
GARCH and ARCH models. I show how to simulate these processes and how
parameter estimation performs. The Python code I wrote for these
experiments is referenced at the end of the post.
Discrete stochastic volatility models
If you are not that familiar with probability theory, do yourself a favor and just read over the mathematical intricacies, or maybe skip right ahead to the “special cases” section and pick up the details from here only if they seem relevant to you.
Let $(\Omega, \mathcal{A}, \mathbb{F}, P)$ be a stochastic basis with a complete $\sigma$-algebra $\mathcal{A}$ of measurable subsets of $\Omega$, a probability measure $P$, and a filtration $\mathbb{F} = (\mathcal{F}_t)_{t=0,1,\dots}$. Thus, the time instances are indexed using non-negative integers $0, 1, \dots$. To get the first $t$ elements of a sequence $X = (X_0, X_1, \dots)$, I use the notation $(X_s)_{s<t} = (X_0, \dots, X_{t-1})$.
A discrete stochastic volatility (DSV) model $X = (X_0, X_1, \dots)$ is a real-valued stochastic process (a sequence of random variables) satisfying the following equations:
$$
\begin{aligned}
X_t &= \mu_t + \sigma_t Z_t, \\
\mu_t &= \sum_{i=1}^{p} \phi_i\, f_i\big((X_s)_{s<t}\big), \\
\sigma_t^2 &= \gamma_0 + \sum_{i=1}^{q} \gamma_i\, g_i\big((X_s)_{s<t}\big) + \sum_{i=1}^{r} \lambda_i\, h_i\big((\sigma_s^2)_{s<t}\big),
\end{aligned}
$$
where:
$Z$ is a noise process adapted to $\mathbb{F}$, such that the $Z_t$ are i.i.d. copies of a random variable with density $\eta_Z$.
$\phi_i$ are real numbers and, to avoid trouble, I assume that $\gamma_i, \lambda_i \geq 0$ and that $g_i, h_i$ are non-negative valued.
$f_i$, $g_i$, and $h_i$ are deterministic functions of the process.
The process $\mu$ is often called the drift, whereas $\sigma$ is called the volatility of $X$.
Because $\sigma$ is a stochastic process, the process $X$ as defined above belongs to a large family of stochastic volatility models.
For a noise process $Z$ with $\mathbb{E}[Z_t] = 0$ and $\operatorname{Var}[Z_t] = 1$, we have $\mathbb{E}[X_t \mid \mathcal{F}_{t-1}] = \mu_t$ and $\operatorname{Var}[X_t \mid \mathcal{F}_{t-1}] = \sigma_t^2$, since $Z_t$ is independent of $\mathcal{F}_{t-1}$ while $\mu_t$ and $\sigma_t$ are $\mathcal{F}_{t-1}$-measurable.
Special cases
To formulate the specializations of the general DSV model, the following notation comes in handy.
The backward shift operator $B^j$, for $j \geq 0$, produces a delayed version of its argument process, i.e. $B^j(X)_t = X_{t-j}$ if $t - j \geq 0$, and $B^j(X)_t = 0$ if $t - j < 0$. For example, $B^1(X) = B(X) = (0, X_0, X_1, \dots)$. For convenience, I set $B^1 = B$ and $B_t^j(X) = B^j(X)_t = X_{t-j}$.
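In code, the operator is just a zero-padded delay. A minimal NumPy sketch (the helper name is mine, not from the referenced code):

```python
import numpy as np

def backward_shift(x, j):
    """B^j: delay the sequence x by j steps, padding the front with zeros."""
    x = np.asarray(x, dtype=float)
    if j == 0:
        return x.copy()
    out = np.zeros_like(x)
    out[j:] = x[:-j]
    return out

backward_shift([1.0, 2.0, 3.0], 1)   # array([0., 1., 2.])
```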
For all the special cases in the list below, I assume that the functions $f_i$, $g_i$, and $h_i$ pick a single element from the history of the argument process, i.e. $f_i = h_i = B_t^i$ and $g_i\big((X_s)_{s<t}\big) = B_t^i(X^2)$.
In a GARCH($p$, $q$) process, the volatility thus satisfies
$$
\sigma_t^2 = \gamma_0 + \sum_{i=1}^{q} \gamma_i B_t^i(X^2) + \sum_{i=1}^{p} \lambda_i B_t^i(\sigma^2).
$$
GARCH process definitions found in textbooks additionally set $\mu \equiv 0$.
The GARCH(1, 1) processes are quite popular, so let’s state the dynamics explicitly:
$$
\begin{aligned}
X_t &= \sigma_t Z_t, \\
\sigma_t^2 &= \gamma_0 + \gamma_1 B_t(X^2) + \lambda_1 B_t(\sigma^2).
\end{aligned}
$$
In an ARCH process, $\lambda_i = 0$ for all $i$ and $\mu \equiv 0$, so the volatility has the simplified form
$$
\begin{aligned}
X_t &= \sigma_t Z_t, \\
\sigma_t^2 &= \gamma_0 + \sum_{i=1}^{q} \gamma_i B_t^i(X^2).
\end{aligned}
$$
An ARCH(1) process additionally satisfies $\gamma_i = 0$ for all $i \geq 2$:
$$
\begin{aligned}
X_t &= \sigma_t Z_t, \\
\sigma_t^2 &= \gamma_0 + \gamma_1 B_t(X^2).
\end{aligned}
$$
Simulation
Discrete stochastic volatility models are typically used to model the log-returns of an observed time series. Therefore, in order to simulate a path of the original time series, we need to simulate its log-returns and calculate $Y_t = Y_0 \prod_{s=0}^{t} \exp(X_s)$.
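A minimal simulation sketch of this procedure, assuming Gaussian noise, the initialization $\sigma_0^2 = \gamma_0$, and $Y_0 = 1$ (the helper name `simulate_garch11` is mine, not the exact code referenced at the end):

```python
import numpy as np

def simulate_garch11(n, gamma0, gamma1, lambda1, rng=None):
    """Simulate X_t = sigma_t * Z_t with
    sigma_t^2 = gamma0 + gamma1 * X_{t-1}^2 + lambda1 * sigma_{t-1}^2."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.zeros(n)
    sigma2 = np.zeros(n)
    sigma2[0] = gamma0                          # assumed initial condition
    x[0] = np.sqrt(sigma2[0]) * rng.standard_normal()
    for t in range(1, n):
        sigma2[t] = gamma0 + gamma1 * x[t - 1] ** 2 + lambda1 * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return x, np.sqrt(sigma2)

# Log-returns X and the reconstructed level series Y_t = Y_0 * prod exp(X_s)
x, sigma = simulate_garch11(5000, 0.001, 0.2, 0.25, rng=np.random.default_rng(42))
y = 1.0 * np.exp(np.cumsum(x))                  # with Y_0 = 1
```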
A sample path of a GARCH(1,1) process driven by Gaussian noise with parameters $(\gamma_0, \gamma_1, \lambda_1) = (0.001, 0.2, 0.25)$:
Note that the $\sigma$ process for $t > 0$ cannot go below the level of $\sqrt{\gamma_0 (1 + \lambda_1)} \approx 0.0353$.
Maximum-likelihood estimation
Maximum-likelihood (ML) parameter estimation is the method of choice for all the discussed models, since the transition density, i.e. the density of $X_t$ given the past information $\mathcal{F}_{t-1}$, is known explicitly. The log-likelihood function of a sample path $x$ of the process is thus given by
$$
\ln L(\theta; x) = \sum_{s=1}^{t} \left[ \ln \eta_Z\!\left( \frac{x_s - \mu_s(\theta)}{\sigma_s(\theta)} \right) - \ln \sigma_s(\theta) \right],
$$
where $\theta = (\phi_i, \gamma_i, \lambda_i)$ and $\eta_Z$ is the density of $Z$. Maximizing the above log-likelihood function yields the maximum-likelihood estimate $\hat{\theta}$ of $\theta$:
$$
\hat{\theta}(x) = \operatorname*{arg\,max}_{\theta}\, \ln L(\theta; x).
$$
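To make this concrete, here is a sketch of the estimator for the Gaussian GARCH(1,1) case, minimizing the negative log-likelihood with scipy.optimize.minimize; the initialization $\sigma_0^2 = \gamma_0$, the starting point, and the function names are my assumptions, not the referenced code:

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(theta, x):
    """Negative Gaussian log-likelihood of a GARCH(1,1) path x."""
    gamma0, gamma1, lambda1 = theta
    sigma2 = np.empty(len(x))
    sigma2[0] = gamma0                          # assumed initialization
    for t in range(1, len(x)):
        sigma2[t] = gamma0 + gamma1 * x[t - 1] ** 2 + lambda1 * sigma2[t - 1]
    # For standard normal noise, each log-likelihood term is
    # -0.5*ln(2*pi) - 0.5*x_t^2/sigma_t^2 - 0.5*ln(sigma_t^2).
    return -np.sum(-0.5 * np.log(2 * np.pi)
                   - 0.5 * x**2 / sigma2
                   - 0.5 * np.log(sigma2))

def fit_garch11(x):
    """ML estimate with every parameter restricted to [1e-8, 1]."""
    res = minimize(garch11_neg_loglik, x0=[0.01, 0.1, 0.1], args=(x,),
                   bounds=[(1e-8, 1.0)] * 3, method="L-BFGS-B")
    return res.x
```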
Monte-Carlo study
In order to test the ML parameter estimation procedure, I perform the following Monte-Carlo experiment (a code sketch follows the list).
Simulate 2500 independent GARCH(1,1) process paths of length 5000 with parameters (0.001, 0.2, 0.25). I used Gaussian noise, i.e. $Z_t \sim N(0, 1)$.
Feed each of these paths to the ML estimator and get the estimated parameter vectors $(\hat{\theta}_i)_{i=1,\dots,2500}$. The search ranges for the parameters in this optimization procedure are restricted to [1e-8, 1].
Compare the original $\theta$ with the estimated $\hat{\theta}_i$.
Simulate GARCH(1,1) with the parameter vectors $\hat{\theta}_i$, calculate means and standard deviations, and compare these to the “true” mean and standard deviation (which are 5.098 and 1.084, respectively).
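Wired together with the two sketches above, the experiment could look like this (illustrative only):

```python
import numpy as np

# Reuses simulate_garch11 and fit_garch11 from the sketches above.
rng = np.random.default_rng(0)
true_theta = (0.001, 0.2, 0.25)

estimates = []
for _ in range(2500):
    x, _sigma = simulate_garch11(5000, *true_theta, rng=rng)
    estimates.append(fit_garch11(x))
estimates = np.array(estimates)             # shape (2500, 3)

print(estimates.mean(axis=0))               # compare with true_theta
```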
As expected, the estimator $\hat{\theta}$ is very inaccurate and, in most cases, doesn’t even come close to the true vector $\theta$. In particular, the estimated $\gamma_1$ and $\lambda_1$ are often set to zero (see the histograms below).
On the other hand, the process means and standard deviations coming from the estimated $\hat{\theta}$ are much more accurate. This is a good thing, because we usually care more about recovering the characteristics of the unknown data-generating process and not so much about the true parameter values of the model.
Crazy idea: The Cauchy driving noise
The noise process $Z$ doesn’t have to be normalized to mean 0 and variance 1. In fact, we just need to make sure that the distribution of the random variable $Z_t$ admits a density. If this is the case, both process simulation and ML estimation work as described.
So how about replacing the Gaussian noise with a noise sampled from the Cauchy distribution? In many probability theory books, the Cauchy distribution is used as a counterexample, because it has many “pathological” properties. For example, it has no mean and, consequently, no variance. (“No mean” means that the integral used to define the first moment diverges, i.e. has no finite value.)
This is the theory I learned a few years ago. What I didn’t know is how erratic samples from a Cauchy distribution look. Just take a look at a sample path of the GARCH(1,1) process with the parameter vector $(\gamma_0, \gamma_1, \lambda_1) = (0.0001, 0.001, 0.01)$:
If you play around with the path-generating function long enough, you may even generate a floating-point overflow exception. The Python function I used to generate the histograms shown above fails because of that. To see what is going on under the hood, let’s generate some histograms using samples from the Cauchy distribution:
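A sketch of how such samples and their tail quantiles can be inspected with scipy.stats (the sample size and quantile levels are my choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
samples = stats.cauchy.rvs(size=100_000, random_state=rng)

# Even a moderate sample routinely contains extreme outliers.
print(samples.min(), samples.max())

# Exact upper quantiles of the standard Cauchy distribution.
for p in (0.99, 0.999, 0.9999):
    print(p, stats.cauchy.ppf(p))   # 0.9999 -> approx. 3183.10
```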
This means that, for example, with probability 0.0001 the sampled values are greater than 3183.10. For comparison, let’s calculate the corresponding quantiles of the standard normal distribution:
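The analogous sketch for the standard normal:

```python
from scipy import stats

for p in (0.99, 0.999, 0.9999):
    print(p, stats.norm.ppf(p))   # approx. 2.33, 3.09, 3.72 -- tiny next to the Cauchy tails
```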
McNeil, Alexander J., Rüdiger Frey, and Paul Embrechts. Quantitative Risk Management. Princeton University Press, 2015. (This book contains one short chapter nicely summarizing univariate time-series models.)