A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?

Source: Hansen, P. R. & Lunde, A. (2005) · Journal of Applied Econometrics 20(7), 873–889 · DOI 10.1002/jae.800

TL;DR

Compares 330 ARCH/GARCH-type models out of sample on one-day-ahead conditional-variance forecasts. For DM-$ exchange rates, nothing reliably beats the simple GARCH(1,1); for IBM equity returns, GARCH(1,1) is clearly inferior to models that allow a leverage effect (asymmetry), with the best overall performance from the A-PARCH(2,2) of Ding, Granger & Engle (1993). The takeaway: added complexity helps only when it captures a feature the data actually exhibit (here, asymmetry), and the choice of test matters.

Problem it solves

The ARCH/GARCH literature spawned hundreds of model variants. The paper asks whether any of them beats a parsimonious GARCH(1,1) benchmark out of sample, using tests that control for the number of models being compared.

The method

Estimate 330 models spanning the ARCH/GARCH family on daily DM-$ exchange rate data and daily IBM returns; IBM forecasts are evaluated against a new realized-variance proxy built from intraday data.
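To make the benchmark concrete, here is a minimal sketch of the GARCH(1,1) one-day-ahead variance recursion. The function name and the parameter values in the usage note are illustrative assumptions, not taken from the paper; the sketch assumes zero-mean returns (or returns with the conditional mean already removed) and takes the parameters as given rather than estimating them.

```python
def garch11_forecasts(returns, omega, alpha, beta):
    """One-day-ahead conditional-variance forecasts from a GARCH(1,1).

    h[t] is the forecast of the variance of returns[t], made at t-1:
        h[t] = omega + alpha * returns[t-1]**2 + beta * h[t-1]
    Parameters are supplied by the caller (hypothetical values here);
    in practice they would be estimated, e.g. by quasi-maximum likelihood.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    # Start the recursion at the unconditional variance (a common convention).
    h = [omega / (1.0 - alpha - beta)]
    for r in returns:
        h.append(omega + alpha * r * r + beta * h[-1])
    # len(returns) + 1 values; the last one is the forecast for the next day.
    return h
```

For example, `garch11_forecasts([0.0, 1.0], 0.1, 0.1, 0.8)` starts at the unconditional variance and then updates once per observed return. The richer models in the comparison change the form of this recursion (for instance by letting negative returns raise the variance more than positive ones).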

Evaluate one-day-ahead forecasts using six different loss functions (e.g., MSE, QLIKE, R2LOG), substituting realized variance for the latent conditional variance.
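The three loss functions named above can be sketched as follows, in the forms commonly used in the volatility-forecasting literature. This is an illustration under that assumption: the paper's exact six definitions (including any variants on the standard-deviation scale) should be checked against its text. Here `proxy` is the realized-variance series and `forecast` is the model's variance forecast, both hypothetical argument names.

```python
import math

def mse(proxy, forecast):
    """Mean squared error on the variance scale."""
    return sum((s - h) ** 2 for s, h in zip(proxy, forecast)) / len(proxy)

def qlike(proxy, forecast):
    """Loss implied by a Gaussian likelihood: mean of log(h) + s / h."""
    return sum(math.log(h) + s / h for s, h in zip(proxy, forecast)) / len(proxy)

def r2log(proxy, forecast):
    """Mean squared log ratio of proxy to forecast."""
    return sum(math.log(s / h) ** 2 for s, h in zip(proxy, forecast)) / len(proxy)
```

MSE penalizes errors in levels, so a few high-volatility days can dominate it, whereas QLIKE and R2LOG depend on the ratio of proxy to forecast. This is one reason model rankings can differ across loss functions.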

Test whether any of the competing models significantly outperforms the GARCH(1,1) benchmark using the Superior Predictive Ability (SPA) test of Hansen (2001; published as Hansen, 2005) and White's (2000) Reality Check (RC), both of which control for data-snooping across the full model set.
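The idea behind these tests can be sketched with a simplified Reality Check: take the largest average loss improvement over the benchmark across all models and compare it with its bootstrap distribution under the null that no model is better. The code below is a rough illustration, not the paper's implementation. The function names, the stationary-bootstrap resampling, and the defaults (`reps`, `mean_block`, `seed`) are assumptions. The SPA test refines this by studentizing the loss differences and by limiting the influence of poorly performing models on the null distribution, which this sketch does not do.

```python
import math
import random

def stationary_bootstrap_indices(n, mean_block, rng):
    """Indices for one stationary-bootstrap resample: blocks start at
    random points and have geometrically distributed lengths."""
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < 1.0 / mean_block:
            idx.append(rng.randrange(n))       # start a new block
        else:
            idx.append((idx[-1] + 1) % n)      # continue the block, wrapping
    return idx

def reality_check_pvalue(bench_loss, model_losses, reps=500, mean_block=10, seed=0):
    """Bootstrap p-value for H0: no model has lower expected loss
    than the benchmark (simplified Reality Check)."""
    n = len(bench_loss)
    rng = random.Random(seed)
    # d[k][t] > 0 means model k beat the benchmark on day t.
    d = [[b - m for b, m in zip(bench_loss, losses)] for losses in model_losses]
    dbar = [sum(dk) / n for dk in d]
    stat = max(math.sqrt(n) * x for x in dbar)
    exceed = 0
    for _ in range(reps):
        idx = stationary_bootstrap_indices(n, mean_block, rng)
        # Recentre each resampled mean at its sample mean to impose the null.
        boot = max(
            math.sqrt(n) * (sum(dk[i] for i in idx) / n - mu)
            for dk, mu in zip(d, dbar)
        )
        if boot >= stat:
            exceed += 1
    return exceed / reps
```

A small p-value says that at least one model beats the benchmark by more than data-snooping across the model set would explain. `bench_loss` would be the per-day loss series of GARCH(1,1), and each entry of `model_losses` the per-day loss series of one competitor under the same loss function.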

Assumptions & inputs

Requires a volatility proxy; the realized-variance proxy (vs. noisy squared returns) is essential for the IBM analysis.
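A toy comparison of the two proxies, assuming a list of intraday log returns for one trading day. The function names are illustrative, and the paper's actual proxy construction (sampling frequency, treatment of the hours when the market is closed) is not reproduced here.

```python
def realized_variance(intraday_returns):
    """Sum of squared intraday returns over one trading day."""
    return sum(r * r for r in intraday_returns)

def squared_daily_return(intraday_returns):
    """Noisy alternative: square of the day's total (log) return."""
    total = sum(intraday_returns)
    return total * total

# A day with real intraday movement that happens to close flat.
day = [0.01, -0.01, 0.01, -0.01]
rv = realized_variance(day)        # positive: the movement is registered
sq = squared_daily_return(day)     # zero: the movement is invisible
```

The squared daily return reports no volatility for this day even though prices moved throughout it. Noise of this kind in the proxy can blur the ranking of forecasts, which is why the intraday-based measure matters for the evaluation.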

Conclusions are conditioned on these two series, the estimation/evaluation split, and the choice among the six loss functions.

How to use it

As a model-selection discipline: justify added volatility-model complexity with multiple-comparison-aware tests, not in-sample fit.

Findings to carry forward: asymmetry matters for equities, not for FX; the best overall model was A-PARCH(2,2).

Prefer SPA to the Reality Check. The paper shows that the RC lacks power: it fails to detect that GARCH(1,1) is significantly outperformed on IBM and even suggests that an ARCH(1) might not be beaten, whereas the SPA test always rejects ARCH(1).

Limitations & pitfalls

Results depend on the realized-volatility proxy and the loss function used.

Conclusions pertain to these two series and sample periods; generalization requires care.

A failure to reject for FX means "no evidence against GARCH(1,1)," not proof of its optimality.

Key references

Hansen, P. & Lunde, A. (2005) — A Forecast Comparison of Volatility Models — Journal of Applied Econometrics

Hansen, P. (2005) — A Test for Superior Predictive Ability — Journal of Business & Economic Statistics

Glosten, L., Jagannathan, R. & Runkle, D. (1993) — On the Relation between the Expected Value and the Volatility of the Nominal Excess Return on Stocks — Journal of Finance

Provenance: verified/generated from the paper's full text.