Empirical Asset Pricing via Machine Learning

Shihao Gu, Bryan Kelly, Dacheng Xiu

Review of Financial Studies · 2020 · 2223 citations

Low Vol · ML / AI · Momentum

Source: Gu, S., Kelly, B. & Xiu, D. (2020) · Review of Financial Studies 33(5), 2223–2273 · doi:10.1093/rfs/hhaa009


TL;DR

A disciplined horse race of machine-learning methods (penalized linear models such as the elastic net, dimension reduction via PCR and PLS, generalized linear models with splines, random forests, gradient-boosted trees, and neural networks) for measuring equity risk premiums out of sample. Trees and neural networks perform best, and the authors trace their edge to nonlinear predictor interactions that the other methods miss. Neural-network performance peaks at three hidden layers rather than with deeper architectures. Stock-level monthly out-of-sample R² is small in absolute terms (about 0.40% for the best model), but the economic gains are large: a value-weighted long-short decile spread on neural-network forecasts earns an annualized out-of-sample Sharpe ratio of 1.35, more than doubling a leading regression-based strategy from the literature. All methods agree that the dominant signals are variations on momentum, liquidity, and volatility.


Problem it solves

The space of candidate return predictors is high-dimensional and plausibly nonlinear: many firm characteristics, their interactions, and macroeconomic conditioning variables. Linear factor models cannot exploit interactions or nonlinearities, and naive high-dimensional regression overfits because the signal-to-noise ratio in returns is very low. Machine learning can help, but only with regularization, dimension reduction, and strict out-of-sample discipline.
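Formally, the paper casts every method as an estimate of one conditional expectation function shared by the whole panel. In the paper's notation, reproduced here as a sketch, c_{i,t} is stock i's characteristic vector and x_t is the vector of aggregate predictors augmented with a constant:

```latex
r_{i,t+1} = E_t\!\left(r_{i,t+1}\right) + \epsilon_{i,t+1},
\qquad
E_t\!\left(r_{i,t+1}\right) = g^{\star}\!\left(z_{i,t}\right),
\qquad
z_{i,t} = x_t \otimes c_{i,t}
```

Because g* depends on neither i nor t, each method pools information across all stocks and months; the methods differ only in how flexibly, and with what regularization, they approximate g*.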


The method

  • Models compared: OLS (with Huber loss), elastic net (ENet), PCR, PLS, generalized linear model with splines (GLM), random forest (RF), gradient-boosted regression trees (GBR), and feed-forward neural networks NN1–NN5 (one to five hidden layers).
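  • Common objective: predict each stock's next-month excess return, for most methods by minimizing squared forecast error (or a robust Huber variant); the methods differ in how they regularize and how they introduce nonlinearity. Hyperparameters are tuned on the validation sample, and ensembling and the robust loss are applied for some models.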
  • Outputs assessed: out-of-sample predictive R²_oos, value-weighted long-short decile portfolio Sharpe ratios, and variable-importance rankings; Diebold–Mariano tests compare models.
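The neural networks in this comparison are fully connected feed-forward nets. Below is a minimal numpy sketch of NN1–NN5, assuming the hidden-layer widths the paper reports (32, 16, 8, 4, 2 neurons, chosen by the geometric pyramid rule) and ReLU activations. The initialization scheme and the names `init_params`, `forward`, and `HIDDEN_WIDTHS` are illustrative assumptions, and the paper's training machinery (for example early stopping and ensembling over random seeds) is omitted.

```python
import numpy as np

# Hidden-layer widths for NN1-NN5 as reported in the paper (geometric pyramid rule).
HIDDEN_WIDTHS = {
    "NN1": [32],
    "NN2": [32, 16],
    "NN3": [32, 16, 8],
    "NN4": [32, 16, 8, 4],
    "NN5": [32, 16, 8, 4, 2],
}


def init_params(n_inputs, widths, seed=0):
    """Randomly initialize weights and biases of a fully connected net.

    The scaled-normal initialization is an illustrative assumption,
    not the paper's specification.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_inputs] + list(widths) + [1]  # output layer: one return forecast
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params


def forward(params, z):
    """Map predictors z (n_stocks x n_inputs) to one forecast per stock."""
    h = z
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU on every hidden layer
    w_out, b_out = params[-1]
    return (h @ w_out + b_out).ravel()  # linear output layer
```

For example, `forward(init_params(920, HIDDEN_WIDTHS["NN3"]), z)` maps a month's 920-column predictor matrix `z` to one forecast per stock. With 920 inputs, NN3 has 30,145 weights and biases, which is small relative to the panel and helps explain why regularized shallow nets can be estimated reliably.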

Assumptions & inputs

  • Data: nearly 30,000 individual US stocks over 60 years, 1957–2016.
  • Predictors: 94 firm characteristics, each interacted with a constant and 8 aggregate time-series variables (94 × 9 = 846 terms), together with 74 industry dummies, for 920 baseline signals in total (some methods expand further via nonlinear transformations).
  • Evaluation: a recursive expanding-window scheme that splits the data into training, validation, and out-of-sample blocks and refits the models once a year, so all reported results are genuinely out of sample.
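A minimal sketch of the annual refitting scheme, assuming the split the paper reports: 18 years of training data (1957–1974), 12 years of validation (1975–1986), and 30 out-of-sample years (1987–2016). At each annual refit the training window grows by one year while the fixed-length validation window rolls forward. The function name and the year-range representation are illustrative.

```python
def expanding_window_splits(first_year=1957, last_year=2016,
                            train_years=18, valid_years=12):
    """Yield (train, validation, test) year ranges, one triple per annual refit.

    Training always starts at first_year and expands by one year per refit;
    validation keeps a fixed length and rolls forward; each refit is
    evaluated on the single following year.
    """
    test_start = first_year + train_years + valid_years
    for test_year in range(test_start, last_year + 1):
        valid_start = test_year - valid_years
        train = range(first_year, valid_start)      # expanding window
        valid = range(valid_start, test_year)       # rolling, fixed length
        test = range(test_year, test_year + 1)      # one out-of-sample year
        yield train, valid, test
```

The first triple trains on 1957–1974, validates on 1975–1986, and tests on 1987; the thirtieth trains on 1957–2003, validates on 2004–2015, and tests on 2016. Hyperparameters are chosen on the validation block only, so no test-year information leaks into tuning.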

How to use it

  • Prefer shallow neural networks (NN3 is best here) and tree ensembles (RF, GBR) over both linear models and very deep networks; the gains come from capturing interactions under honest out-of-sample tuning, not from model size.
  • Headline figures (monthly R²_oos): neural networks rise from 0.33% (NN1) to a peak of 0.40% (NN3); RF 0.33% and GBR 0.34%; PCR 0.26% and PLS 0.27%; ENet 0.11%. The value-weighted NN long-short decile spread earns an annualized Sharpe ratio of 1.35, and an NN-timed S&P 500 strategy earns 0.77 versus 0.51 for buy-and-hold.
  • Interpret models via variable importance: momentum, liquidity, and volatility variables dominate across all methods.
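The two headline metrics can be sketched as follows. `r2_oos` follows the paper's definition, which benchmarks forecasts against zero rather than the historical mean (the denominator sums squared excess returns without demeaning). `decile_spread_return` and `annualized_sharpe` are hypothetical helpers illustrating a monthly value-weighted top-minus-bottom decile sort and the √12 annualization of a monthly Sharpe ratio; tie-breaking by position is an assumption.

```python
import numpy as np


def r2_oos(realized, predicted):
    """Pooled out-of-sample R^2 against a zero forecast (denominator not demeaned)."""
    realized = np.asarray(realized, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 1.0 - np.sum((realized - predicted) ** 2) / np.sum(realized ** 2)


def decile_spread_return(forecast, realized, market_cap, n_groups=10):
    """One month's value-weighted return: top forecast decile minus bottom decile."""
    forecast = np.asarray(forecast, dtype=float)
    realized = np.asarray(realized, dtype=float)
    market_cap = np.asarray(market_cap, dtype=float)
    # Rank stocks by forecast (0 = lowest) and bucket ranks into n_groups groups.
    ranks = np.argsort(np.argsort(forecast))
    groups = ranks * n_groups // len(forecast)

    def vw_return(mask):
        weights = market_cap[mask]
        return np.sum(weights * realized[mask]) / np.sum(weights)

    return vw_return(groups == n_groups - 1) - vw_return(groups == 0)


def annualized_sharpe(monthly_returns):
    """Annualized Sharpe ratio of a zero-net-investment monthly return series."""
    r = np.asarray(monthly_returns, dtype=float)
    return np.sqrt(12.0) * r.mean() / r.std(ddof=1)
```

Applying `decile_spread_return` to each out-of-sample month and passing the resulting series to `annualized_sharpe` yields the kind of long-short Sharpe ratio quoted above, before trading costs. The zero benchmark in `r2_oos` matters: against a demeaned denominator, the paper's small R² values would look different because historical mean returns are themselves noisy forecasts.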

Limitations & pitfalls

  • Absolute predictive R² is tiny because returns are inherently low-signal, so statistical improvements are modest even when the economic gains are large; the portfolio results are also gross of trading costs.
  • Performance depends on careful regularization and the expanding-window protocol; naive deep nets or in-sample tuning overfit.
  • Variable importance shows which signals matter, not economic mechanism; structural/causal interpretation is limited.
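A minimal sketch of the R²-reduction importance measure the paper uses: each predictor is set to zero while the fitted model is held fixed, and the resulting drop in panel R² is recorded. `predict` stands for any fitted model and is a hypothetical interface. Because the measure only records lost fit, it ranks signals by predictive contribution and says nothing about why they predict.

```python
import numpy as np


def variable_importance(predict, z, r):
    """Drop in R^2 from zeroing each predictor column, model held fixed.

    predict: hypothetical fitted model mapping an (n, p) array to n forecasts.
    z: (n, p) predictor matrix; r: n realized excess returns.
    """
    z = np.asarray(z, dtype=float)
    r = np.asarray(r, dtype=float)

    def r2(pred):
        # Same zero-benchmark R^2 as used for out-of-sample evaluation.
        return 1.0 - np.sum((r - pred) ** 2) / np.sum(r ** 2)

    base = r2(predict(z))
    drops = np.empty(z.shape[1])
    for j in range(z.shape[1]):
        z_off = z.copy()
        z_off[:, j] = 0.0  # switch predictor j off; no refitting
        drops[j] = base - r2(predict(z_off))
    return drops
```

The paper's figures present importances normalized to sum to one within each model; dividing the returned vector by its sum reproduces that presentation when the reductions are positive.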

Key references

  • Gu, S., Kelly, B. & Xiu, D. (2020) — Empirical Asset Pricing via Machine Learning — Review of Financial Studies
  • Kelly, B., Pruitt, S. & Su, Y. (2019) — Characteristics Are Covariances (IPCA) — Journal of Financial Economics
  • Kozak, S., Nagel, S. & Santosh, S. (2020) — Shrinking the Cross-Section — Journal of Financial Economics



  • Provenance: verified/generated from the paper's full text.


    Community-maintained wiki — anyone can suggest an edit or view its revision history. Not peer-reviewed; verify claims against the original paper.

    Wiki last updated: June 22, 2026