Shrinking the Cross-Section

Source: Kozak, S., Nagel, S. & Santosh, S. (2020). Journal of Financial Economics 135(2), 271–292. (NBER WP #24070, 2017)

TL;DR

Estimates the stochastic discount factor (SDF) from a large set of characteristic-based factors using economically motivated shrinkage: a Bayesian prior that penalizes SDFs implying implausibly high (near-arbitrage) Sharpe ratios. The central message is that a characteristics-sparse SDF (a handful of anomalies/factors) cannot summarize the cross-section, but a low-dimensional SDF built from a few leading principal components of the candidate factors can. The zoo is low-rank, not sparse.

Problem it solves

Empirical asset pricing keeps expanding small factor models (FF3 → q4 → FF5 → 6 factors) as new anomalies appear, but these are tested only against a few portfolios. With dozens to hundreds of characteristics, estimating the SDF's loadings is a high-dimensional problem where OLS overfits and naive variable selection ("which 3 anomalies matter?") is statistically and economically unjustified (present-value/q-theory logic implies many characteristics should matter).

The method

Build candidate long-short factors from many characteristics; estimate SDF coefficients b that minimize pricing errors subject to an economic prior.

The prior penalizes the maximum squared Sharpe ratio implied by the SDF, which shrinks the contributions of low-variance principal components of the factors most heavily. Mechanically this resembles ridge (L2) regression, but the penalty is rooted in a no-near-arbitrage argument rather than chosen arbitrarily.
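A minimal sketch of the ridge-like estimator, assuming the closed form b = (Σ + γI)⁻¹μ with μ and Σ the sample mean and covariance of the factor returns. The function name, the penalty value, and the simulated returns are hypothetical and only illustrate the mechanics.

```python
import numpy as np

def shrunk_sdf_coefficients(returns, gamma):
    """Ridge-style SDF coefficients b = (Sigma + gamma * I)^{-1} mu.

    returns: (T, N) array of factor excess returns (hypothetical input).
    gamma:   non-negative L2 penalty; gamma = 0 gives the unregularized
             sample solution Sigma^{-1} mu.
    """
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    n = sigma.shape[0]
    return np.linalg.solve(sigma + gamma * np.eye(n), mu)

# Simulated returns for illustration only (not real data).
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=(600, 10))

b_raw = shrunk_sdf_coefficients(returns, gamma=0.0)
b_shrunk = shrunk_sdf_coefficients(returns, gamma=1e-4)
```

In the eigenbasis of Σ the coefficient on the j-th PC becomes μ_j / (d_j + γ), so a given γ matters most where the eigenvalue d_j is small, which is the sense in which low-variance PCs are shrunk hardest.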

Optional extensions add an L1 (Lasso) penalty for sparsity, alone or combined with L2 in an elastic-net "dual-penalty" specification, to test whether sparsity helps.
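A minimal sketch of a dual-penalty fit, assuming the objective b'Σb − 2μ'b + γ2·b'b + 2γ1·‖b‖₁ (the pricing-error criterion expanded, constant dropped). The solver here is plain proximal gradient descent (ISTA), swapped in for brevity; it is not claimed to be the paper's algorithm. Function names and the toy inputs are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def dual_penalty_sdf(mu, sigma, gamma1, gamma2, n_iter=5000):
    """Minimize b'Sigma b - 2 mu'b + gamma2 * b'b + 2 * gamma1 * ||b||_1 by ISTA."""
    n = len(mu)
    # Lipschitz constant of the gradient of the smooth (quadratic) part.
    lipschitz = 2.0 * (np.linalg.eigvalsh(sigma)[-1] + gamma2)
    step = 1.0 / lipschitz
    b = np.zeros(n)
    for _ in range(n_iter):
        grad = 2.0 * ((sigma + gamma2 * np.eye(n)) @ b - mu)
        b = soft_threshold(b - step * grad, 2.0 * gamma1 * step)
    return b

# Toy inputs for illustration only (diagonal covariance, made-up means).
sigma = np.diag([4.0, 2.0, 1.0, 0.5])
mu = np.array([0.8, 0.1, 0.3, 0.02])

b_l2_only = dual_penalty_sdf(mu, sigma, gamma1=0.0, gamma2=0.5)
b_dual = dual_penalty_sdf(mu, sigma, gamma1=0.2, gamma2=0.5)
```

With γ1 = 0 the fit reduces to the ridge solution; raising γ1 sets coefficients with small mean returns exactly to zero while γ2 continues to shrink the survivors.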

Penalty strength is chosen to maximize the cross-validated, cross-sectional out-of-sample R².
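A minimal sketch of that selection step, assuming the cross-sectional OOS R² is 1 − (μ₂ − Σ₂b)'(μ₂ − Σ₂b) / (μ₂'μ₂), where μ₂ and Σ₂ are moments of the withheld block and b is estimated on the remaining data. The contiguous three-fold split, the penalty grid, the helper names, and the simulated returns are assumptions of this sketch.

```python
import numpy as np

def moments(returns):
    """Sample mean vector and covariance matrix of a (T, N) return panel."""
    return returns.mean(axis=0), np.cov(returns, rowvar=False)

def oos_r2(b, mu_test, sigma_test):
    """Cross-sectional R^2 of the pricing errors mu_test - sigma_test @ b."""
    resid = mu_test - sigma_test @ b
    return 1.0 - (resid @ resid) / (mu_test @ mu_test)

def cv_r2(returns, gamma, n_folds=3):
    """Average OOS R^2 across contiguous time blocks, each withheld once."""
    n_obs = returns.shape[0]
    scores = []
    for test_idx in np.array_split(np.arange(n_obs), n_folds):
        train_mask = np.ones(n_obs, dtype=bool)
        train_mask[test_idx] = False
        mu_tr, sig_tr = moments(returns[train_mask])
        mu_te, sig_te = moments(returns[test_idx])
        b = np.linalg.solve(sig_tr + gamma * np.eye(len(mu_tr)), mu_tr)
        scores.append(oos_r2(b, mu_te, sig_te))
    return float(np.mean(scores))

def select_gamma(returns, grid, n_folds=3):
    """Return the grid value with the highest cross-validated R^2, plus all scores."""
    scores = [cv_r2(returns, g, n_folds) for g in grid]
    return grid[int(np.argmax(scores))], scores

# Simulated returns and an arbitrary grid, for illustration only.
rng = np.random.default_rng(1)
returns = rng.normal(0.0005, 0.01, size=(600, 8))
grid = [1e-6, 1e-5, 1e-4, 1e-3]
best_gamma, scores = select_gamma(returns, grid)
```

Note that this R² compares average returns with model-implied average returns across portfolios; it is not a time-series forecasting R².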

Assumptions & inputs

Inputs: a panel of characteristic-based factor (managed-portfolio) returns; the prior hyperparameter κ (the square root of the expected maximum squared Sharpe ratio under the prior), set by cross-validation; choice of K PCs.
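A sketch of how the prior hyperparameter can map to the L2 penalty, assuming a Gaussian prior on mean returns with covariance proportional to Σ², a known Σ, and T time-series observations. The symbols κ (root expected squared Sharpe ratio under the prior), τ, γ, and T are notation adopted here for illustration.

```latex
\mu \sim \mathcal{N}\!\left(0,\ \frac{\kappa^{2}}{\tau}\,\Sigma^{2}\right),
\qquad \tau = \operatorname{tr}(\Sigma)
\quad\Longrightarrow\quad
\mathbb{E}\!\left[\mu^{\top}\Sigma^{-1}\mu\right]
  = \frac{\kappa^{2}}{\tau}\,\operatorname{tr}(\Sigma) = \kappa^{2}

\bar{\mu}\mid\mu \sim \mathcal{N}\!\left(\mu,\ \tfrac{1}{T}\Sigma\right)
\quad\Longrightarrow\quad
\hat{b} = \Sigma^{-1}\,\mathbb{E}\!\left[\mu \mid \bar{\mu}\right]
        = (\Sigma + \gamma I)^{-1}\,\bar{\mu},
\qquad \gamma = \frac{\tau}{\kappa^{2}\,T}
```

Under these assumptions a smaller κ (a tighter bound on plausible Sharpe ratios) or a shorter sample implies a larger γ, and therefore heavier shrinkage.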

Linear SDF in the chosen factors; no-near-arbitrage motivates the prior.

Datasets: Fama–French 25 size/BM portfolios (1926–2016) as a sanity check; 50 well-known anomaly portfolios; ~80 portfolios from WRDS financial ratios and lagged returns; plus extremely high-dimensional extensions adding powers and interactions of characteristics.

How to use it / findings

L2-only shrinkage delivers the best (or near-best) OOS performance in the space of base characteristics — a natural default when sparsity is not required.

L1-only (pure Lasso) often struggles OOS in high-dimensional base-characteristic spaces; sparsity in raw characteristics is limited even with the dual penalty.

In PC space, a sparse SDF on a few leading PCs does very well — even one PC gets close to the maximum OOS R² on FF25; a handful suffices on the larger datasets.
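A minimal sketch of an SDF restricted to leading PCs, assuming the factors are rotated by the eigenvectors of their sample covariance and only the k highest-variance PCs keep a nonzero coefficient. Truncating by variance rank is a simplification made here; a penalty-driven selection could pick a different subset. The function name and simulated data are hypothetical.

```python
import numpy as np

def pc_sparse_sdf(returns, gamma, k):
    """L2-shrunk SDF coefficients using only the k leading principal components.

    Returns (b, b_pc): coefficients on the original factors and on the PCs.
    """
    mu = returns.mean(axis=0)
    sigma = np.cov(returns, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(sigma)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]              # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    mu_pc = eigvecs.T @ mu                         # mean returns of the PC portfolios
    b_pc = mu_pc / (eigvals + gamma)               # ridge solution is diagonal in PC space
    b_pc[k:] = 0.0                                 # drop all but the k leading PCs
    return eigvecs @ b_pc, b_pc

# Simulated returns for illustration only (not real data).
rng = np.random.default_rng(2)
returns = rng.normal(0.0005, 0.01, size=(600, 12))

b_two_pcs, b_pc_two = pc_sparse_sdf(returns, gamma=1e-4, k=2)
b_all_pcs, _ = pc_sparse_sdf(returns, gamma=1e-4, k=returns.shape[1])
```

Keeping every PC reproduces the ridge solution on the original factors, so the sparse version differs only by the discarded low-variance directions.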

Heavy shrinkage is essential: unregularized SDFs overfit badly; the data "call for substantial L2-shrinkage but essentially no sparsity."

Limitations & pitfalls

PCs are statistical and rotate with the input factor set; the economic prior (max-SR penalty) is a modeling choice and its hyperparameter must be tuned.

Linear SDF in the chosen portfolios (nonlinear/deep-learning extensions follow in later work).

OOS R²s are cross-sectional and evaluated over long withheld windows; magnitudes depend on the portfolio universe.

Key references

Kozak, S., Nagel, S. & Santosh, S. (2020) — Shrinking the Cross-Section — Journal of Financial Economics

Kozak, S., Nagel, S. & Santosh, S. (2018) — Interpreting Factor Models — Journal of Finance

Kelly, B., Pruitt, S. & Su, Y. (2019) — Characteristics Are Covariances — Journal of Financial Economics

Provenance: verified/generated from the paper's full text.