ConvexPi

Forest through the Trees: Building Cross‐Sections of Stock Returns

SVETLANA BRYZGALOVA, MARKUS PELGER, JASON ZHU

The Journal of Finance · 2025 · 148 citations



Source: Bryzgalova, S., Pelger, M. & Zhu, J. (working paper, this version May 2020; published in The Journal of Finance, 2025)


TL;DR

Uses decision trees to build better test-asset portfolios for asset pricing. Instead of conventional one-, two-, or three-way characteristic sorts, "Asset-Pricing Trees (AP-Trees)" recursively split stocks on many characteristics jointly and conditionally, then prune the large set of resulting node portfolios down to a small, interpretable basis (e.g. 10–50 assets) that spans the SDF projected on those characteristics. The resulting cross-sections have, on average, ~30% higher Sharpe ratios and alphas than the leading reduced-form portfolios (and up to ~2x relative to decile sorts), all out-of-sample.


Problem it solves

The choice of test assets is first-order but neglected. Standard sorts capture only univariate or low-order interactions, miss conditional and nonlinear structure, and set "too low a hurdle" for candidate factor models. Sorting on many characteristics at once is infeasible: the number of cells grows exponentially with the number of characteristics and splits (e.g. 10 characteristics at depth 4), and most cells end up empty or noisy.
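To make the combinatorics concrete, the sketch below counts cells under two illustrative conventions. Both are assumptions of this example rather than figures from the paper: a full sort of n characteristics into k bins each yields k^n cells, and a tree of depth d that may split on any of n characteristics at each level has n^d possible split orderings with 2^d leaves each. The function names are hypothetical.

```python
from typing import Tuple


def full_sort_cells(n_chars: int, n_bins: int) -> int:
    """Cells in an unconditional multi-way sort: n_bins ** n_chars."""
    return n_bins ** n_chars


def tree_counts(n_chars: int, depth: int) -> Tuple[int, int]:
    """Return (number of split orderings, leaves per tree) for binary trees.

    Assumes one characteristic is chosen per level, with repeats allowed
    (e.g. size, then value, then size again).
    """
    n_orderings = n_chars ** depth
    leaves_per_tree = 2 ** depth
    return n_orderings, leaves_per_tree


if __name__ == "__main__":
    print(full_sort_cells(n_chars=10, n_bins=2))
    print(tree_counts(n_chars=10, depth=4))
```

Even a coarse two-bin sort on 10 characteristics produces more cells than most monthly cross-sections can populate with diversified portfolios, which is why a pruning step is needed.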


The method

  • Build AP-Trees: a sequence of conditional 50/50 splits on characteristics (e.g. size, then value, then size again) down to a chosen depth d; the nodes are managed (conditional) portfolios reflecting characteristic interactions. The collection of all trees forms a high-dimensional set of candidate basis portfolios.
  • Prune / regularize: generalize the robust SDF-recovery of Kozak, Nagel & Santosh (2020) using dual shrinkage in the mean and variance (an elastic-net-style mean-variance objective) to select a sparse, well-diversified, interpretable subset of node portfolios that maximizes the out-of-sample spanned Sharpe ratio.
  • Main results use up to 10 characteristics and trees up to depth 4, retaining small cross-sections of ~10–50 portfolios.
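The tree-building step can be sketched for a single cross-section date as follows. This is a minimal illustration, not the authors' code: the function names, the node-naming scheme, and the use of a within-node median as the 50/50 split point are assumptions of this example, and `weights` stands in for whatever portfolio weighting (for instance market capitalization) is used.

```python
import numpy as np


def ap_tree_nodes(chars, split_order):
    """Build all nodes of one tree as boolean masks over stocks.

    chars:       (N, K) array of characteristics for N stocks.
    split_order: sequence of column indices, one per tree level.
    Each node is split at the median of the chosen characteristic
    computed among the stocks in that node (a conditional split).
    """
    n_stocks = chars.shape[0]
    nodes = {"root": np.ones(n_stocks, dtype=bool)}
    frontier = [("root", nodes["root"])]
    for col in split_order:
        next_frontier = []
        for name, mask in frontier:
            median = np.median(chars[mask, col])
            low = mask & (chars[:, col] <= median)
            high = mask & (chars[:, col] > median)
            for tag, child in (("L", low), ("H", high)):
                child_name = f"{name}.{col}{tag}"
                nodes[child_name] = child
                next_frontier.append((child_name, child))
        frontier = next_frontier
    return nodes


def node_returns(nodes, returns, weights):
    """Weighted average return of the stocks in each node."""
    out = {}
    for name, mask in nodes.items():
        w = weights[mask]
        out[name] = float(np.sum(w * returns[mask]) / np.sum(w))
    return out
```

Calling `ap_tree_nodes(chars, [0, 1, 0])` mirrors the "size, then value, then size again" example when column 0 holds size and column 1 holds value. Intermediate nodes are kept alongside the leaves, so the candidate set contains portfolios at every level of interaction.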

Assumptions & inputs

  • A panel of stock returns and a chosen set of firm characteristics (CRSP/Compustat-style data).
  • Tuning choices: number of desired portfolios, tree depth / degree of interactions, and minimum-size or market-cap restrictions on nodes.
  • Evaluation is framed out-of-sample, in the spirit of Martin & Nagel (2019), focusing on feasible OOS Sharpe ratios and SDF alphas not spanned by traditional factors.
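The interplay between shrinkage and out-of-sample evaluation can be illustrated with a simplified stand-in for the pruning step. The objective below, 0.5 w'Σw − w'μ + l1‖w‖₁ + 0.5 l2‖w‖₂², is an assumption chosen for this sketch and is not the paper's exact estimator; it is solved by proximal gradient descent, and all function and parameter names are hypothetical.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * |x|_1 (elementwise shrink toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def elastic_net_mv_weights(train, l1=0.0, l2=0.0, n_iter=5000):
    """Minimize 0.5*w'Sw - w'mu + l1*|w|_1 + 0.5*l2*|w|_2^2.

    train: (T, N) array of portfolio returns from the estimation sample.
    The l1 penalty zeroes out weights (sparsity); l2 shrinks them.
    """
    mu = train.mean(axis=0)
    sigma = np.atleast_2d(np.cov(train, rowvar=False))
    # Step size 1/L, where L bounds the curvature of the smooth part.
    step = 1.0 / (np.linalg.eigvalsh(sigma)[-1] + l2)
    w = np.zeros(train.shape[1])
    for _ in range(n_iter):
        grad = sigma @ w - mu + l2 * w
        w = soft_threshold(w - step * grad, step * l1)
    return w


def sharpe(weights, returns):
    """Per-period Sharpe ratio of the portfolio returns @ weights."""
    port = returns @ weights
    sd = port.std(ddof=1)
    return float(port.mean() / sd) if sd > 0 else 0.0
```

In this sketch, weights would be estimated on a training window, the penalties `l1` and `l2` picked on a separate validation window, and `sharpe` reported on a test window that played no part in either step. The portfolios with nonzero weight form the retained cross-section.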

How to use it

Construct AP-Tree cross-sections as test assets to evaluate factor models more sharply, or as investable basis portfolios (long-only, well-diversified, interpretable). Built from the same signals as conventional sorts, they retain interpretability while capturing conditional information those sorts discard.
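One common way to use test assets against a candidate factor model is a time-series regression of each asset's excess return on the factors, reading the intercept as the pricing error (alpha). The sketch below shows that generic regression with ordinary least squares; it is not taken from the paper, and the function name and array layout are assumptions of this example.

```python
import numpy as np


def factor_alphas(test_assets, factors):
    """OLS of each test asset on the factors, with an intercept.

    test_assets: (T, N) excess returns of N test assets.
    factors:     (T, K) returns of K factors.
    Returns (alphas, betas) with shapes (N,) and (K, N).
    """
    n_obs = factors.shape[0]
    design = np.column_stack([np.ones(n_obs), factors])
    coef, *_ = np.linalg.lstsq(design, test_assets, rcond=None)
    return coef[0], coef[1:]
```

A factor model that prices the test assets well leaves alphas close to zero. A more demanding cross-section is one where economically large alphas survive.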


Limitations & pitfalls

  • Results depend on tuning (depth, number of portfolios, shrinkage strength, node-size constraints).
  • As with any test-asset choice, conclusions about which factor models "pass" depend on the basis constructed.

Key references

  • Bryzgalova, S., Pelger, M. & Zhu, J. (2025). Forest through the Trees: Building Cross-Sections of Stock Returns. The Journal of Finance.
  • Kozak, S., Nagel, S. & Santosh, S. (2020). Shrinking the Cross-Section. Journal of Financial Economics.
  • Martin, I. & Nagel, S. (2019). Market Efficiency in the Age of Big Data. Working paper.



Provenance: verified/generated from the paper's full text.

Community-maintained wiki: anyone can suggest an edit or view its revision history. Not peer-reviewed; verify claims against the original paper.

Wiki last updated: June 24, 2026