Forest through the Trees: Building Cross-Sections of Stock Returns
Source: Bryzgalova, S., Pelger, M. & Zhu, J. (working paper, this version May 2020; published Journal of Finance 2023)
TL;DR
Uses decision trees to build better test-asset portfolios for asset pricing. Instead of conventional one-, two-, or three-way characteristic sorts, "Asset-Pricing Trees (AP-Trees)" recursively split stocks on many characteristics jointly and conditionally, then prune the large set of resulting node portfolios down to a small, interpretable basis (e.g. 10–50 assets) that spans the SDF projected on those characteristics. The resulting cross-sections have, on average, ~30% higher Sharpe ratios and alphas than the leading reduced-form portfolios (and up to ~2x relative to decile sorts), all out-of-sample.
Problem it solves
The choice of test assets is first-order but neglected. Standard sorts capture only univariate or low-order interactions, miss conditional and nonlinear structure, and present "too low a hurdle" for candidate factor models. Sorting on many characteristics at once is infeasible: the number of cells in a full multi-way sort grows exponentially with the number of characteristics sorted on (even a two-way split on each of 10 characteristics gives 2^10 = 1,024 cells), so most cells are empty or noisy.
The method
Build AP-Trees: a sequence of conditional 50/50 splits on characteristics (e.g. size, then value, then size again) down to a chosen depth d; the nodes are managed (conditional) portfolios reflecting characteristic interactions. The collection of all trees forms a high-dimensional set of candidate basis portfolios.
Prune / regularize: generalize the robust SDF-recovery of Kozak, Nagel & Santosh (2020) using dual shrinkage in the mean and variance (an elastic-net-style mean-variance objective) to select a sparse, well-diversified, interpretable subset of node portfolios that maximizes the out-of-sample spanned Sharpe ratio.
Main results use up to 10 characteristics and trees up to depth 4, retaining small cross-sections of ~10–50 portfolios.
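To make the pruning step concrete, here is a simplified sketch of an elastic-net-style mean-variance problem. It is not the paper's exact formulation: it uses an unconstrained penalized form, minimizing 0.5 w'Σw - w'μ + 0.5·l2·||w||² + l1·||w||₁, shrinks the sample means toward their cross-sectional average with a hypothetical `mean_shrink` parameter, and solves the problem with plain proximal gradient descent (ISTA), a solver chosen here for brevity. All parameter names and defaults are illustrative.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (elementwise shrink toward zero)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def prune_weights(returns, l1=1e-3, l2=1e-2, mean_shrink=0.5, n_iter=2000):
    """Sparse portfolio weights over candidate node portfolios.

    returns: (T, N) array of node-portfolio excess returns, N >= 2.
    l1: lasso penalty; larger values zero out more portfolios.
    l2: ridge penalty, acting as shrinkage on the covariance matrix.
    mean_shrink: hypothetical weight in [0, 1] pulling each sample mean
        toward the cross-sectional average mean.
    """
    mu_hat = returns.mean(axis=0)
    mu = (1.0 - mean_shrink) * mu_hat + mean_shrink * mu_hat.mean()
    sigma = np.cov(returns, rowvar=False)
    h = sigma + l2 * np.eye(len(mu))          # Hessian of the smooth part
    step = 1.0 / np.linalg.eigvalsh(h)[-1]    # 1 / largest eigenvalue
    w = np.zeros(len(mu))
    for _ in range(n_iter):
        grad = h @ w - mu
        w = soft_threshold(w - step * grad, step * l1)
    return w


def selected(weights, tol=1e-10):
    """Indices of the portfolios that survive pruning."""
    return np.flatnonzero(np.abs(weights) > tol)
```

With `l1=0` the solution reduces to the ridge-regularized mean-variance weights; raising `l1` trades spanned Sharpe ratio for a smaller set of selected portfolios. In practice the penalties would be chosen on held-out data rather than fixed in advance.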
Assumptions & inputs
A panel of stock returns and a chosen set of firm characteristics (CRSP/Compustat-style data).
Tuning choices: number of desired portfolios, tree depth / degree of interactions, and minimum-size or market-cap restrictions on nodes.
Evaluation is framed out-of-sample, in the spirit of Martin & Nagel (2019), focusing on feasible OOS Sharpe ratios and SDF alphas not spanned by traditional factors.
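The out-of-sample framing can be illustrated with a small helper that estimates portfolio weights on an initial window and measures the Sharpe ratio only on later data. This is a sketch under stated assumptions: the single chronological split, the function names, and the ridge-regularized tangency weights used as a stand-in estimator are not taken from the paper.

```python
import numpy as np


def sharpe_ratio(x):
    """Per-period Sharpe ratio of a series of excess returns."""
    return x.mean() / x.std(ddof=1)


def tangency_weights(train, ridge=1e-2):
    """Stand-in estimator: ridge-regularized mean-variance weights."""
    mu = train.mean(axis=0)
    sigma = np.cov(train, rowvar=False)
    return np.linalg.solve(sigma + ridge * np.eye(len(mu)), mu)


def oos_sharpe(returns, weights_fn, split):
    """Fit weights on returns[:split], evaluate on returns[split:].

    returns: (T, N) array of excess returns, ordered in time.
    weights_fn: callable mapping a training array to N weights.
    split: hypothetical row index separating estimation from evaluation.
    """
    train, test = returns[:split], returns[split:]
    w = weights_fn(train)          # the test sample is never seen here
    return sharpe_ratio(test @ w)
```

Because the Sharpe ratio is invariant to scaling the weights, only the direction of the estimated weight vector matters for this comparison.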
How to use it
Construct AP-Tree cross-sections as test assets to evaluate factor models more sharply, or as investable basis portfolios (long-only, well-diversified, interpretable). Built from the same signals as conventional sorts, they retain interpretability while capturing conditional information those sorts discard.
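When the portfolios are used as test assets, a standard first check is the time-series alpha of each asset with respect to a candidate factor model. The sketch below uses ordinary least squares via NumPy; the function names and array layout are illustrative assumptions, and no standard errors or joint tests are computed.

```python
import numpy as np


def ts_alphas(assets, factors):
    """Time-series regression of each test asset on a set of factors.

    assets: (T, N) array of test-asset excess returns.
    factors: (T, K) array of factor returns.
    Returns (alphas, betas) with shapes (N,) and (K, N).
    """
    X = np.column_stack([np.ones(len(factors)), factors])
    coef, *_ = np.linalg.lstsq(X, assets, rcond=None)
    return coef[0], coef[1:]


def rms_alpha(alphas):
    """Root-mean-squared pricing error across test assets."""
    return float(np.sqrt(np.mean(np.square(alphas))))
```

Passing a single column (for example the returns of a portfolio built from the selected nodes) gives that portfolio's alpha relative to the factors; larger unexplained alphas indicate a cross-section that is harder for the factor model to price.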
Limitations & pitfalls
Results depend on tuning (depth, number of portfolios, shrinkage strength, node-size constraints).
As with any test-asset choice, conclusions about which factor models "pass" depend on the basis constructed.
Key references
Bryzgalova, S., Pelger, M. & Zhu, J. (2023). Forest through the Trees: Building Cross-Sections of Stock Returns. Journal of Finance.
Kozak, S., Nagel, S. & Santosh, S. (2020). Shrinking the Cross-Section. Journal of Financial Economics.
Martin, I. & Nagel, S. (2019). Market Efficiency in the Age of Big Data. Working paper.
Provenance: verified/generated from the paper's full text.