Forest through the Trees: Building Cross-Sections of Stock Returns

Source: Bryzgalova, S., Pelger, M. & Zhu, J. (working paper, this version May 2020; published Journal of Finance 2023)

TL;DR

Uses decision trees to build better test-asset portfolios for asset pricing. Instead of conventional one-, two-, or three-way characteristic sorts, "Asset-Pricing Trees (AP-Trees)" recursively split stocks on many characteristics jointly and conditionally, then prune the large set of resulting node portfolios down to a small, interpretable basis (e.g. 10–50 assets) that spans the SDF projected on those characteristics. The resulting cross-sections have, on average, ~30% higher Sharpe ratios and alphas than the leading reduced-form portfolios (and up to ~2x relative to decile sorts), all out-of-sample.

Problem it solves

The choice of test assets is first-order but often neglected. Standard sorts capture only univariate or low-order interactions, miss conditional and nonlinear structure, and set "too low a hurdle" for candidate factor models. Sorting on many characteristics at once is infeasible: the number of cells grows exponentially with the number of sorting variables and with depth (10 characteristics at depth 4 already explode combinatorially), and most cells are empty or noisy.
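The combinatorial growth can be made concrete with a little arithmetic. The sketch below assumes that a full sort splits every characteristic into the same number of bins, and that a tree applies one characteristic per level with repeats allowed (as in "size, then value, then size again"); the function names are illustrative and do not come from the paper.

```python
def full_sort_cells(n_bins: int, n_chars: int) -> int:
    """Cells in a simultaneous sort: every characteristic is cut into n_bins."""
    return n_bins ** n_chars


def tree_counts(n_chars: int, depth: int) -> dict:
    """Counts for trees that split on one characteristic per level (assumed)."""
    return {
        "split_orderings": n_chars ** depth,     # ordered sequences, repeats allowed
        "leaves_per_tree": 2 ** depth,           # final nodes of a single tree
        "nodes_per_tree": 2 ** (depth + 1) - 1,  # root, intermediate nodes, leaves
    }


median_sort = full_sort_cells(n_bins=2, n_chars=10)
counts = tree_counts(n_chars=10, depth=4)
print(median_sort)  # 1024
print(counts)       # {'split_orderings': 10000, 'leaves_per_tree': 16, 'nodes_per_tree': 31}
```

Even a coarse two-bin sort on 10 characteristics gives over a thousand cells, which is why some form of pruning is needed before the candidate portfolios become usable test assets.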

The method

Build AP-Trees: a sequence of conditional 50/50 splits on characteristics (e.g. size, then value, then size again) down to a chosen depth d; the nodes are managed (conditional) portfolios reflecting characteristic interactions. The collection of all trees forms a high-dimensional set of candidate basis portfolios.
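A minimal sketch of the conditional-split step for a single tree and a single period, assuming median splits within each parent node and value-weighted node returns. The helper names (`ap_tree_nodes`, `node_portfolio_returns`), the tie-handling rule, and the toy data are illustrative assumptions rather than the authors' code.

```python
from typing import Sequence

import numpy as np


def ap_tree_nodes(chars: np.ndarray, split_sequence: Sequence[int]) -> np.ndarray:
    """Assign each stock to a node at every level of one tree.

    chars: (n_stocks, n_chars) characteristics observed in one period.
    split_sequence: column index used for the split at each level.
    Returns a (depth + 1, n_stocks) array; row k holds node ids in [0, 2**k).
    """
    node = np.zeros(chars.shape[0], dtype=int)
    levels = [node.copy()]
    for col in split_sequence:
        new_node = np.empty_like(node)
        for parent in np.unique(node):
            members = np.flatnonzero(node == parent)
            # Conditional split: the median is computed inside the parent node.
            median = np.median(chars[members, col])
            high = chars[members, col] > median  # ties go to the low child
            new_node[members] = 2 * parent + high.astype(int)
        node = new_node
        levels.append(node.copy())
    return np.vstack(levels)


def node_portfolio_returns(next_returns, market_caps, node_ids):
    """Value-weighted return of each node (value weighting is an assumption)."""
    n_nodes = int(node_ids.max()) + 1
    cap = np.bincount(node_ids, weights=market_caps, minlength=n_nodes)
    cap_ret = np.bincount(node_ids, weights=market_caps * next_returns,
                          minlength=n_nodes)
    return cap_ret / cap


# Toy data: 8 stocks, column 0 = size, column 1 = value.
chars = np.array([[1, 8], [2, 1], [3, 7], [4, 2],
                  [5, 6], [6, 3], [7, 5], [8, 4]], dtype=float)
levels = ap_tree_nodes(chars, split_sequence=[0, 1])  # size, then value
next_returns = np.array([0.0, 0.02, 0.0, 0.02, 0.04, 0.06, 0.04, 0.06])
leaf_returns = node_portfolio_returns(next_returns, np.ones(8), levels[-1])
print(levels)
print(leaf_returns)
```

Because the value split is computed separately inside the small-stock and large-stock halves, the leaves reflect the interaction between the two characteristics rather than two independent sorts. Every row of `levels` defines candidate portfolios, so intermediate nodes as well as leaves can enter the candidate set.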

Prune / regularize: generalize the robust SDF-recovery of Kozak, Nagel & Santosh (2020) using dual shrinkage in the mean and variance (an elastic-net-style mean-variance objective) to select a sparse, well-diversified, interpretable subset of node portfolios that maximizes the out-of-sample spanned Sharpe ratio.
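To illustrate what an elastic-net-style mean-variance objective does, the sketch below solves a simplified, unconstrained variant by proximal gradient descent: minimize 0.5 w'Σw - w'μ + l1·||w||_1 + 0.5·l2·||w||_2^2. This is a stand-in chosen for brevity and is not the paper's exact formulation; any constraints on the weights and the paper's own tuning-parameter definitions are not reproduced here. The function name, the penalty names `l1` and `l2`, and the numbers are illustrative assumptions.

```python
import numpy as np


def elastic_net_mv_weights(mu, sigma, l1, l2, n_iter=5000):
    """Proximal-gradient (ISTA) solution of a penalized mean-variance problem.

    Objective: 0.5 * w' sigma w - w' mu + l1 * ||w||_1 + 0.5 * l2 * ||w||_2^2
    The l1 term sets weights to exactly zero (selection); the l2 term
    shrinks the remaining weights (ridge-style regularization).
    """
    mu = np.asarray(mu, dtype=float)
    a = np.asarray(sigma, dtype=float) + l2 * np.eye(len(mu))
    step = 1.0 / np.linalg.eigvalsh(a).max()  # 1 / Lipschitz constant of the gradient
    w = np.zeros_like(mu)
    for _ in range(n_iter):
        z = w - step * (a @ w - mu)  # gradient step on the smooth part
        w = np.sign(z) * np.maximum(np.abs(z) - step * l1, 0.0)  # soft-threshold
    return w


mu = np.array([0.010, 0.020, 0.005])          # illustrative mean returns
sigma = np.array([[0.0400, 0.0100, 0.0000],
                  [0.0100, 0.0900, 0.0200],
                  [0.0000, 0.0200, 0.0625]])  # illustrative covariance matrix

w_ridge = elastic_net_mv_weights(mu, sigma, l1=0.0, l2=0.01)    # shrinkage only
w_sparse = elastic_net_mv_weights(mu, sigma, l1=0.008, l2=0.01)  # some selection
w_empty = elastic_net_mv_weights(mu, sigma, l1=0.05, l2=0.01)    # penalty too strong
print(w_ridge, w_sparse, w_empty)
```

With `l1 = 0` the solution coincides with the ridge closed form `(sigma + l2 * I)^{-1} mu`; once `l1` is at least as large as the biggest absolute mean, every weight is zero. In practice the penalties would be chosen on a validation sample rather than fixed by hand.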

Main results use up to 10 characteristics and trees up to depth 4, retaining small cross-sections of ~10–50 portfolios.

Assumptions & inputs

A panel of stock returns and a chosen set of firm characteristics (CRSP/Compustat-style data).

Tuning choices: number of desired portfolios, tree depth / degree of interactions, and minimum-size or market-cap restrictions on nodes.

Evaluation is out-of-sample, in the spirit of Martin & Nagel (2019): the focus is on feasible OOS Sharpe ratios and on SDF alphas that traditional factors leave unexplained.
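A feasible out-of-sample Sharpe ratio can be sketched as follows: estimate portfolio weights on a training window only, then apply them unchanged to a later test window. The ridge-regularized mean-variance weights, the window lengths, and the simulated returns are illustrative assumptions; the paper's own estimator and sample split are not reproduced here.

```python
import numpy as np


def sharpe_ratio(returns: np.ndarray) -> float:
    """Per-period Sharpe ratio of an excess-return series (not annualized)."""
    return float(returns.mean() / returns.std(ddof=1))


def oos_sharpe(train: np.ndarray, test: np.ndarray, ridge: float = 0.0) -> float:
    """Fit mean-variance weights on `train`, evaluate them on `test`.

    train, test: (n_periods, n_assets) excess returns of the basis portfolios.
    """
    mu = train.mean(axis=0)
    sigma = np.atleast_2d(np.cov(train, rowvar=False))
    weights = np.linalg.solve(sigma + ridge * np.eye(len(mu)), mu)
    return sharpe_ratio(test @ weights)  # weights never see the test sample


rng = np.random.default_rng(0)
returns = 0.005 + 0.05 * rng.standard_normal((360, 5))  # simulated monthly data
train, test = returns[:240], returns[240:]
in_sample = oos_sharpe(train, train, ridge=1e-3)
out_of_sample = oos_sharpe(train, test, ridge=1e-3)
print(in_sample, out_of_sample)
```

The in-sample figure uses the same data for estimation and evaluation and is therefore optimistic; only the second number is attainable by an investor in real time.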

How to use it

Construct AP-Tree cross-sections as test assets to evaluate factor models more sharply, or as investable basis portfolios (long-only, well-diversified, interpretable). Built from the same signals as conventional sorts, they retain interpretability while capturing conditional information those sorts discard.
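Using the portfolios as test assets typically means checking whether a candidate factor model leaves pricing errors on them. One common way to do this, sketched below, is a time-series regression of each test asset's excess return on the factors, where the intercept is the alpha. The function name and the noise-free toy data are illustrative assumptions, and the sketch omits standard errors and joint tests.

```python
import numpy as np


def time_series_alphas(test_assets: np.ndarray, factors: np.ndarray):
    """OLS of each test asset on a constant and the factors.

    test_assets: (n_periods, n_assets) excess returns.
    factors: (n_periods, n_factors) factor returns.
    Returns (alphas, betas) with shapes (n_assets,) and (n_factors, n_assets).
    """
    n_periods = factors.shape[0]
    design = np.column_stack([np.ones(n_periods), factors])
    coef, *_ = np.linalg.lstsq(design, test_assets, rcond=None)
    return coef[0], coef[1:]


rng = np.random.default_rng(1)
factor = 0.04 * rng.standard_normal((120, 1))  # one simulated factor
true_alpha = np.array([0.002, 0.0])            # asset 0 is mispriced by the model
true_beta = np.array([[1.5, 0.5]])
assets = true_alpha + factor @ true_beta       # noise-free for clarity

alphas, betas = time_series_alphas(assets, factor)
print(alphas, betas)
```

A harder cross-section is one on which more candidate models produce alphas that are clearly different from zero, which is the sense in which better test assets raise the hurdle.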

Limitations & pitfalls

Results depend on tuning (depth, number of portfolios, shrinkage strength, node-size constraints).

As with any test-asset choice, conclusions about which factor models "pass" depend on the basis constructed.

Key references

Bryzgalova, S., Pelger, M. & Zhu, J. — Forest through the Trees — Journal of Finance (2023)

Kozak, S., Nagel, S. & Santosh, S. (2020) — Shrinking the Cross-Section — Journal of Financial Economics

Martin, I. & Nagel, S. (2019) — Market Efficiency in the Age of Big Data — working paper

Provenance: verified/generated from the paper's full text.