Taming the Factor Zoo: A Test of New Factors

Source: Feng, G., Giglio, S. & Xiu, D. (2020). Journal of Finance 75(3), 1327–1370. (NBER WP #25481, 2019)

TL;DR

Provides a rigorous, model-selection-based test of whether a newly proposed factor adds explanatory power for the cross-section of expected returns beyond the hundreds of factors already proposed. It disciplines the "factor zoo": a new factor must contribute after controlling for the existing zoo, and the method delivers valid inference despite the high-dimensional control set. Applied to recently proposed factors, the test finds most of them redundant, but a few (notably profitability) retain significant incremental explanatory power.

Problem it solves

The literature has produced hundreds of candidate factors. Testing whether a new factor g_t is useful requires controlling for a high-dimensional set of existing factors h_t. Standard practice falls short in two ways: controlling only for a small ad hoc benchmark (e.g., the FF3 factors) invites omitted-variable bias, while running a single LASSO and treating its selection as correct ignores model-selection error. Either way, the resulting significance is unstable and overstated.

The method

Combines the double-selection LASSO of Belloni et al. (2014) with two-pass (Fama–MacBeth) cross-sectional regressions.

"Double selection": (1) LASSO-select the factors in h_t whose covariances with the test-asset returns help explain the cross-section of average returns, and (2) LASSO-select the factors in h_t whose covariances with returns are correlated with those of the new factor g_t; take the union of the selected controls.

Estimate g_t's SDF loading, the quantity that measures its incremental contribution to pricing, in a post-selection two-pass regression on g_t plus the union of selected controls. Inference based on this regression remains valid even when the LASSO steps make selection mistakes.
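The three steps can be written compactly as follows. This is a sketch in notation chosen here for illustration (\bar r for the n-vector of average test-asset returns, \hat C_g and \hat C_h for sample covariances of returns with g_t and h_t, \tau_1 and \tau_2 for LASSO tuning parameters); the paper's exact penalty scaling and any weighting may differ.

```latex
\begin{align}
\text{Pricing relation:}\quad
  & \mathrm{E}(r_t) = \iota_n \gamma_0 + C_g \lambda_g + C_h \lambda_h,
  \qquad C_g = \mathrm{Cov}(r_t, g_t),\quad C_h = \mathrm{Cov}(r_t, h_t) \\
\text{Step 1:}\quad
  & \min_{\gamma,\,\lambda}\ \tfrac{1}{n}\bigl\lVert \bar r - \iota_n \gamma - \hat C_h \lambda \bigr\rVert^2
    + \tau_1 \tfrac{1}{n} \lVert \lambda \rVert_1
  \ \Longrightarrow\ \hat I_1 = \{\, j : \hat\lambda_j \neq 0 \,\} \\
\text{Step 2:}\quad
  & \min_{\xi,\,\chi}\ \tfrac{1}{n}\bigl\lVert \hat C_g - \iota_n \xi - \hat C_h \chi \bigr\rVert^2
    + \tau_2 \tfrac{1}{n} \lVert \chi \rVert_1
  \ \Longrightarrow\ \hat I_2 = \{\, j : \hat\chi_j \neq 0 \,\} \\
\text{Step 3:}\quad
  & (\hat\gamma_0, \hat\lambda_g, \hat\lambda_{\hat I})
    = \arg\min_{\gamma,\,\lambda_g,\,\lambda}\
      \bigl\lVert \bar r - \iota_n \gamma - \hat C_g \lambda_g - \hat C_h[\hat I]\, \lambda \bigr\rVert^2,
  \qquad \hat I = \hat I_1 \cup \hat I_2
\end{align}
```

The null hypothesis that g_t is redundant corresponds to \lambda_g = 0 in the pricing relation.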

The second selection, step (2), is the safeguard: it recovers any control that step (1) missed but that is correlated with g_t, and whose omission would otherwise bias the estimate for g_t.
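The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration on simulated data, not the authors' implementation: the function names (`lasso_support`, `double_selection`), the fixed penalty levels, the standardization inside the LASSO, and the data-generating process are all assumptions made here. The paper's standard-error formula is not reproduced in this summary, so the sketch returns only the point estimate.

```python
import numpy as np

def lasso_support(X, y, alpha, n_iter=200):
    """Indices of nonzero LASSO coefficients, via coordinate descent
    on standardized columns (so alpha is on a correlation-like scale)."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    n, p = Xs.shape
    b = np.zeros(p)
    resid = ys.copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += Xs[:, j] * b[j]                 # partial residual without column j
            rho = Xs[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft threshold
            resid -= Xs[:, j] * b[j]
    return np.flatnonzero(b)

def double_selection(rbar, C_g, C_h, alpha1=0.1, alpha2=0.1):
    """Point estimate of g's SDF loading after double selection.
    alpha1 and alpha2 are illustrative penalties, not tuned values."""
    s1 = lasso_support(C_h, rbar, alpha1)            # step 1: controls that explain average returns
    s2 = lasso_support(C_h, C_g, alpha2)             # step 2: controls related to g's covariances
    selected = np.union1d(s1, s2)
    X = np.column_stack([np.ones(len(rbar)), C_g, C_h[:, selected]])
    coef, *_ = np.linalg.lstsq(X, rbar, rcond=None)  # step 3: post-selection OLS
    return coef[1], selected

# Simulated example (assumed data-generating process, arbitrary units).
rng = np.random.default_rng(0)
T, n, p = 600, 200, 30
h = rng.standard_normal((T, p))
g = 0.7 * h[:, 0] + rng.standard_normal(T)           # g is correlated with the first control
v = np.column_stack([g, h])
B = rng.standard_normal((n, 1 + p))                  # asset exposures to (g, h)
lam = np.zeros(1 + p)
lam[1:4] = [0.5, 0.3, -0.4]                          # three priced controls; g's loading is zero
C = B @ np.cov(v, rowvar=False)                      # covariances of returns with (g, h)
mu = 0.005 + C @ lam                                 # expected returns implied by the SDF loadings
r = mu + (v - v.mean(axis=0)) @ B.T + rng.standard_normal((T, n))

rbar = r.mean(axis=0)
C_hat = (r - rbar).T @ (v - v.mean(axis=0)) / (T - 1)  # n x (1 + p) sample covariances
lam_g_hat, selected = double_selection(rbar, C_hat[:, 0], C_hat[:, 1:])
print("estimated SDF loading of g:", lam_g_hat)
print("selected control indices:", selected)
```

In this toy setup g_t carries no pricing information of its own, so the estimate should be judged against zero. Dropping the `s2` step turns the sketch into the single-selection approach criticized above, which makes the two easy to compare on the same simulated data.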

Assumptions & inputs

Assumes the true pricing model is approximately low-dimensional, while h_t contains relevant, redundant, and "useless" factors.

Factor library: 150 tradable risk factors, monthly, July 1976 – December 2017.

Test assets: 750 portfolios from 3×2 bivariate sorts; robustness checks use 1,825 portfolios from 5×5 sorts.

The benchmark is deliberately conservative: each new factor is tested against the entire large set of pre-existing factors.

How to use it / findings

Most newly proposed factors (post-2012) are redundant once benchmarked against the zoo; only a few add robust incremental power.

Profitability-type factors stand out as significant beyond the hundreds of prior factors.

Estimates and their significance are stable, whereas the factors chosen by a single (non-double) LASSO are not — demonstrating the value of double selection.

A recursive year-by-year exercise (1994–2016) tests each factor against the factors and test assets available up to its publication year.
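The bookkeeping behind such a recursive exercise can be sketched as below. This is an illustration only: the helper `recursive_splits`, the factor labels, and their publication years are hypothetical, and treating "available" as "published strictly before the test year" is an assumption made here rather than a detail taken from the paper.

```python
def recursive_splits(pub_year, start=1994, end=2016):
    """Yield (year, new_factors, controls) for each year that has new factors.
    Controls are the factors published strictly before that year (an assumption)."""
    for year in range(start, end + 1):
        new = sorted(f for f, y in pub_year.items() if y == year)
        controls = sorted(f for f, y in pub_year.items() if y < year)
        if new:
            yield year, new, controls

# Hypothetical factor labels and publication years, for illustration only.
toy_library = {"f_a": 1990, "f_b": 1993, "f_c": 1994, "f_d": 1994, "f_e": 2000}
splits = list(recursive_splits(toy_library))
for year, new, controls in splits:
    print(year, new, controls)
```

Each yielded triple identifies one round of testing: the factors in `new` play the role of g_t and those in `controls` play the role of h_t.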

Limitations & pitfalls

Conclusions depend on the assembled factor library and the test-asset set.

Linear pricing framework; interactions/nonlinearities are out of scope.

LASSO-based control selection relies on the approximate-sparsity assumption; "useless" factors must have small loadings and low correlation with the relevant ones.

Key references

Feng, G., Giglio, S. & Xiu, D. (2020) — Taming the Factor Zoo — Journal of Finance

Belloni, A., Chernozhukov, V. & Hansen, C. (2014) — Inference on Treatment Effects after Selection among High-Dimensional Controls — Review of Economic Studies

Giglio, S. & Xiu, D. (2021) — Asset Pricing with Omitted Factors — Journal of Political Economy

Provenance: verified/generated from the paper's full text.