ConvexPi

Taming the Factor Zoo: A Test of New Factors

Guanhao Feng, Stefano Giglio, Dacheng Xiu

The Journal of Finance · 2020 · 714 citations




Source: Feng, G., Giglio, S. & Xiu, D. (2020). Journal of Finance 75(3), 1327–1370. (NBER WP #25481, 2019)


TL;DR

Provides a rigorous, model-selection-based test for whether a newly proposed factor adds explanatory power for the cross-section of returns beyond the hundreds of factors already proposed. It disciplines the "factor zoo": a new factor must earn its keep controlling for the existing zoo, and the method delivers valid inference despite the high-dimensional control set. Applied to recent factors, most are redundant — but a few (notably profitability) retain significant incremental explanatory power.


Problem it solves

The literature has produced hundreds of candidate factors. Testing whether a new factor g_t is useful therefore requires controlling for a high-dimensional set of existing factors h_t. Standard practice (e.g., controlling only for the Fama-French three factors (FF3), or running a single LASSO and treating its selection as correct) ignores model-selection error: any relevant control that is left out induces omitted-variable bias, and the resulting significance is unstable and overstated.
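
One way to make the testing problem concrete is the linear SDF setup sketched below. The notation is assumed here for illustration and should be checked against the paper: γ_0 is the zero-beta rate, λ_g and λ_h are the SDF loadings, ι_n is a vector of ones, and C_g and C_h are the covariances of the n test-asset returns r_t with the factors.

```latex
m_t = \gamma_0^{-1}\Bigl(1 - \lambda_g^\top (g_t - \mathrm{E}[g_t]) - \lambda_h^\top (h_t - \mathrm{E}[h_t])\Bigr),
\qquad \mathrm{E}[m_t r_t] = \iota_n
\;\Longrightarrow\;
\mathrm{E}[r_t] = \iota_n \gamma_0 + C_g \lambda_g + C_h \lambda_h,
\qquad C_g = \mathrm{Cov}(r_t, g_t),\quad C_h = \mathrm{Cov}(r_t, h_t).
```

In this notation the null hypothesis is λ_g = 0. If a column of C_h with a nonzero loading is dropped and that column is cross-sectionally correlated with C_g, the estimate of λ_g absorbs part of its effect, which is the omitted-variable bias described above.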


The method

  • Combines the double-selection LASSO of Belloni et al. (2014) with two-pass (Fama–MacBeth) cross-sectional regressions.
  • "Double selection": (1) LASSO-select the factors in h_t whose covariances with the test assets explain the cross-section of average returns, and (2) LASSO-select the factors in h_t whose covariances with the test assets are cross-sectionally correlated with those of the new factor g_t; take the union of the two selected sets.
  • Estimate g_t's SDF loading in a post-selection cross-sectional regression of average returns on the covariances with g_t and with the union of selected controls. The resulting standard errors remain valid even if either LASSO step makes selection mistakes.
  • The second selection, step (2), is the safeguard: it recovers controls that step (1) misses but that are correlated with g_t, whose omission would otherwise bias the estimate for g_t.
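
The steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names `lasso_support` and `double_selection`, the hand-rolled coordinate-descent LASSO, and the penalty arguments `alpha1` and `alpha2` are all hypothetical, the tuning of the penalties is left to the caller, and the standard errors that make the inference valid are omitted.

```python
import numpy as np


def lasso_support(X, y, alpha, n_sweeps=200):
    """Indices with nonzero LASSO coefficients (cyclic coordinate descent).

    X's columns and y are standardized first, so alpha is in correlation
    units: alpha >= 1 selects nothing, alpha = 0 applies no shrinkage.
    """
    n, p = X.shape
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                      # guard against constant columns
    Xs = (X - X.mean(axis=0)) / sd
    y_sd = y.std() or 1.0
    resid = (y - y.mean()) / y_sd
    b = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            resid += Xs[:, j] * b[j]       # partial residual without column j
            rho = Xs[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0)  # soft threshold
            resid -= Xs[:, j] * b[j]
    return np.flatnonzero(b)


def double_selection(R, g, H, alpha1, alpha2):
    """Point estimate of the new factor's SDF loading after double selection.

    R: (T, n) test-asset returns, g: (T,) new factor, H: (T, p) existing factors.
    Returns (lambda_g, indices of the selected columns of H).
    """
    T, n = R.shape
    rbar = R.mean(axis=0)                        # average returns, one per asset
    Rc = R - rbar
    C_h = Rc.T @ (H - H.mean(axis=0)) / T        # (n, p) covariances with H
    C_g = Rc.T @ (g - g.mean()) / T              # (n,) covariances with g
    sel1 = lasso_support(C_h, rbar, alpha1)      # step 1: controls pricing returns
    sel2 = lasso_support(C_h, C_g, alpha2)       # step 2: controls related to g
    keep = np.union1d(sel1, sel2).astype(int)
    # Post-selection OLS of average returns on [1, C_g, selected columns of C_h].
    X = np.column_stack([np.ones(n), C_g, C_h[:, keep]])
    coef, *_ = np.linalg.lstsq(X, rbar, rcond=None)
    return float(coef[1]), keep
```

Calling `double_selection(R, g, H, alpha1, alpha2)` with very large penalties keeps no controls and reduces to a regression of average returns on the covariances with g_t alone, which is the setting where omitted-variable bias appears.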

Assumptions & inputs

  • Assumes the true pricing model is approximately low-dimensional, while h_t contains relevant, redundant, and "useless" factors.
  • Factor library: 150 tradable risk factors, monthly, July 1976 – December 2017.
  • Test assets: 750 portfolios (3×2 bivariate sorts; robustness with 1,825 5×5 sorts).
  • A conservative benchmark: the new factor is tested against the large pre-existing set.

How to use it / findings

  • Most newly proposed factors (post-2012) are redundant once benchmarked against the zoo; only a few add robust incremental power.
  • Profitability-type factors stand out as significant beyond the hundreds of prior factors.
  • Estimates and their significance are stable, whereas the factors chosen by a single (non-double) LASSO are not — demonstrating the value of double selection.
  • A recursive year-by-year exercise (1994–2016) tests each factor against the factors and test assets available up to its publication year.
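
The recursive design in the last bullet can be illustrated with a small scheduling helper. This is only a sketch: the function name `recursive_test_plan` is hypothetical, the input is a plain mapping from factor name to publication year, and it assumes "available" means published strictly before the year being tested.

```python
def recursive_test_plan(pub_year, first_year, last_year):
    """Map each year to (factors published that year, earlier factors as controls).

    pub_year: dict of factor name -> publication year.
    Years in which no factor was published are skipped.
    """
    plan = {}
    for year in range(first_year, last_year + 1):
        new = sorted(f for f, y in pub_year.items() if y == year)
        controls = sorted(f for f, y in pub_year.items() if y < year)
        if new:
            plan[year] = (new, controls)
    return plan
```

Each entry of the returned plan would then feed one run of the double-selection test, with the year's new factors as g_t and the earlier factors as the control library h_t.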

Limitations & pitfalls

  • Conclusions depend on the assembled factor library and the test-asset set.
  • Linear pricing framework; interactions/nonlinearities are out of scope.
  • LASSO-based control selection relies on the approximate-sparsity assumption; "useless" factors must have small loadings and low correlation with the relevant ones.

Key references

  • Feng, G., Giglio, S. & Xiu, D. (2020) — Taming the Factor Zoo — Journal of Finance
  • Belloni, A., Chernozhukov, V. & Hansen, C. (2014) — Inference on Treatment Effects after Selection among High-Dimensional Controls — Review of Economic Studies
  • Giglio, S. & Xiu, D. (2021) — Asset Pricing with Omitted Factors — Journal of Political Economy



  • Provenance: verified/generated from the paper's full text.



    Wiki last updated: June 24, 2026