ConvexPi

Editor's Choice

…and the Cross-Section of Expected Returns

Campbell R. Harvey, Yan Liu, Caroline Zhu

Review of Financial Studies · 2016 · 248 citations

Factor Zoo
Community wiki

…and the Cross-Section of Expected Returns


Source: Harvey, Liu & Zhu (2016) · Review of Financial Studies · DOI: 10.1093/rfs/hhv059


TL;DR


With 300+ published factors, the conventional significance hurdle (t-statistic > 2.0) is far too lenient: applied across that many tests, it admits a large number of false discoveries. Accounting for multiple testing, the authors argue a newly proposed factor should clear a t-statistic of roughly 3.0 (a two-sided p-value of about 0.27% under a normal approximation), and that bar should keep rising as more factors are tested. Most of the "factor zoo" would not survive.


The problem it addresses


Empirical asset pricing has a multiple-testing crisis. Over several decades, researchers have tested a very large number of candidate predictors and published the ones that cleared t > 2. But a single test of a factor with no true effect still exceeds |t| = 2 about 4.6% of the time, so when enough tests are run, some will clear the bar purely by chance. The single-test threshold ignores the vast number of factors tried (including the unpublished, unreported ones), so the published cross-section is riddled with false positives: the "factor zoo."
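
The arithmetic behind that claim can be sketched in a few lines of Python. This is an illustration only, not the paper's procedure: it assumes every tested factor is truly null, that the tests are independent, and that t-statistics are approximately standard normal. The function names are made up for this example.

```python
from statistics import NormalDist


def two_sided_p(t: float) -> float:
    """Two-sided p-value of a t-statistic under a standard normal approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def expected_false_positives(n_tests: int, t_cutoff: float = 2.0) -> float:
    """Expected number of null factors with |t| above the cutoff."""
    return n_tests * two_sided_p(t_cutoff)


def prob_at_least_one(n_tests: int, t_cutoff: float = 2.0) -> float:
    """Chance that at least one of n independent null tests clears the cutoff."""
    return 1.0 - (1.0 - two_sided_p(t_cutoff)) ** n_tests


if __name__ == "__main__":
    # Compare one test, a modest search, and a search the size of the factor zoo.
    for n in (1, 20, 300):
        print(n, round(expected_false_positives(n), 2), round(prob_at_least_one(n), 4))
```

Under these assumptions, 300 null tests at the t > 2 cutoff yield roughly a dozen or more "significant" factors on average, even though none of them is real. Real factor tests are correlated, so the exact numbers differ, but the direction of the problem is the same.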


Main findings


  • The 2.0 hurdle is obsolete. Under standard multiple-testing corrections, a t-stat of 2.0 corresponds to an unacceptably high false-discovery rate given the number of factors tested.
  • New threshold ≈ 3.0. Applying multiple-testing adjustments, a newly discovered factor needs a t-statistic around 3.0 to be credible — and the required hurdle increases over time as the count of tested factors grows (they estimate it should already be ~3.0+ and rising past ~3.4 in later years).
  • Most published factors fail. A large fraction of the 300+ documented factors would not clear the corrected bar; many are likely spurious.
  • The true test count is understated. Because failed tests go unpublished (the file-drawer problem), the real number of trials — and thus the proper hurdle — is even higher than the published record implies.
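
To see why the hurdle rises with the number of factors tested, here is a minimal sketch of a plain Bonferroni cutoff as a function of the test count. It assumes a two-sided test and a standard normal approximation for the t-statistic. It is not the paper's full procedure, which also uses Holm and BHY and adjusts for unobserved tests, so its output should not be read as the paper's reported thresholds. `bonferroni_t_hurdle` is a hypothetical helper name.

```python
from statistics import NormalDist


def bonferroni_t_hurdle(n_tests: int, alpha: float = 0.05) -> float:
    """|t| cutoff at which the two-sided per-test p-value equals alpha / n_tests."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    per_test_alpha = alpha / n_tests
    # Split the per-test alpha across both tails and invert the normal CDF.
    return NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)


if __name__ == "__main__":
    # The hurdle grows slowly but steadily with the cumulative test count.
    for n in (1, 10, 100, 300, 1000):
        print(n, round(bonferroni_t_hurdle(n), 2))
```

With a single test the cutoff is the familiar value near 2; as the count of tests grows into the hundreds, the same family-wise error rate requires a cutoff well above 3.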

Methodology


  • Compile a history of 300+ published factors with their reported t-statistics and publication dates.
  • Apply three multiple-testing frameworks: Bonferroni and Holm (control the family-wise error rate) and Benjamini-Hochberg-Yekutieli (control the false-discovery rate).
  • Translate each into a time-varying t-statistic hurdle as a function of the cumulative number of factors tested, adjusting for the unobserved file-drawer of unpublished tests.
  • Re-evaluate the published factors against the corrected thresholds.
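
The three corrections named above can be sketched as textbook adjusted p-values in pure Python. This is a generic illustration, not the authors' code, and it does not include the file-drawer adjustment. The p-values in the usage example are invented for demonstration.

```python
from typing import List, Sequence


def bonferroni(pvals: Sequence[float]) -> List[float]:
    """Bonferroni: multiply every p-value by the number of tests (controls FWER)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals: Sequence[float]) -> List[float]:
    """Holm step-down: less conservative than Bonferroni, still controls FWER."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        candidate = min(1.0, (m - rank) * pvals[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


def bhy(pvals: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg-Yekutieli step-up: controls FDR under arbitrary dependence."""
    m = len(pvals)
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic-sum penalty
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        idx = order[rank]
        candidate = min(1.0, m * c_m * pvals[idx] / (rank + 1))
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    example = [0.001, 0.01, 0.03, 0.04]  # invented p-values, all "significant" at 5%
    for name, fn in (("Bonferroni", bonferroni), ("Holm", holm), ("BHY", bhy)):
        print(name, [round(p, 4) for p in fn(example)])
```

A factor is rejected as a false discovery candidate at level alpha when its adjusted p-value exceeds alpha. In the invented example, all four raw p-values are below 5%, but only two remain below 5% after any of the three adjustments.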

Implications for factor investing


  • Demand t ≈ 3.0+, not 2.0, for any newly claimed factor — and treat marginal (t between 2 and 3) "discoveries" as probably noise.
  • Out-of-sample validation is non-negotiable. Multiple-testing math says in-sample significance is cheap; only genuine out-of-sample performance (on data not used to find the signal) is persuasive — the principle ConvexPi's hidden evaluation period operationalizes.
  • Account for your own search. If you scan many signals, your personal hurdle must rise accordingly; report how many you tried, not just the winner.
  • Be skeptical of the zoo. Pair this with McLean-Pontiff (2016): even "real" factors decay post-publication, and many published ones were never real to begin with.
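
The "account for your own search" point can be illustrated with a small Monte Carlo sketch. It assumes every scanned signal is pure noise, that signals are independent, and that each t-statistic is standard normal; the function name and defaults are hypothetical. The idea is to ask how large the best |t| in a search of a given size typically gets by luck alone, and to treat a high quantile of that distribution as a personal hurdle.

```python
import random


def simulated_search_hurdle(
    n_signals: int,
    n_trials: int = 5000,
    quantile: float = 0.95,
    seed: int = 0,
) -> float:
    """Empirical quantile of the best |t| found when scanning n_signals noise signals."""
    rng = random.Random(seed)  # local generator so results are reproducible
    best = sorted(
        max(abs(rng.gauss(0.0, 1.0)) for _ in range(n_signals))
        for _ in range(n_trials)
    )
    # Index of the requested empirical quantile, clamped to the last element.
    k = min(n_trials - 1, int(quantile * n_trials))
    return best[k]


if __name__ == "__main__":
    # The winner of a larger search needs a higher t-statistic to be credible.
    for n in (1, 5, 20, 100):
        print(n, round(simulated_search_hurdle(n), 2))
```

Reporting `n_signals` alongside the winning signal lets a reader apply this kind of adjustment; reporting only the winner hides it. Correlated signals would lower the effective number of tests, so this sketch is an upper-end view under independence.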

Key references


  • Harvey, C., Liu, Y. & Zhu, H. (2016) — …and the Cross-Section of Expected Returns — Review of Financial Studies — DOI: 10.1093/rfs/hhv059
  • Harvey, C. & Liu, Y. (2020) — False (and Missed) Discoveries in Financial Economics — Journal of Finance
  • McLean, R. D. & Pontiff, J. (2016) — Does Academic Research Destroy Stock Return Predictability? — Journal of Finance
  • Hou, K., Xue, C. & Zhang, L. (2020) — Replicating Anomalies — Review of Financial Studies
  • Chen, A. & Zimmermann, T. (2022) — Open Source Cross-Sectional Asset Pricing — Critical Finance Review

Community-maintained wiki: anyone can suggest an edit or view its revision history. Not peer-reviewed; verify claims against the original paper.

Wiki last updated: June 19, 2026