Measuring Readability in Financial Disclosures

Source: Loughran, T. & McDonald, B. (2014). Journal of Finance 69(4), 1643–1671. DOI: 10.1111/jofi.12162

> Note: the full text on file is the 2010 working-paper version ("Measuring Readability in Financial Text"), whose centerpiece is the Plain English measure described below. The published 2014 Journal of Finance article revised the headline recommendation toward 10-K file size as the simplest robust complexity proxy; that finding comes from the published revision and lies outside the working-paper text this summary draws on.

TL;DR

Standard readability formulas imported from general linguistics, the Fog Index and Flesch Reading Ease, were designed to assign grade levels to K–12 texts, and they mismeasure readability in financial text. Half of each formula's input depends on how many syllables words have, but business documents are full of long words ("corporation," "company," "telecommunications") that average investors readily understand, so counting them as hard words injects substantial measurement error into Fog and Flesch scores for 10-Ks. The paper proposes domain-specific alternatives validated against market behavior.
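A minimal sketch of the mismeasurement, assuming a crude vowel-group syllable counter (a hypothetical stand-in, not the counter used in the paper): each business word quoted above clears the three-syllable bar that Fog uses to flag a "complex" word, while a short word such as "risk" does not.

```python
import re

def count_syllables(word: str) -> int:
    """Approximate syllables as runs of vowels (a, e, i, o, u, y).

    A rough heuristic: silent e and other English exceptions are ignored.
    """
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def is_complex(word: str) -> bool:
    """Fog's definition of a complex word: three or more syllables."""
    return count_syllables(word) >= 3

for w in ["corporation", "company", "telecommunications", "risk"]:
    print(w, count_syllables(w), is_complex(w))
```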

Problem it solves

Provides a defensible way to quantify the readability of corporate disclosures (10-Ks) for textual-analysis research, where the benchmark is that better-written filings are more informative to the market. The point is to reduce measurement error (and the resulting attenuation bias) that creeps in when grade-level formulas are applied off-domain.

The method

Three measures applied to a large 10-K sample:

Fog = 0.4 × (average words per sentence + percentage of "complex words," i.e., words of three or more syllables), read as a grade level. Flesch Reading Ease combines average sentence length with average syllables per word (rather than the complex-word share) and runs in the opposite direction (higher = easier).
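The two formulas in their standard form can be sketched as follows. The tokenizer (sentences split at . ! ?, words taken as alphabetic runs) and the vowel-group syllable counter are simplifying assumptions, not the paper's preprocessing, so scores on real 10-Ks will differ from a careful implementation.

```python
import re

def count_syllables(word: str) -> int:
    # Heuristic: one syllable per run of vowels (y counted as a vowel).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def tokenize(text: str):
    # Naive splitting: sentences end at . ! ?, words are alphabetic runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    return sentences, words

def fog(text: str) -> float:
    """Gunning Fog: 0.4 * (words per sentence + percent of words with >= 3 syllables)."""
    sentences, words = tokenize(text)
    complex_share = sum(count_syllables(w) >= 3 for w in words) / len(words)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_share)

def flesch(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * words/sentence - 84.6 * syllables/word."""
    sentences, words = tokenize(text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

text = "The company sells telecommunications equipment. We expect growth."
print(fog(text), flesch(text))
```

On this two-sentence example the sentences are short (four words each), yet the three long business words push Fog to about 16.6 and Flesch to about 1.85, which is the kind of artifact the paper warns about.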

Plain English — the paper's proposed measure: a standardized statistic anchored in the SEC's October 1998 plain-English rule, combining six components: (1) average sentence length, (2) average word length (chars/word), (3) passive-voice frequency, (4) legalese (a list of 12 phrases + 48 words from Staff Legal Bulletin No. 7), (5) personal pronouns ("we, us, our, you, your…"), and (6) an "other" bucket (negative compound phrases, superfluous phrases, "respectively"). Components are standardized and combined.
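A sketch of the standardize-and-combine step, under loudly stated assumptions: the legalese list below is a hypothetical six-word sample rather than the 12 phrases and 48 words from Staff Legal Bulletin No. 7; passive voice is approximated by a form of "to be" followed by a word ending in "-ed"; the "other" bucket is reduced to "respectively"; and components are z-scored across the sample of documents and averaged with equal weights, signed so that a higher score means plainer English. The paper's exact weighting and normalization may differ.

```python
import re
from statistics import mean, pstdev

# Hypothetical, abbreviated lists (NOT the Staff Legal Bulletin No. 7 lists).
LEGALESE = {"hereby", "herein", "hereof", "thereof", "aforementioned", "pursuant"}
PRONOUNS = {"we", "us", "our", "you", "your"}
# Heuristic passive detector: "to be" form followed by a word ending in -ed.
PASSIVE = re.compile(r"\b(?:is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)

# Assumed direction: +1 means more of it is plainer English, -1 means less is plainer.
DIRECTION = {"sent_len": -1, "word_len": -1, "passive": -1,
             "legalese": -1, "pronouns": +1, "other": -1}

def components(text: str) -> dict:
    """Raw per-document values for the six (simplified) components."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [w.lower() for w in re.findall(r"[A-Za-z]+", text)]
    n = len(words)
    return {
        "sent_len": n / len(sentences),
        "word_len": sum(map(len, words)) / n,
        "passive": len(PASSIVE.findall(text)) / len(sentences),
        "legalese": sum(w in LEGALESE for w in words) / n,
        "pronouns": sum(w in PRONOUNS for w in words) / n,
        "other": words.count("respectively") / n,
    }

def plain_english(texts: list[str]) -> list[float]:
    """Z-score each component across documents, sign it, and average."""
    feats = [components(t) for t in texts]
    scores = [0.0] * len(texts)
    for key, sign in DIRECTION.items():
        col = [f[key] for f in feats]
        mu, sd = mean(col), pstdev(col)
        for i, x in enumerate(col):
            z = (x - mu) / sd if sd > 0 else 0.0
            scores[i] += sign * z / len(DIRECTION)
    return scores
```

With only two documents every z-score is ±1, so a plain document scores about +1 and a legalistic one about -1; on a real 10-K sample the scores spread out.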

The published 2014 version additionally advocates the natural log of 10-K file size as an even simpler, validated proxy.
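The file-size proxy needs no parsing at all. A minimal sketch, assuming the raw filing is available as bytes; what exactly counts as "the file" (for example, with or without exhibits) is a choice to document, not something fixed here.

```python
import math

def log_file_size(filing: bytes) -> float:
    """Natural log of a filing's size in bytes, used as a complexity proxy."""
    if not filing:
        raise ValueError("cannot take the log of an empty filing")
    return math.log(len(filing))

# For a filing already on disk, os.path.getsize(path) returns the same byte count.
```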

Assumptions & inputs

Sample: 10-K and 10-K405 filings from EDGAR over 1994–2007, starting from 113,196 documents and yielding a final matched sample of 42,357 firm-years after CRSP/Compustat screens (the screens drop asset-backed and fund filings, amendments, and low-price stocks).
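The form-type and date screens can be expressed as a simple filter. The Filing record layout below is hypothetical; the CRSP/Compustat match and the asset-backed, fund, and low-price screens require external data and are omitted rather than guessed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Filing:
    """Hypothetical minimal record for one EDGAR filing-index entry."""
    cik: str
    form_type: str
    year: int

# Original annual reports only; amendments such as "10-K/A" fall outside this set.
SAMPLE_FORMS = frozenset({"10-K", "10-K405"})

def in_sample(f: Filing, first_year: int = 1994, last_year: int = 2007) -> bool:
    """Form-type and date screens from the text; other screens are not modeled."""
    return f.form_type in SAMPLE_FORMS and first_year <= f.year <= last_year
```

Usage: `kept = [f for f in filings if in_sample(f)]`, followed by whatever merge against market data the research design requires.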

Benchmark for validity: a readability measure should correlate with real market/behavioral outcomes, not just look like a grade level.

How to use it

Validation results in the working-paper text: Fog and Flesch show no change in 10-K readability over 1994–2007, whereas Plain English improves markedly after the 1998 rule. In the paper's tests, Plain English is the measure that:

is positively related to increases in the proportion of 100-share ("small investor") trades post-regulation;

is the only measure significantly linked to seasoned equity issuance (firms write more readable 10-Ks before SEOs) in a logit model;

is higher for firms with shareholder-friendly governance.
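The validation idea, that a readability score should predict an observable outcome, can be sketched with a one-regressor logit of an SEO indicator on the score, fit by plain gradient ascent on the log-likelihood. This is an illustration only: the paper's logit includes controls not reproduced here, in practice a statistics package would be used, and the toy data below are made up.

```python
import math

def fit_logit(x, y, lr=0.1, steps=5000):
    """Fit P(y=1) = 1 / (1 + exp(-(a + b*x))) by gradient ascent on the log-likelihood."""
    a = b = 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            grad_a += yi - p          # score for the intercept
            grad_b += (yi - p) * xi   # score for the slope
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

# Toy, made-up data: readability scores and whether the firm later issued equity.
scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
issued = [0, 0, 1, 0, 1, 1]
intercept, slope = fit_logit(scores, issued)
print(intercept, slope)
```

A positive slope would mean more readable filings are associated with a higher probability of issuance in the toy data; the same check applies to any candidate measure.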

For applied finance NLP, the operational takeaways are: do not rely on raw Fog/Flesch for business text; validate any text metric in-domain; and use Plain English (or, per the published paper, log file size) as a lower-error complexity proxy.

Limitations & pitfalls

Plain English is multi-component and parsing-dependent (passive-voice and legalese detection are heuristic); reproducibility hinges on the exact word/phrase lists.

File size (the published paper's headline measure) conflates length with complexity and is sensitive to formatting and attached exhibits; as filings move to HTML and XBRL, byte-size proxies need re-validation.
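A small illustration of the formatting sensitivity: the same sentence wrapped in hypothetical inline-styled HTML has a much larger byte count than its visible text, so log file size moves with markup rather than content. Stripping tags before measuring, as sketched here with the standard-library HTMLParser, is one possible normalization, not the paper's procedure.

```python
import math
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects only the text content of an HTML document, dropping tags."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def visible_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

def log_size(s: str) -> float:
    # Byte size of the UTF-8 encoding, then natural log.
    return math.log(len(s.encode("utf-8")))

body = "Revenue increased due to higher product sales."
html = ('<html><body><p style="font-family:Times New Roman;font-size:10pt">'
        f"{body}</p></body></html>")
print(log_size(html), log_size(visible_text(html)))
```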

All measures capture style/complexity, not factual informativeness directly; the market-impact benchmark is an indirect proxy for "readability."

Key references

Loughran, T. & McDonald, B. (2014) — Measuring Readability in Financial Disclosures — Journal of Finance

Loughran, T. & McDonald, B. (2011) — When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks — Journal of Finance

Li, F. (2008) — Annual Report Readability, Current Earnings, and Earnings Persistence — Journal of Accounting and Economics

Provenance: verified/generated from the paper's full text.