Deep Learning with Long Short-Term Memory Networks for Financial Market Predictions

Source: Fischer, T. & Krauss, C. (2018) · European Journal of Operational Research 270(2), 654–669

TL;DR

Applies LSTM recurrent neural networks to predict next-day directional movements of S&P 500 constituent stocks (1992–2015, survivor-bias-free) and shows they outperform memory-free baselines — random forest, deep MLP, and logistic regression. A daily long-short portfolio earns 0.46% per day (annualized Sharpe ≈ 5.8) before costs and 0.26%/day after 5 bps/half-turn transaction costs. Crucially, profitability decays across three regimes, with the edge largely eroded after 2010 as the methods diffused and markets adapted.

Problem it solves

Financial time series are noisy, and markets are commonly taken to be semi-strong-form efficient, yet more than 100 documented anomalies point to predictable structure. Standard return-prediction models are mostly linear and transparent, so they cannot capture complex non-linear, temporal dependencies. The paper asks whether a sequence-learning architecture (LSTM) can extract such structure on a large, liquid, survivor-bias-free universe, and how the resulting edge behaves over time.

The method

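Universe & data: all S&P 500 constituents, Dec 1989 – Oct 2015, survivor-bias-free. The first 750-day training window consumes roughly the first three years of data, so out-of-sample trading starts in late 1992.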

Study periods (rolling, non-overlapping trading): following Krauss et al. (2017), each is a 750-day training period (~3 years) + 250-day trading period (~1 year); ~380,000 input sequences per study period, ~255,000 used for in-sample training.
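The window layout and the sequence counts can be checked with a few lines of arithmetic. The sketch below is illustrative only: `study_periods` is a hypothetical helper, and the figure of roughly 500 stocks per day is an assumption chosen because it reproduces the ~255,000 and ~380,000 counts quoted above once the first 240 training days are set aside to build the first sequence.

```python
def study_periods(n_days, train=750, trade=250):
    """Return (train_start, train_end, trade_end) day indices, end-exclusive."""
    periods = []
    start = 0
    while start + train + trade <= n_days:
        periods.append((start, start + train, start + train + trade))
        start += trade  # roll forward by one trading period, so trading windows never overlap
    return periods

SEQ_LEN = 240    # days consumed before the first full sequence exists
N_STOCKS = 500   # assumption: approximate number of constituents per day

periods = study_periods(n_days=2000)            # toy horizon of 2000 trading days
train_seqs = (750 - SEQ_LEN) * N_STOCKS         # sequences ending inside the training window
trade_seqs = 250 * N_STOCKS                     # sequences ending inside the trading window
total_seqs = train_seqs + trade_seqs
```

Trading-period sequences can reach back into the training window for their 240 inputs, which is why all 250 trading days yield a sequence.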

Feature: a single feature, the standardized one-day return (demeaned and scaled with the mean and standard deviation of the study period's training set, which avoids look-ahead), fed as overlapping sequences of length 240 (~one trading year of consecutive daily returns).
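A minimal sketch of the feature pipeline, using synthetic returns. The function names are hypothetical. Two things are assumptions made here, not details taken from the summary: the moments are estimated on the training rows only, to keep trading-period data out of the scaling, and they are pooled across stocks.

```python
import numpy as np

def standardize(returns, n_train):
    """Scale returns with mean/std estimated on the first n_train rows only."""
    mu = returns[:n_train].mean()
    sigma = returns[:n_train].std()
    return (returns - mu) / sigma

def make_sequences(series, seq_len=240):
    """Stack overlapping windows: row i holds series[i : i + seq_len]."""
    n_windows = len(series) - seq_len + 1
    windows = np.stack([series[i:i + seq_len] for i in range(n_windows)])
    return windows[..., np.newaxis]  # shape (n_windows, seq_len, 1 feature)

rng = np.random.default_rng(0)
raw = rng.normal(0.0, 0.02, size=(1000, 3))  # synthetic: 1000 days x 3 stocks
z = standardize(raw, n_train=750)
X = make_sequences(z[:, 0])                  # sequences for the first stock
```

Consecutive rows of `X` differ by a one-day shift, which is what "overlapping" means here.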

Target: binary — will the stock out- or under-perform the cross-sectional median return over the next day.
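The labelling step is small enough to show in full. This is a sketch on made-up returns. `median_labels` is a hypothetical name, and assigning a return exactly equal to the median to class 0 is an assumption.

```python
import numpy as np

def median_labels(next_day_returns):
    """1 if a stock's return exceeds that day's cross-sectional median, else 0."""
    med = np.median(next_day_returns, axis=1, keepdims=True)  # one median per day
    return (next_day_returns > med).astype(int)

# rows are days, columns are stocks (illustrative numbers only)
r = np.array([[0.01, -0.02, 0.03, 0.00],
              [-0.01, 0.02, 0.00, 0.04]])
y = median_labels(r)
```

Because the threshold is the cross-sectional median, the two classes are balanced by construction on every day.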

Model: an LSTM (one hidden layer of 25 memory cells in the base spec) built in Keras, which outputs a probability of outperformance.
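To make the architecture concrete, the following is a plain NumPy forward pass through one LSTM layer with 25 cells over a 240-step sequence. It is not the authors' Keras code. The weights are random and untrained, the gate ordering is an arbitrary choice, and the two-class softmax head is an assumption about how the probability is produced.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x_seq, W, U, b, V, c):
    """Run one sequence through an LSTM layer, then a 2-class softmax.

    x_seq: (T, n_features); W: (4h, n_features); U: (4h, h); b: (4h,)
    V: (2, h); c: (2,). Stacked gate order: forget, input, output, candidate.
    """
    h_dim = U.shape[1]
    h = np.zeros(h_dim)  # hidden state
    s = np.zeros(h_dim)  # cell state (the "memory")
    for x_t in x_seq:
        z = W @ x_t + U @ h + b
        f = sigmoid(z[:h_dim])               # forget gate
        i = sigmoid(z[h_dim:2 * h_dim])      # input gate
        o = sigmoid(z[2 * h_dim:3 * h_dim])  # output gate
        g = np.tanh(z[3 * h_dim:])           # candidate cell values
        s = f * s + i * g
        h = o * np.tanh(s)
    logits = V @ h + c
    e = np.exp(logits - logits.max())        # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
H, T = 25, 240
W = rng.normal(0, 0.1, (4 * H, 1))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
V = rng.normal(0, 0.1, (2, H))
c = np.zeros(2)
probs = lstm_forward(rng.normal(size=(T, 1)), W, U, b, V, c)
```

With this layout, `probs[1]` would play the role of the predicted probability of outperformance once the weights are trained.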

Trading rule: each day rank stocks by predicted probability; go long the top k and short the bottom k, equal-weighted (analysis focuses on k = 10).
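The ranking rule can be sketched as follows on toy inputs. `long_short_return` is a hypothetical helper, and taking the portfolio return as the long-leg mean minus the short-leg mean is an assumed convention.

```python
import numpy as np

def long_short_return(probs, next_returns, k=10):
    """Equal-weighted return of long top-k / short bottom-k by predicted probability."""
    order = np.argsort(probs)   # ascending by predicted probability
    short_leg = order[:k]       # least likely to outperform
    long_leg = order[-k:]       # most likely to outperform
    return next_returns[long_leg].mean() - next_returns[short_leg].mean()

# six stocks, k = 2 (illustrative numbers only)
probs = np.array([0.9, 0.1, 0.6, 0.4, 0.8, 0.2])
rets = np.array([0.02, -0.01, 0.00, 0.01, 0.01, -0.03])
day_ret = long_short_return(probs, rets, k=2)
```

Here the long leg holds stocks 0 and 4 and the short leg holds stocks 1 and 5.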

Assumptions & inputs

Transaction costs of 5 bps per half-turn (Avellaneda & Lee 2010) are applied to compute post-cost performance.

Benchmarks on the identical sample: random forest (RAF, the strongest baseline), standard deep neural net (DNN), and logistic regression (LOG).

Predictions assume the standardized-return sequence carries exploitable, partly non-linear temporal signal.

How to use it

Headline (k=10, before costs): LSTM 0.46%/day vs RAF 0.43%, DNN 0.32%, LOG 0.26%. After costs: LSTM 0.26%/day, RAF 0.23%, DNN 0.12% (still significant), LOG 0.06%.
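The pre- and post-cost figures above differ by a constant 0.20 percentage points per day. One reading that reproduces this gap, stated here as an assumption, is that both legs are fully opened and closed each day, giving four half-turns at 5 bps each. The arithmetic:

```python
COST_PER_HALF_TURN = 0.05   # percent per half-turn, i.e. 5 bps
HALF_TURNS_PER_DAY = 4      # assumption: open and close both the long and the short leg daily

# pre-cost mean daily returns in percent, as quoted above
pre_cost = {"LSTM": 0.46, "RAF": 0.43, "DNN": 0.32, "LOG": 0.26}

daily_cost = COST_PER_HALF_TURN * HALF_TURNS_PER_DAY
post_cost = {model: round(ret - daily_cost, 2) for model, ret in pre_cost.items()}
```

The resulting `post_cost` values match the quoted after-cost returns for all four models.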

Interpretability: the stocks selected for trading share a common profile — high volatility and a short-term reversal return pattern; the authors formalize a rules-based short-term-reversal strategy that explains a portion of the LSTM's returns.
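A rules-based short-term reversal strategy of the kind described can be sketched as below. This is a generic illustration, not the authors' specification: the `lookback` length, the use of a simple sum of daily returns, and the helper name `reversal_legs` are all hypothetical choices.

```python
import numpy as np

def reversal_legs(past_returns, lookback=5, k=10):
    """Pick recent losers to buy and recent winners to sell.

    past_returns: (n_days, n_stocks) array of daily returns, most recent row last.
    Returns (long_indices, short_indices).
    """
    cum = past_returns[-lookback:].sum(axis=0)  # simple sum over the lookback window
    order = np.argsort(cum)                     # ascending: worst performers first
    return order[:k], order[-k:]                # long the losers, short the winners

# three days, four stocks (illustrative numbers only)
past = np.array([[0.01, -0.02, 0.00, 0.03],
                 [0.02, -0.01, 0.00, 0.01],
                 [0.01, -0.03, 0.01, 0.02]])
long_idx, short_idx = reversal_legs(past, lookback=3, k=1)
```

In this toy example stock 1 is the largest recent loser and stock 3 the largest recent winner.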

Regime decomposition of post-cost performance: (1) 1990s: exceptional cumulative payouts (more than USD 11 per USD 1 of average daily investment for the LSTM by 2000), plausibly because LSTMs and RAFs were unknown or computationally infeasible at the time (the LSTM was introduced in 1997, and GPU-based training became practical only in the late 2000s); (2) 2001–2009, "moderation": positive but much lower returns as strategies diffused, with RAF returns spiking during the 2008–09 crisis; (3) 2010–2015, "deterioration": the RAF loses its edge and destroys value, while the LSTM roughly preserves capital after costs.
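Splitting a daily return series by regime is a simple grouping exercise. The sketch below uses calendar-year cut-offs that follow the three regimes named above. The exact boundary dates are an assumption, and the sample returns are invented placeholders, not results from the paper.

```python
from collections import defaultdict
from datetime import date

def regime(d):
    """Map a date to one of the three regimes (year cut-offs are an assumption)."""
    if d.year <= 2000:
        return "1990s"
    if d.year <= 2009:
        return "moderation"
    return "deterioration"

def mean_by_regime(dated_returns):
    """Average daily return per regime from (date, return) pairs."""
    buckets = defaultdict(list)
    for d, r in dated_returns:
        buckets[regime(d)].append(r)
    return {name: sum(vals) / len(vals) for name, vals in buckets.items()}

# placeholder post-cost daily returns in percent (made up for illustration)
sample = [
    (date(1995, 3, 1), 0.6),
    (date(1998, 7, 1), 0.4),
    (date(2005, 5, 2), 0.1),
    (date(2012, 9, 4), -0.02),
    (date(2014, 1, 6), 0.02),
]
means = mean_by_regime(sample)
```

Reporting a backtest this way, rather than as a single full-sample average, makes the decay of the edge visible.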

Limitations & pitfalls

Pre-cost results overstate tradability; the long-short rule has high turnover and the post-2010 edge is thin to negative after realistic costs.

Performance is regime-dependent and decays as the method becomes common — a caution against extrapolating early-sample backtest returns.

Single-feature design (standardized returns only) and binary cross-sectional target; results are sensitive to study-period construction, sequence length, and any look-ahead introduced when standardizing features.

An academic backtest cannot cleanly separate true inefficiency from limits-to-arbitrage effects (short-sale costs, liquidity).

Key references

Krauss, C., Do, X. A. & Huck, N. (2017) — Deep Neural Networks, Gradient-Boosted Trees, Random Forests: Statistical Arbitrage on the S&P 500 — European Journal of Operational Research

Hochreiter, S. & Schmidhuber, J. (1997) — Long Short-Term Memory — Neural Computation

Gu, S., Kelly, B. & Xiu, D. (2020) — Empirical Asset Pricing via Machine Learning — Review of Financial Studies

Avellaneda, M. & Lee, J.-H. (2010) — Statistical Arbitrage in the US Equities Market — Quantitative Finance

Provenance: verified/generated from the paper's full text.