
Neural Network Models

Neural networks learn hierarchical representations through layers of interconnected nodes.

Overview

Model                     ROC AUC   F1 Score   Status
MLP Ensemble              0.7122    0.2105     Viable
Wavelet LSTM              0.5249    0.0000     Failed
Hierarchical Transformer  0.5439    0.0000     Failed

Limited Success

Only the MLP ensemble achieved meaningful performance. The LSTM and Transformer models collapsed to predicting the majority class (all zeros), which is why their F1 scores are 0.0000.
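
As a concrete illustration of that failure mode, here is a minimal scikit-learn sketch with made-up labels (a 90/10 class split is an assumption, not the project's data): a model that predicts zero for every sample looks fine on accuracy but scores 0.0 on F1 and chance-level ROC AUC.

    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

    # Hypothetical imbalanced labels: 90% negatives, 10% positives (made-up data)
    y_true = [0] * 90 + [1] * 10

    # A collapsed model: predicts 0 for every sample with a constant score
    y_pred  = [0] * 100
    y_score = [0.1] * 100   # no ranking information at all

    print(accuracy_score(y_true, y_pred))   # 0.90 -- looks fine, but is useless
    print(f1_score(y_true, y_pred))         # 0.0  -- no true positives
    print(roc_auc_score(y_true, y_score))   # 0.5  -- chance-level ranking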

Why Neural Networks Underperformed

The Core Problem

Neural networks are designed to learn relationships between multiple input variables. With only a univariate time series and features derived from it, they lack the rich input space these architectures need.
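
For context, "derived features" here means transformations of the single series itself, such as lags and rolling statistics. A minimal pandas sketch (the column name price and the specific lags and windows are illustrative assumptions, not the project's actual feature set):

    import pandas as pd

    def derive_features(series: pd.Series) -> pd.DataFrame:
        """Build features from a single univariate series (illustrative only)."""
        df = pd.DataFrame({"price": series})
        df["return_1"] = df["price"].pct_change()            # 1-step return
        df["lag_1"] = df["price"].shift(1)                    # lagged levels
        df["lag_5"] = df["price"].shift(5)
        df["roll_mean_10"] = df["price"].rolling(10).mean()   # rolling statistics
        df["roll_std_10"] = df["price"].rolling(10).std()
        return df.dropna()

    # Every column is a function of the same series -- there is no
    # genuinely new information for a deep model to exploit.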

LSTM Limitations

  • Struggle to retain information over long sequences
  • Cannot dynamically revise what has already been written to memory
  • Cannot anticipate regime shifts without external data

Transformer Limitations

  • Require large datasets for meaningful attention patterns
  • Underperform on purely numerical prediction when used in isolation
  • Cross-entropy loss pushes predictions toward the majority class (a mitigation sketch follows this list)
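
The last point can be partially offset by re-weighting the loss. A minimal PyTorch sketch for a binary target (the positive-class weight of 9.0 is an assumed illustration for a roughly 10% positive rate, not a value taken from the project):

    import torch
    import torch.nn as nn

    # Up-weight the rare positive class so predicting all zeros stops being
    # a cheap way to minimize the loss (weight ~= n_negative / n_positive).
    pos_weight = torch.tensor([9.0])
    criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

    logits  = torch.randn(32, 1)                     # model outputs
    targets = torch.randint(0, 2, (32, 1)).float()   # 0/1 labels
    loss = criterion(logits, targets)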

What Would Help

To leverage these architectures effectively, add exogenous variables:

Type               Examples
Correlated series  Related assets, sector indices
Macroeconomic      Interest rates, VIX
Sentiment          News, social media
Technical          Volume, bid-ask spread
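
A minimal sketch of what such a multivariate input frame could look like (all series and column names here — rate_10y, vix, news_score, volume — are hypothetical placeholders with random values, not sources the project currently ingests):

    import numpy as np
    import pandas as pd

    # Hypothetical exogenous sources on a shared daily index.
    idx = pd.date_range("2024-01-01", periods=100, freq="D")
    rng = np.random.default_rng(0)

    target    = pd.Series(rng.normal(size=100).cumsum(), index=idx, name="target")
    macro     = pd.DataFrame({"rate_10y": rng.normal(4.0, 0.1, 100),
                              "vix": rng.normal(15.0, 2.0, 100)}, index=idx)
    sentiment = pd.DataFrame({"news_score": rng.uniform(-1, 1, 100)}, index=idx)
    technical = pd.DataFrame({"volume": rng.integers(1_000, 5_000, 100)}, index=idx)

    # One multivariate frame: each timestamp now carries information the
    # target series alone cannot provide.
    X = pd.concat([target, macro, sentiment, technical], axis=1)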

MLP Architecture

The MLP ensemble is the only successful neural network approach:

Features → [MLP Deep (4 layers) + MLP Wide (2 layers) + XGBoost] → Soft Voting → Probability

Key success factors:

  1. Ensemble of architectures — Different MLP configurations
  2. Includes XGBoost — Tree-based component adds diversity
  3. Direct feature input — Doesn't try to learn from raw sequences
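
A minimal sketch of this pipeline using scikit-learn and XGBoost (layer widths, estimator counts, and the random toy data are assumptions to illustrate the deep-plus-wide-plus-tree soft-voting structure, not the project's tuned configuration):

    import numpy as np
    from sklearn.ensemble import VotingClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from xgboost import XGBClassifier

    # Deep MLP: four narrower hidden layers; Wide MLP: two broad hidden layers.
    mlp_deep = make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(64, 64, 32, 16),
                                           max_iter=500, random_state=0))
    mlp_wide = make_pipeline(StandardScaler(),
                             MLPClassifier(hidden_layer_sizes=(256, 128),
                                           max_iter=500, random_state=0))
    xgb = XGBClassifier(n_estimators=300, max_depth=4, eval_metric="logloss")

    # Soft voting averages each model's predicted probabilities.
    ensemble = VotingClassifier(
        estimators=[("mlp_deep", mlp_deep), ("mlp_wide", mlp_wide), ("xgb", xgb)],
        voting="soft",
    )

    # X: tabular feature matrix (not raw sequences); y: binary labels.
    X = np.random.rand(200, 20)
    y = np.random.randint(0, 2, 200)
    ensemble.fit(X, y)
    proba = ensemble.predict_proba(X)[:, 1]   # averaged class-1 probabilities

The tree-based XGBoost component errs differently from the two MLPs, which is what makes the averaged probabilities more robust than any single model's output.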

Key Findings

Neural Network Performance

mlp_ensemble_deep_features was the only neural approach that performed reasonably well (0.7122 AUC, 0.647 robust score).

Pure deep learning models (LSTM, Transformer) achieved near-random performance on univariate data.