# Reinforcement Learning Models

Reinforcement learning (RL) models learn decision policies through trial and error.

## Overview
| Model | ROC AUC | F1 Score | Train Time |
|---|---|---|---|
| qlearning_bayesian_cpd | 0.5540 | 0.0000 | 80,900s |
| Q-Learning Models | 0.5488 | 0.0645 | 58s |
| DQN Model Selector | 0.5474 | 0.4211 | 1,787s |
| qlearning_memory_tabular | 0.4986 | 0.3175 | 53s |
**Near-Random Performance:** All RL models performed at near-random levels (ROC AUC ≈ 0.50). These approaches failed to learn meaningful patterns from the univariate features.
## Why RL Failed

### Core Problem: Weak State Space
RL agents learn policies from the rewards observed for state-action pairs. With only univariate features available as the state:
- State space lacks discriminative power — Features don't differentiate break/no-break well
- Reward signals are noisy — Without external context, feedback is inconsistent
- Q-values converge to uninformative policies — Agent learns to always predict one class
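A minimal sketch of how this framing might look for break detection, assuming a one-step episode per series, a binary break/no-break action, and a ±1 reward (this formulation is illustrative, not the project's exact setup):

```python
import numpy as np

# Hypothetical one-step episode for break detection (illustrative only).
# state  : summary features of a univariate series
# action : 0 = "no break", 1 = "break"
# reward : +1 if the action matches the true label, -1 otherwise
def run_episode(series: np.ndarray, label: int, policy) -> float:
    half = len(series) // 2
    state = np.array([
        series.mean(),                               # overall level
        series.std(),                                # overall spread
        series[half:].mean() - series[:half].mean()  # shift between halves
    ])
    action = policy(state)
    return 1.0 if action == label else -1.0

# Example: a policy that always predicts "no break" still earns positive
# reward on the majority class, which is how degenerate policies emerge.
reward = run_episode(np.random.randn(200), label=0, policy=lambda s: 0)
```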
### Q-Learning Limitations

Tabular Q-learning updates its action-value estimates with the standard rule:
\[
Q(s, a) \leftarrow Q(s, a) + \alpha[r + \gamma \max_{a'} Q(s', a') - Q(s, a)]
\]
With weakly discriminative states, the expected reward is nearly the same for every action in a given state, so the Q-values for different actions converge to similar values and the greedy policy carries little information.
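A minimal tabular sketch of the update above, assuming discretized feature bins as states and the same binary action space (these choices are illustrative, not taken from the repository):

```python
import numpy as np

N_STATES, N_ACTIONS = 16, 2      # illustrative sizes, not the repo's
ALPHA, GAMMA = 0.1, 0.0          # gamma = 0 for one-step episodes

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(state: int, action: int, reward: float,
             next_state: int, done: bool) -> None:
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    target = reward + (0.0 if done else GAMMA * Q[next_state].max())
    Q[state, action] += ALPHA * (target - Q[state, action])

# When a state bin mixes break and no-break series, the rewards it sees
# are close to random, so Q[s, 0] and Q[s, 1] drift toward the same value.
```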
### DQN Limitations
Even with function approximation via neural networks, the underlying state representation problem remains.
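For illustration, a minimal Q-network sketch in PyTorch (a generic MLP head, not the repository's DQN Model Selector architecture); whatever its capacity, it still consumes the same weak univariate state:

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Illustrative DQN Q-network: maps state features to one Q-value per action."""

    def __init__(self, n_features: int = 8, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),   # Q(s, a) for "no break" / "break"
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# A larger network cannot recover information the state never contained.
q_values = QNetwork()(torch.randn(1, 8))
```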
## What Would Help
To make RL viable for structural break detection:
| Improvement | Why It Helps |
|---|---|
| Exogenous variables | Richer state representation |
| Multi-step rewards | Better credit assignment |
| Curriculum learning | Start with easier cases |
| Hybrid approaches | Combine with supervised learning |
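As a concrete example of the first row, a hypothetical state-construction helper that concatenates exogenous context onto the univariate summary features (the feature names here are illustrative, not from this project):

```python
import numpy as np

def build_state(univariate_feats: np.ndarray,
                exogenous_feats: np.ndarray) -> np.ndarray:
    """Concatenate series-level features with external context into one state vector."""
    return np.concatenate([univariate_feats, exogenous_feats])

# e.g. [mean shift, variance ratio, trend slope] + [regime one-hot]
state = build_state(np.array([0.12, 1.7, -0.03]),
                    np.array([0.0, 1.0]))
```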
## Key Findings

**RL Models Underperformed:** Tree-based models achieved higher robust scores than all RL approaches (0.715 vs. <0.48).
## Experimental Value
These models are included to demonstrate that:
- RL is not universally applicable
- State representation is critical for RL success
- Supervised learning achieved higher robust scores than RL for this task