Reinforcement Learning Models

RL models learn decision policies through trial and error.

Overview

| Model | ROC AUC | F1 Score | Train Time (s) |
|---|---|---|---|
| qlearning_bayesian_cpd | 0.5540 | 0.0000 | 80,900 |
| Q-Learning Models | 0.5488 | 0.0645 | 58 |
| DQN Model Selector | 0.5474 | 0.4211 | 1,787 |
| qlearning_memory_tabular | 0.4986 | 0.3175 | 53 |

Near Random Performance

All RL models performed near the random baseline (ROC AUC ≈ 0.50). None of these approaches learned meaningful patterns from the univariate features alone.

Why RL Failed

Core Problem: Weak State Space

RL agents learn optimal policies based on state-action-reward relationships. With only univariate features:

  1. State space lacks discriminative power: the univariate features do not separate break from no-break cases well (see the sketch below)
  2. Reward signals are noisy: without exogenous context, the feedback the agent receives is inconsistent
  3. Q-values converge to uninformative policies: the agent learns to always predict one class
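
To make the setup in the list above concrete, here is a minimal sketch of break detection framed as a one-step decision problem. The feature construction and reward scheme are illustrative assumptions, not the project's actual implementation: coarse univariate summaries form the state, and the reward simply reflects whether the prediction was correct.

```python
import numpy as np

ACTIONS = (0, 1)  # 0 = predict no break, 1 = predict break

def make_state(series: np.ndarray) -> tuple:
    """Discretise a few univariate summary statistics into a coarse state."""
    half = len(series) // 2
    mean_shift = series[half:].mean() - series[:half].mean()
    var_ratio = series[half:].var() / (series[:half].var() + 1e-9)
    # Coarse binning collapses many different series into the same state,
    # which is exactly the "weak state space" problem described above.
    return (int(np.sign(mean_shift)), int(np.clip(var_ratio, 0, 3)))

def reward(action: int, has_break: bool) -> float:
    """+1 for a correct call, -1 otherwise; noisy in practice because the
    state alone rarely determines the correct answer."""
    return 1.0 if action == int(has_break) else -1.0
```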

Q-Learning Limitations

\[ Q(s, a) \leftarrow Q(s, a) + \alpha[r + \gamma \max_{a'} Q(s', a') - Q(s, a)] \]

When the state carries little discriminative information, the expected return is nearly the same whichever action is taken, so the Q-values for different actions converge to similar values and the greedy policy collapses to always predicting one class.
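
The update rule above translates directly into a tabular implementation. The following is a minimal sketch; variable names and hyperparameter values (alpha, gamma, epsilon) are illustrative, not the project's actual configuration.

```python
from collections import defaultdict
import random

ACTIONS = (0, 1)  # 0 = predict no break, 1 = predict break

# Q-table: state -> {action: value}, initialised to 0
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def q_update(state, action, r, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * max(Q[next_state].values())
    Q[state][action] += alpha * (td_target - Q[state][action])

def epsilon_greedy(state, epsilon=0.1):
    """Behaviour policy: explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)
```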

DQN Limitations

Even with function approximation via neural networks, the underlying state representation problem remains.
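
As an illustration, a DQN-style Q-network still maps the same low-dimensional feature vector to one Q-value per action; a deeper network cannot add discriminative information that its input does not contain. The sketch below assumes a small MLP with illustrative layer sizes and feature count, not the project's actual architecture.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small MLP that outputs Q(s, a) for both actions (sizes are illustrative)."""
    def __init__(self, n_features: int = 8, n_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

q_net = QNetwork()
q_values = q_net(torch.randn(1, 8))  # shape (1, 2): one Q-value per action
```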

What Would Help

To make RL viable for structural break detection:

| Improvement | Why It Helps |
|---|---|
| Exogenous variables | Richer state representation |
| Multi-step rewards | Better credit assignment |
| Curriculum learning | Start with easier cases |
| Hybrid approaches | Combine with supervised learning |
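
As one hedged illustration of the first and last rows, the state handed to the agent could be augmented with exogenous features and the break probability from a fitted supervised model. The function and variable names below are hypothetical, not part of the project's pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def build_state(univariate_feats: np.ndarray,
                exogenous_feats: np.ndarray,
                supervised_model: GradientBoostingClassifier) -> np.ndarray:
    """Richer RL state: univariate features, plus exogenous variables, plus
    the break probability predicted by an already-fitted supervised model."""
    x = np.concatenate([univariate_feats, exogenous_feats])
    p_break = supervised_model.predict_proba(x.reshape(1, -1))[0, 1]
    return np.append(x, p_break)
```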

Key Findings

RL Models Underperformed

Tree-based models achieved higher robust scores than all RL approaches (0.715 vs <0.48).

Experimental Value

These models are included to demonstrate that:

  1. RL is not universally applicable
  2. State representation is critical for RL success
  3. Supervised learning achieved higher robust scores than RL for this task