DQN Model Selector¶
Deep Q-Network for dynamically weighting base model predictions.
Performance¶
| Metric | Value | Rank |
|---|---|---|
| ROC AUC | 0.5474 | 18th |
| F1 Score | 0.4211 | 10th |
| Accuracy | 0.5644 | 23rd |
| Recall | 0.5333 | 5th |
| Train Time | 1,787 s (~30 min) | — |
Architecture¶
State Vector¶
The state concatenates each base model's prediction with its confidence:

state = [pred₁, conf₁, pred₂, conf₂]

where:
- predᵢ: Base model i prediction (probability)
- confᵢ = 1 - |predᵢ - 0.5| × 2: Confidence measure
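A minimal sketch of how such a state could be assembled for two base models (the helper names here are illustrative, not the project's actual API):

```python
import numpy as np

def confidence(pred: float) -> float:
    # Confidence feature as defined above: 1 - |pred - 0.5| * 2
    return 1.0 - abs(pred - 0.5) * 2

def build_state(pred1: float, pred2: float) -> np.ndarray:
    # Concatenate each base model's probability with its confidence feature
    return np.array([pred1, confidence(pred1),
                     pred2, confidence(pred2)], dtype=np.float32)
```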
Action Space¶
Action determines the weight assigned to the first base model (weight₁ = action / 10 over 11 discrete actions):

- Action 0 → weight₁ = 0.0 (100% model 2)
- Action 5 → weight₁ = 0.5 (50/50 blend)
- Action 10 → weight₁ = 1.0 (100% model 1)
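A sketch of the corresponding action-to-weight mapping and prediction blend (assuming 11 discrete actions; the function name is illustrative):

```python
def blend_predictions(action: int, pred1: float, pred2: float,
                      n_actions: int = 11) -> float:
    # Action index maps linearly onto [0, 1]: action 0 -> 0.0, action 10 -> 1.0
    weight1 = action / (n_actions - 1)
    # Weighted average of the two base models' probabilities
    return weight1 * pred1 + (1.0 - weight1) * pred2
```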
DQN Training¶
Experience Replay¶
Stores (state, action, reward, next_state) tuples for stable training.
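A minimal replay buffer sketch (a fixed-size deque with uniform random sampling; the capacity and batch size are illustrative defaults, not the repo's settings):

```python
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 10_000):
        # Oldest transitions are dropped once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size: int = 64):
        # Uniform random sampling decorrelates consecutive updates
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```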
Target Network¶
Prevents oscillation by using a slowly-updating target.
Loss Function¶
Standard one-step DQN TD loss, with the bootstrap target computed by the target network:

L = (r + γ · max_a′ Q_target(s′, a′) − Q(s, a))²
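Putting the target network and the loss together, a hedged PyTorch-style sketch of a single update step (the `q_net`/`target_net` modules and the optimizer are assumed here, not taken from this repo):

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma: float = 0.99):
    states, actions, rewards, next_states = batch  # tensors built from sampled transitions

    # Q(s, a) for the actions actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Bootstrapped target computed with the slowly-updating target network
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * next_q

    loss = F.mse_loss(q_values, targets)  # Huber loss is a common alternative
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Every `target_update_freq` steps the target network is resynchronized with the online network, e.g. `target_net.load_state_dict(q_net.state_dict())`.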
Hyperparameters¶
learning_rate = 0.001
epsilon_decay = 0.995
epsilon_min = 0.01
gamma = 0.99
target_update_freq = 100
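For illustration, epsilon-greedy action selection with the decay schedule above (a sketch only; it assumes a PyTorch `q_net` and a state tensor):

```python
import random
import torch

def select_action(q_net, state: torch.Tensor, epsilon: float, n_actions: int = 11) -> int:
    # Explore: pick a random weighting action with probability epsilon
    if random.random() < epsilon:
        return random.randrange(n_actions)
    # Exploit: pick the action with the highest estimated Q-value
    with torch.no_grad():
        return int(q_net(state.unsqueeze(0)).argmax(dim=1).item())

# After each step/episode, decay exploration but keep a floor:
# epsilon = max(epsilon_min, epsilon * epsilon_decay)
```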
Why This Approach¶
The idea: instead of training a single model, train an RL agent that learns to combine the base models' predictions, adapting the blend to each input.
Theoretical Advantage¶
- Adaptive model selection based on input characteristics
- Learns when each base model performs best
Practical Reality¶
- State representation too weak for meaningful learning
- Base model predictions don't provide enough signal
- Converges to near-uniform weighting
Usage¶
cd dqn_base_model_selector
python main.py --mode train --data-dir /path/to/data --model-path ./model.pt
Interesting but Impractical
This model demonstrates an interesting meta-learning concept, but supervised ensembles (meta_stacking) achieve much better results with less complexity.