
DQN Model Selector

Deep Q-Network for dynamically weighting base model predictions.

Performance

| Metric     | Value             | Rank |
|------------|-------------------|------|
| ROC AUC    | 0.5474            | 18th |
| F1 Score   | 0.4211            | 10th |
| Accuracy   | 0.5644            | 23rd |
| Recall     | 0.5333            | 5th  |
| Train Time | 1,787 s (~30 min) | n/a  |

Architecture

State (6) → FC(64) → ReLU → Dropout → FC(64) → ReLU → Dropout → FC(32) → ReLU → FC(11)
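
A minimal PyTorch sketch of this network, matching the layer sizes in the diagram above (the dropout probability is an assumption, since the text does not state it):

import torch.nn as nn

class QNetwork(nn.Module):
    """State (6) -> FC(64) -> ReLU -> Dropout -> FC(64) -> ReLU -> Dropout -> FC(32) -> ReLU -> FC(11)."""
    def __init__(self, state_dim=6, n_actions=11, dropout=0.2):  # dropout rate assumed, not given in the text
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_actions),  # one Q-value per discrete action
        )

    def forward(self, state):
        return self.net(state)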

State Vector

s = [pred₁, pred₂, conf₁, conf₂, |pred₁ - pred₂|, (pred₁ + pred₂)/2]

where:

  • predᵢ: base model i prediction (probability)
  • confᵢ = 1 - |predᵢ - 0.5| × 2: confidence measure for base model i
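
A sketch of assembling this 6-dimensional state from two base-model probabilities, using the confidence formula above (function and argument names are illustrative):

import numpy as np

def build_state(pred1, pred2):
    """Build the 6-dim state from two base-model probabilities."""
    conf1 = 1 - abs(pred1 - 0.5) * 2   # confidence measure from the formula above
    conf2 = 1 - abs(pred2 - 0.5) * 2
    return np.array([pred1, pred2, conf1, conf2,
                     abs(pred1 - pred2),        # disagreement between the models
                     (pred1 + pred2) / 2])      # mean prediction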

Action Space

A = {0, 1, 2, ..., 10}  # 11 discrete actions

Action determines the weight given to the first base model:

  • Action 0 → weight₁ = 0.0 (100% model 2)
  • Action 5 → weight₁ = 0.5 (50/50 blend)
  • Action 10 → weight₁ = 1.0 (100% model 1)
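
A sketch of mapping an action index to an ensemble weight and the resulting blended prediction (names are illustrative):

def blend(action, pred1, pred2, n_actions=11):
    """Action i -> weight i/10 on model 1; the remainder goes to model 2."""
    w1 = action / (n_actions - 1)          # 0 -> 0.0, 5 -> 0.5, 10 -> 1.0
    return w1 * pred1 + (1 - w1) * pred2   # final ensemble probability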

DQN Training

Experience Replay

memory = deque(maxlen=10000)
batch_size = 32

Stores (state, action, reward, next_state) tuples for stable training.
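A minimal sketch of storing and sampling transitions (standard DQN replay; helper names are illustrative, not from the original code):

import random
from collections import deque

memory = deque(maxlen=10000)

def remember(state, action, reward, next_state):
    # Append one transition; the deque evicts the oldest entry once full
    memory.append((state, action, reward, next_state))

def sample_batch(batch_size=32):
    # Uniform random sampling breaks the temporal correlation between consecutive steps
    return random.sample(memory, batch_size)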

Target Network

# Update target network every 100 episodes
target_network.load_state_dict(q_network.state_dict())  # hard copy of the online weights

Prevents oscillation by using a slowly-updating target.

Loss Function

\[ L = \text{MSE}(Q(s, a), r + \gamma \max_{a'} Q_{\text{target}}(s', a')) \]
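
In code, the TD target is built with the frozen target network, and the loss is the mean-squared error against the online network's Q-value for the action actually taken (a sketch; batch layout and function name are assumptions):

import torch
import torch.nn.functional as F

def dqn_loss(q_network, target_network, batch, gamma=0.99):
    states, actions, rewards, next_states = batch            # batch-first tensors; actions is a long tensor
    q_sa = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a) for the taken actions
    with torch.no_grad():                                     # target network is excluded from backprop
        target = rewards + gamma * target_network(next_states).max(dim=1).values
    return F.mse_loss(q_sa, target)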

Hyperparameters

learning_rate = 0.001
epsilon_decay = 0.995
epsilon_min = 0.01
gamma = 0.99
target_update_freq = 100
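
These values drive a standard ε-greedy exploration schedule; a sketch of how they fit together (the starting ε of 1.0 is an assumption):

import random
import torch

epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995   # starting epsilon assumed

def select_action(q_network, state, n_actions=11):
    # Explore with probability epsilon, otherwise act greedily on the Q-values
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_network(state).argmax())

def decay_epsilon():
    # Called once per episode; epsilon shrinks geometrically until it hits the floor
    global epsilon
    epsilon = max(epsilon_min, epsilon * epsilon_decay)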

Why This Approach

The idea: instead of training a single model, train an RL agent to optimally combine predictions from multiple base models depending on the input.

Theoretical Advantage

  • Adaptive model selection based on input characteristics
  • Learns when each base model performs best

Practical Reality

  • State representation too weak for meaningful learning
  • Base model predictions don't provide enough signal
  • Converges to near-uniform weighting

Usage

cd dqn_base_model_selector
python main.py --mode train --data-dir /path/to/data --model-path ./model.pt

Interesting but Impractical

This model demonstrates an interesting meta-learning concept, but supervised ensembles (meta_stacking) achieve much better results with less complexity.