
DQN Model Selector

Deep Q-Network for dynamically weighting base model predictions.

Performance

| Metric     | Value             | Rank |
|------------|-------------------|------|
| ROC AUC    | 0.5474            | 18th |
| F1 Score   | 0.4211            | 10th |
| Accuracy   | 0.5644            | 23rd |
| Recall     | 0.5333            | 5th  |
| Train Time | 1,787 s (~30 min) | n/a  |

Architecture

State (6) → FC(64) → ReLU → Dropout → FC(64) → ReLU → Dropout → FC(32) → ReLU → FC(11)
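
A minimal PyTorch sketch of this network, matching the layer sizes in the diagram above (the dropout probability is an assumption, since the text does not state it):

import torch.nn as nn

class QNetwork(nn.Module):
    """State (6) -> FC(64) -> ReLU -> Dropout -> FC(64) -> ReLU -> Dropout -> FC(32) -> ReLU -> FC(11)."""
    def __init__(self, state_dim=6, n_actions=11, dropout=0.2):  # dropout rate assumed, not given in the text
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, n_actions),  # one Q-value per discrete action
        )

    def forward(self, state):
        return self.net(state)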

State Vector

s = [pred₁, pred₂, conf₁, conf₂, |pred₁ - pred₂|, (pred₁ + pred₂)/2]

where:

  • predᵢ: base model i prediction (probability)
  • confᵢ = 1 - |predᵢ - 0.5| × 2: confidence measure for base model i
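
A sketch of assembling this 6-dimensional state from two base-model probabilities, using the confidence formula above (function and argument names are illustrative):

import numpy as np

def build_state(pred1, pred2):
    """Build the 6-dim state from two base-model probabilities."""
    conf1 = 1 - abs(pred1 - 0.5) * 2   # confidence measure from the formula above
    conf2 = 1 - abs(pred2 - 0.5) * 2
    return np.array([pred1, pred2, conf1, conf2,
                     abs(pred1 - pred2),        # disagreement between the models
                     (pred1 + pred2) / 2])      # mean prediction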

Action Space

A = {0, 1, 2, ..., 10}  # 11 discrete actions

Action determines the weight given to the first base model:

  • Action 0 → weight₁ = 0.0 (100% model 2)
  • Action 5 → weight₁ = 0.5 (50/50 blend)
  • Action 10 → weight₁ = 1.0 (100% model 1)
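
A sketch of mapping an action index to an ensemble weight and the resulting blended prediction (names are illustrative):

def blend(action, pred1, pred2, n_actions=11):
    """Action i -> weight i/10 on model 1; the remainder goes to model 2."""
    w1 = action / (n_actions - 1)          # 0 -> 0.0, 5 -> 0.5, 10 -> 1.0
    return w1 * pred1 + (1 - w1) * pred2   # final ensemble probability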

DQN Training

Experience Replay

memory = deque(maxlen=10000)
batch_size = 32

Stores (state, action, reward, next_state) tuples for stable training.
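A minimal sketch of storing and sampling transitions (standard DQN replay; helper names are illustrative, not from the original code):

import random
from collections import deque

memory = deque(maxlen=10000)

def remember(state, action, reward, next_state):
    # Append one transition; the deque evicts the oldest entry once full
    memory.append((state, action, reward, next_state))

def sample_batch(batch_size=32):
    # Uniform random sampling breaks the temporal correlation between consecutive steps
    return random.sample(memory, batch_size)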

Target Network

# Update target network every 100 episodes
target_network.load_state_dict(q_network.state_dict())  # hard copy of the online weights

Prevents oscillation by using a slowly-updating target.

Loss Function

\[ L = \text{MSE}(Q(s, a), r + \gamma \max_{a'} Q_{\text{target}}(s', a')) \]
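
In code, the TD target is built with the frozen target network, and the loss is the mean-squared error against the online network's Q-value for the action actually taken (a sketch; batch layout and function name are assumptions):

import torch
import torch.nn.functional as F

def dqn_loss(q_network, target_network, batch, gamma=0.99):
    states, actions, rewards, next_states = batch            # batch-first tensors; actions is a long tensor
    q_sa = q_network(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a) for the taken actions
    with torch.no_grad():                                     # target network is excluded from backprop
        target = rewards + gamma * target_network(next_states).max(dim=1).values
    return F.mse_loss(q_sa, target)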

Hyperparameters

learning_rate = 0.001
epsilon_decay = 0.995
epsilon_min = 0.01
gamma = 0.99
target_update_freq = 100
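
These values drive a standard ε-greedy exploration schedule; a sketch of how they fit together (the starting ε of 1.0 is an assumption):

import random
import torch

epsilon, epsilon_min, epsilon_decay = 1.0, 0.01, 0.995   # starting epsilon assumed

def select_action(q_network, state, n_actions=11):
    # Explore with probability epsilon, otherwise act greedily on the Q-values
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_network(state).argmax())

def decay_epsilon():
    # Called once per episode; epsilon shrinks geometrically until it hits the floor
    global epsilon
    epsilon = max(epsilon_min, epsilon * epsilon_decay)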

Why This Approach

The idea: instead of training a single model, train an RL agent to optimally combine predictions from multiple base models depending on the input.

Theoretical Advantage

  • Adaptive model selection based on input characteristics
  • Learns when each base model performs best

Practical Reality

  • State representation too weak for meaningful learning
  • Base model predictions don't provide enough signal
  • Converges to near-uniform weighting

Usage

cd dqn_base_model_selector
python main.py --mode train --data-dir /path/to/data --model-path ./model.pt

Interesting but Impractical

This model demonstrates an interesting meta-learning concept, but supervised ensembles (meta_stacking) achieve much better results with less complexity.