Ensemble Methods

Ensemble methods combine multiple models to improve predictive performance.

Overview

| Model | ROC AUC | F1 Score | Train Time (s) |
|---|---|---|---|
| gradient_boost_comprehensive | 0.7930 | 0.4186 | 451 |
| meta_stacking_7models | 0.7662 | 0.5417 | 32,030 |
| quad_model_ensemble | 0.6756 | 0.3500 | 383 |
| mlp_xgb_simple_blend | 0.6746 | 0.3636 | 109 |
| weighted_dynamic_ensemble | 0.6742 | 0.3000 | 166 |

Overfitting Alert

While gradient_boost_comprehensive (highest ROC AUC on Dataset A) and meta_stacking_7models (highest F1 on Dataset A) achieved strong single-dataset scores, both showed significant overfitting, with stability scores below 85%. Consult the cross-dataset metrics before selecting a model.

Ensemble Strategies

Voting Ensembles

Simple combination of model predictions:

import numpy as np
from scipy import stats

# Hard voting: each model casts one class vote; the majority wins
preds = np.stack([m.predict(X) for m in (model1, model2, model3)])
prediction = stats.mode(preds, axis=0).mode

# Soft voting: average the predicted class probabilities
probas = np.stack([m.predict_proba(X) for m in (model1, model2, model3)])
probability = probas.mean(axis=0)
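scikit-learn packages both modes in VotingClassifier, selected with voting="hard" or voting="soft"; the soft variant is used in the diversity sketch further below.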

Stacking Ensembles

Two-level architecture where base model predictions become features:

Level 0: Base models predict → Out-of-fold predictions
Level 1: Meta-learner learns optimal combination
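A minimal sketch of this two-level setup using scikit-learn's StackingClassifier. The data and base models here are illustrative placeholders, not the seven models behind meta_stacking_7models:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level 0: base models; cv=5 produces out-of-fold predictions as meta-features
base_models = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Level 1: the meta-learner fits on the out-of-fold predictions
stack = StackingClassifier(estimators=base_models,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.3f}")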

Why Ensembles Work

Diversity is Key

Ensembles work best when the base models have different inductive biases (see the sketch after this list):

  • Trees vs. neural networks
  • Deep vs. wide architectures
  • Different regularization strategies
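
A minimal illustration of the first point, assuming synthetic data and small, arbitrary hyperparameters: soft voting over a tree ensemble and a neural network, two families with quite different inductive biases:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=1)

# Tree-based learner: axis-aligned splits, insensitive to feature scaling
rf = RandomForestClassifier(n_estimators=200, random_state=1)

# Neural network: smooth decision boundaries, needs scaled inputs
mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=1))

# Soft voting averages the two models' probability estimates
ensemble = VotingClassifier([("rf", rf), ("mlp", mlp)], voting="soft")
ensemble.fit(X, y)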

Error Reduction

If the base models' errors are uncorrelated, averaging drives the expected ensemble error down:

\[ \text{Ensemble Error} \approx \frac{\text{Individual Error}}{N} \]

where N is the number of models. This is the idealized case; correlated errors shrink the benefit.
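
A quick Monte Carlo check of the idealized case, using synthetic regressors whose errors are independent zero-mean noise (all numbers here are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
n_models, n_points = 10, 100_000

# Each "model" predicts the true value plus independent zero-mean noise
truth = rng.normal(size=n_points)
preds = truth + rng.normal(scale=1.0, size=(n_models, n_points))

individual_mse = np.mean((preds - truth) ** 2)             # ~1.0
ensemble_mse = np.mean((preds.mean(axis=0) - truth) ** 2)  # ~1.0 / n_models

print(f"individual MSE: {individual_mse:.3f}")
print(f"ensemble MSE:   {ensemble_mse:.3f}  (about individual / {n_models})")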

Key Findings

| Category | Model | Notes |
|---|---|---|
| Highest Dataset A AUC | gradient_boost_comprehensive | 0.7930, but overfits (82.4% stability) |
| Highest Stability | weighted_dynamic_ensemble | 98.4% stability, 0.664 robust |
| Best F1 | meta_stacking_7models | 0.5417, but overfits (83.8% stability) |