# Ensemble Methods
Ensemble methods combine multiple models to improve predictive performance.
## Overview
| Model | ROC AUC | F1 Score | Train Time (s) |
|---|---|---|---|
| `gradient_boost_comprehensive` | 0.7930 | 0.4186 | 451 |
| `meta_stacking_7models` | 0.7662 | 0.5417 | 32,030 |
| `quad_model_ensemble` | 0.6756 | 0.3500 | 383 |
| `mlp_xgb_simple_blend` | 0.6746 | 0.3636 | 109 |
| `weighted_dynamic_ensemble` | 0.6742 | 0.3000 | 166 |
**Overfitting Alert:** While `gradient_boost_comprehensive` (#1 ROC AUC on Dataset A) and `meta_stacking_7models` (#1 F1 on Dataset A) achieved high single-dataset scores, both showed significant overfitting, with stability scores below 85%. Check the cross-dataset metrics before selecting either model.
## Ensemble Strategies
### Voting Ensembles
A simple combination of model predictions. The sketch below assumes `model1`–`model3` are fitted classifiers and `X` is the feature matrix:
```python
import numpy as np
from scipy.stats import mode

# Hard voting: per-sample majority class across the three models
preds = np.stack([m.predict(X) for m in (model1, model2, model3)])
prediction = mode(preds, axis=0).mode.ravel()

# Soft voting: average the predicted class probabilities
probability = np.mean([m.predict_proba(X) for m in (model1, model2, model3)], axis=0)
```
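scikit-learn packages both modes as `VotingClassifier`; a minimal self-contained sketch on synthetic data (the base models here are illustrative, not the benchmarked ensembles):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# voting="soft" averages predict_proba; voting="hard" takes the majority class
clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("knn", KNeighborsClassifier()),
    ],
    voting="soft",
)
clf.fit(X, y)
probability = clf.predict_proba(X)  # averaged class probabilities
```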
### Stacking Ensembles
Two-level architecture where base model predictions become features:
- Level 0: base models predict → out-of-fold predictions
- Level 1: a meta-learner learns the optimal combination
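This is the pattern scikit-learn implements as `StackingClassifier`: `cv=5` generates the out-of-fold level-0 predictions, and `final_estimator` is the level-1 meta-learner. A minimal sketch on synthetic data (base models are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)

# Level 0: diverse base models; cv=5 yields out-of-fold predictions
# Level 1: a logistic regression learns how to weight them
stack = StackingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
```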
## Why Ensembles Work
**Diversity is Key.** Ensembles work best when base models have different inductive biases (see the sketch after this list):
- Trees vs. neural networks
- Deep vs. wide architectures
- Different regularization strategies
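One hedged illustration of the first point: pairing a gradient-boosted tree model with a scaled MLP in a soft-voting ensemble combines two very different inductive biases, so their errors are less likely to coincide (a hypothetical example, not one of the benchmarked models):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, random_state=0)

# A tree ensemble and a neural network rarely make the same mistakes
diverse = VotingClassifier(
    estimators=[
        ("gb", GradientBoostingClassifier(random_state=0)),
        ("mlp", make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))),
    ],
    voting="soft",
)
diverse.fit(X, y)
```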
### Error Reduction
If base models make independent, zero-mean errors, averaging them reduces the ensemble's expected squared error:
\[
\text{Ensemble Error} \approx \frac{\text{Individual Error}}{N}
\]
where \(N\) is the number of models. This is the idealized case: in practice base-model errors are correlated, so the actual reduction is smaller.
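A quick simulation of that idealized case: `N` models make independent, zero-mean errors, and averaging drives the mean squared error down by roughly a factor of `N`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_models, n_samples = 10, 100_000

# Each "model" predicts the truth plus independent unit-variance noise
errors = rng.normal(0.0, 1.0, size=(n_models, n_samples))

individual_mse = np.mean(errors[0] ** 2)          # ~1.0
ensemble_mse = np.mean(errors.mean(axis=0) ** 2)  # ~1.0 / n_models

print(f"individual MSE: {individual_mse:.3f}")
print(f"ensemble MSE:   {ensemble_mse:.3f}")
```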
## Key Findings
| Category | Model | Notes |
|---|---|---|
| Highest Dataset A AUC | `gradient_boost_comprehensive` | 0.7930, but overfits (82.4% stability) |
| Highest Stability | `weighted_dynamic_ensemble` | 98.4% stability, robust score 0.664 |
| Best F1 | `meta_stacking_7models` | 0.5417, but overfits (83.8% stability) |