Meta Stacking 7 Models¶
**Note on Results:** These results are based on local validation sets provided during the competition phase and do not represent final official leaderboard standings.
A 7-model stacking ensemble that achieved strong Dataset A performance in local validation but overfit significantly, as reflected in its 83.8% stability score.
Cross-Dataset Performance¶
| Metric | Dataset A | Dataset B |
|---|---|---|
| ROC AUC | 0.7662 | 0.6422 |
| F1 Score | 0.5417 | 0.3111 |

| Generalization Metric | Value |
|---|---|
| Robust Score | 0.538 (Rank #12) |
| Stability Score | 83.8% |
| Min AUC | 0.6422 |
| AUC Drop (A → B) | -16.2% |
| Train Time | ~9 hours |
Generalization Analysis¶
This model showed significant performance variance between datasets:
- Dataset A AUC: 0.7662 (Rank #2)
- Dataset B AUC: 0.6422 (Rank #10)
- Dropped 8 ranks between datasets
- Stability of 83.8% (matching the ratio of the two dataset AUCs) indicates overfitting
- F1 also dropped significantly: 0.5417 (A) vs 0.3111 (B)
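The stability and drop figures follow directly from the two AUCs. A quick check in Python, assuming stability is defined as the ratio of the lower to the higher AUC (the robust score's formula is not documented here, so it is not reproduced):

```python
# Stability and AUC drop derived from the two dataset AUCs; the robust
# score's exact formula is not documented here.
auc_a, auc_b = 0.7662, 0.6422

stability = min(auc_a, auc_b) / max(auc_a, auc_b)  # 0.838 -> 83.8%
auc_drop = auc_b / auc_a - 1                       # -0.162 -> -16.2%

print(f"stability: {stability:.1%}, AUC drop: {auc_drop:+.1%}")
```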
Architecture¶
```mermaid
flowchart TD
    A["Input Features (339)"] --> B["Feature Selection → 200 features"]
    B --> C1["MLP 1 Deep"]
    B --> C2["MLP 2 Wide"]
    B --> C3["Gradient Boosting"]
    B --> C4["Random Forest"]
    B --> C5["XGBoost"]
    B --> C6["LightGBM"]
    B --> C7["ExtraTrees"]
    C1 --> D["Out-of-Fold Predictions"]
    C2 --> D
    C3 --> D
    C4 --> D
    C5 --> D
    C6 --> D
    C7 --> D
    D --> E["Meta-Learner (LogReg)"]
    E --> F["Final Probability"]
```
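The selection step that narrows the 339 engineered features down to 200 is not specified above; a minimal sketch, assuming a univariate filter such as scikit-learn's SelectKBest with mutual information (the actual method may differ):

```python
# Hypothetical sketch of the 339 -> 200 feature-selection step; the actual
# selection method used by the pipeline is not documented here.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Stand-in data with the documented feature count
X_train, y_train = make_classification(n_samples=1000, n_features=339, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=200)
X_selected = selector.fit_transform(X_train, y_train)  # shape: (1000, 200)
```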
Base Models¶
Neural Networks¶
- MLP 1 (Deep)
- MLP 2 (Wide)
Tree-Based¶
- Gradient Boosting
- Random Forest
- XGBoost
- LightGBM
- ExtraTrees
Meta-Learner¶
- Logistic Regression trained on the out-of-fold predictions of the seven base models
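A minimal sketch of how the seven base models and the logistic-regression meta-learner could be instantiated with scikit-learn, XGBoost, and LightGBM; all hyperparameters shown are illustrative assumptions, not the tuned values used in the competition:

```python
# Illustrative base-model definitions; layer sizes and hyperparameters
# are assumptions, not the competition's tuned settings.
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier)
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

base_models = {
    "mlp_deep": MLPClassifier(hidden_layer_sizes=(128, 64, 32, 16)),  # deep, narrow
    "mlp_wide": MLPClassifier(hidden_layer_sizes=(512, 256)),         # shallow, wide
    "gb":   GradientBoostingClassifier(),
    "rf":   RandomForestClassifier(n_estimators=500),
    "xgb":  XGBClassifier(n_estimators=500, eval_metric="logloss"),
    "lgbm": LGBMClassifier(n_estimators=500),
    "et":   ExtraTreesClassifier(n_estimators=500),
}
meta_learner = LogisticRegression(max_iter=1000)
```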
Feature Set (339 Features)¶
- Statistical moments per segment
- Distribution comparisons (Wasserstein, KL divergence)
- Wavelet decomposition coefficients (db4, sym4, coif2)
- Spectral features from FFT
- Hypothesis test statistics
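A sketch of how a few of these feature families could be computed for one pair of signal segments, using NumPy, SciPy, and PyWavelets; the exact feature definitions in the actual pipeline may differ:

```python
# Illustrative feature extraction for one pair of signal segments.
# Exact definitions in the actual pipeline may differ.
import numpy as np
import pywt
from scipy import stats

def segment_features(a: np.ndarray, b: np.ndarray) -> dict:
    feats = {}
    # Statistical moments per segment
    for name, seg in (("a", a), ("b", b)):
        feats[f"{name}_mean"] = seg.mean()
        feats[f"{name}_std"] = seg.std()
        feats[f"{name}_skew"] = stats.skew(seg)
        feats[f"{name}_kurtosis"] = stats.kurtosis(seg)
    # Distribution comparison
    feats["wasserstein"] = stats.wasserstein_distance(a, b)
    # Wavelet decomposition energies (db4 shown; sym4/coif2 are analogous)
    for i, c in enumerate(pywt.wavedec(a, "db4", level=4)):
        feats[f"db4_energy_{i}"] = float(np.sum(c ** 2))
    # Spectral feature from the FFT magnitude spectrum
    spectrum = np.abs(np.fft.rfft(a))
    feats["spectral_centroid"] = float(
        (spectrum * np.arange(len(spectrum))).sum() / spectrum.sum())
    # Hypothesis test statistic
    feats["ks_stat"] = stats.ks_2samp(a, b).statistic
    return feats

rng = np.random.default_rng(0)
feats = segment_features(rng.normal(size=256), rng.normal(0.5, size=256))
```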
Stacking Process¶
1. Split the training data into K folds.
2. For each fold, train the base models on the remaining K-1 folds and predict on the held-out fold.
3. Combine all out-of-fold predictions into a meta-feature matrix.
4. Train the meta-learner on the meta-features.
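A minimal sketch of this procedure using scikit-learn's cross_val_predict to generate the out-of-fold meta-features, reusing the names from the sketches above; the fold count and all settings are assumptions:

```python
# Illustrative out-of-fold (OOF) stacking, reusing base_models, meta_learner,
# X_selected, and y_train from the sketches above.
import numpy as np
from sklearn.model_selection import cross_val_predict

K = 5

# One meta-feature column per base model: its OOF probability predictions
meta_features = np.column_stack([
    cross_val_predict(model, X_selected, y_train, cv=K, method="predict_proba")[:, 1]
    for model in base_models.values()
])

# Train the logistic-regression meta-learner on the stacked OOF predictions
meta_learner.fit(meta_features, y_train)

# At inference time, each base model is refit on the full training set and its
# test-set probabilities are stacked the same way before calling
# meta_learner.predict_proba.
```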
Usage¶
```bash
cd meta_stacking_7models
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib
```
Comparison with Other Models¶
| Model | Robust Score | Stability | Dataset A AUC | Dataset B AUC |
|---|---|---|---|---|
| xgb_tuned_regularization | 0.715 | 96.3% | 0.7423 | 0.7705 |
| weighted_dynamic_ensemble | 0.664 | 98.4% | 0.6742 | 0.6849 |
| meta_stacking_7models | 0.538 | 83.8% | 0.7662 | 0.6422 |
Note: Despite achieving the highest F1 score on Dataset A, the model's large performance drop on Dataset B (stability of only 83.8%) limited its robust score to 0.538.