# XGBoost Models
XGBoost (eXtreme Gradient Boosting) models are the strongest performers in this benchmark; the best variant reaches a 0.7423 ROC AUC.
## Overview
XGBoost builds an ensemble of decision trees sequentially, where each new tree corrects errors made by previous trees.
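Formally, the ensemble prediction for example \(i\) after \(K\) boosting rounds is the sum of the individual trees' outputs:
\[
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)
\]
where each tree \(f_k\) is fitted, via gradient statistics, to the errors of the ensemble built so far.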
## Objective Function
\[
L(\theta) = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k)
\]
where:
- \(l(y_i, \hat{y}_i)\): Loss function (log loss for binary classification)
- \(\Omega(f_k) = \gamma T + \frac{1}{2}\lambda\|w\|^2\): Regularization term
- \(T\): Number of leaves in the tree
- \(w\): Vector of leaf weights
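Minimizing a second-order Taylor approximation of this objective yields a closed-form optimal weight for each leaf, which is where \(\lambda\) and \(\gamma\) enter tree construction:
\[
w_j^* = -\frac{G_j}{H_j + \lambda},
\qquad
\tilde{L} = -\frac{1}{2}\sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T
\]
Here \(G_j\) and \(H_j\) are the sums of the first and second derivatives of the loss over the examples routed to leaf \(j\).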
## XGBoost Variants
| Model | ROC AUC | Features | Train Time (s) | Use Case |
|---|---|---|---|---|
| xgb_tuned_regularization | 0.7423 | 70+ | 185 | Best overall |
| xgb_70_statistical | 0.6685 | 70→50 | 189 | Balanced |
| xgb_core_7features | 0.6188 | 7 | 40 | Speed critical |
| xgb_importance_top15 | 0.6723 | 15 | 178 | Feature selection |
| xgb_selective_spectral | 0.6451 | Spectral | 78 | Frequency focus |
| xgb_30f_fast_inference | 0.6282 | 30 | 142 | Fast inference |
| kolmogorov_smirnov_xgb | 0.4939 | KS-based | 69 | Not recommended (below chance) |
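To make the comparison concrete, here is a minimal sketch of how a variant can be scored with cross-validated ROC AUC; the synthetic data stands in for the benchmark's actual feature matrices, which are not reproduced here:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Synthetic stand-in for the benchmark data (imbalanced binary task).
X, y = make_classification(
    n_samples=2000, n_features=30, weights=[0.9, 0.1], random_state=0
)

model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.05)
scores = cross_val_score(model, X, y, scoring="roc_auc", cv=5)
print(f"ROC AUC: {scores.mean():.4f} +/- {scores.std():.4f}")
```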
## Why XGBoost Works Well
**Strengths**

- **Excellent with engineered features:** consumes statistical features directly, with no need to learn representations from raw data
- **Built-in regularization:** L1/L2 penalties on leaf weights discourage overfitting
- **Handles missing values:** native support for NaN inputs (see the sketch after this list)
- **Fast training:** histogram-based split finding
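A minimal sketch of the missing-value handling, using random data rather than the benchmark's: XGBoost routes NaNs down a learned default direction at each split, so no imputation step is needed.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
X[rng.random(X.shape) < 0.2] = np.nan       # ~20% of entries missing
y = (rng.random(500) < 0.5).astype(int)

# Trains directly on data containing NaN; no imputation required.
XGBClassifier(n_estimators=50).fit(X, y)
```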
**Considerations**

- Requires feature engineering (does not learn features from raw data)
- Can overfit with too many trees or too much depth (see the early-stopping sketch below)
- Less interpretable than linear models
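The usual guard against adding too many trees is early stopping: hold out a validation split and stop boosting once the validation metric plateaus. A sketch, assuming xgboost >= 1.6 (where `early_stopping_rounds` is a constructor argument) and the `X`, `y` placeholders from the earlier sketch:

```python
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBClassifier(
    n_estimators=1000,         # upper bound; early stopping picks the actual count
    learning_rate=0.05,
    eval_metric="auc",
    early_stopping_rounds=20,  # stop after 20 rounds without AUC improvement
)
model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)], verbose=False)
print("best iteration:", model.best_iteration)
```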
## Common Hyperparameters
```python
from xgboost import XGBClassifier

# scale_pos_weight is typically set to n_negative / n_positive
# (computed from the training labels) to counter class imbalance.
n_neg, n_pos = (y_train == 0).sum(), (y_train == 1).sum()

model = XGBClassifier(
    n_estimators=200,                # Number of trees (boosting rounds)
    max_depth=6,                     # Maximum tree depth
    learning_rate=0.05,              # Step-size shrinkage (eta)
    subsample=0.8,                   # Row subsampling per tree
    colsample_bytree=0.8,            # Column subsampling per tree
    reg_alpha=0.1,                   # L1 regularization on leaf weights
    reg_lambda=1.0,                  # L2 regularization on leaf weights
    scale_pos_weight=n_neg / n_pos,  # Class-imbalance handling
)
```
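As a usage sketch (`X_train`, `X_test`, `y_train` are placeholders for the benchmark's actual splits), the configured model plugs into the usual scikit-learn fit/predict cycle:

```python
model.fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # positive-class probabilities for ROC AUC
```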
## Key Findings
xgb_tuned_regularization achieved the best results:
- Best robust score (0.715) with 96.3% stability
- Highest F1 score among XGBoost variants (0.5172)
- Heavy regularization prevented overfitting
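One way such a regularization setting can be found is a randomized search over the regularization knobs; the sketch below uses a hypothetical search space, not the benchmark's actual tuning procedure:

```python
from scipy.stats import loguniform
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Hypothetical search space over the regularization-related parameters.
param_distributions = {
    "reg_alpha": loguniform(1e-3, 10.0),    # L1 strength
    "reg_lambda": loguniform(1e-1, 100.0),  # L2 strength
    "gamma": loguniform(1e-3, 10.0),        # min loss reduction to split
    "max_depth": [3, 4, 5, 6],
}

search = RandomizedSearchCV(
    XGBClassifier(n_estimators=200, learning_rate=0.05),
    param_distributions,
    n_iter=30,
    scoring="roc_auc",
    cv=5,
    random_state=0,
)
search.fit(X, y)  # X, y as in the earlier sketches
print(search.best_params_)
```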