XGBoost Models

XGBoost (eXtreme Gradient Boosting) models are the strongest performers in this benchmark.

Overview

XGBoost builds an ensemble of decision trees sequentially, where each new tree corrects errors made by previous trees.
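
As a minimal sketch of that idea (a simplified least-squares boosting loop, not XGBoost's actual second-order algorithm; the toy data below is purely illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data standing in for the benchmark's features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500) > 0).astype(float)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())   # start from a constant prediction
trees = []

for _ in range(50):
    residuals = y - prediction            # errors made by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=3)
    tree.fit(X, residuals)                # new tree learns to correct those errors
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)
```

Each tree is fit to the residuals of the current ensemble, so the prediction improves additively. XGBoost follows the same additive structure but uses gradients and Hessians of the chosen loss, plus the regularization described below.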

Objective Function

\[ L(\theta) = \sum_i l(y_i, \hat{y}_i) + \sum_k \Omega(f_k) \]

where:

  • \(l(y_i, \hat{y}_i)\): Loss function (log loss for binary classification)
  • \(\Omega(f_k) = \gamma T + \frac{1}{2}\lambda\|w\|^2\): Regularization term
  • \(T\): Number of leaves in tree
  • \(w\): Leaf weights
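
For the binary classification task here, the per-example loss is the logistic log loss applied to the sigmoid of the raw score \(\hat{y}_i\):

\[ l(y_i, \hat{y}_i) = -\Big[ y_i \log \sigma(\hat{y}_i) + (1 - y_i) \log\big(1 - \sigma(\hat{y}_i)\big) \Big], \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \]

In the XGBClassifier API, \(\gamma\) corresponds to the gamma parameter (the per-leaf complexity cost, acting as a minimum split gain) and \(\lambda\) to reg_lambda; reg_alpha adds an optional L1 penalty on the leaf weights.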

XGBoost Variants

| Model | ROC AUC | Features | Train Time | Use Case |
| --- | --- | --- | --- | --- |
| xgb_tuned_regularization | 0.7423 | 70+ | 185s | Best overall |
| xgb_70_statistical | 0.6685 | 70→50 | 189s | Balanced |
| xgb_core_7features | 0.6188 | 7 | 40s | Speed critical |
| xgb_importance_top15 | 0.6723 | 15 | 178s | Feature selection |
| xgb_selective_spectral | 0.6451 | Spectral | 78s | Frequency focus |
| xgb_30f_fast_inference | 0.6282 | 30 | 142s | Fast inference |
| kolmogorov_smirnov_xgb | 0.4939 | KS-based | 69s | Low performance |

Why XGBoost Works Well

Strengths

  1. Excellent with engineered features — Directly uses statistical features without needing to learn representations
  2. Built-in regularization — L1/L2 regularization prevents overfitting
  3. Handles missing values — Native support for NaN (see the sketch after this list)
  4. Fast training — Histogram-based splitting
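
A minimal sketch of points 1 and 3: training directly on a small table of engineered features that contains NaN values, with no imputation step (the feature names and values are illustrative, not the benchmark's):

```python
import numpy as np
import pandas as pd
from xgboost import XGBClassifier

# Illustrative engineered features with missing values; not the benchmark data.
X = pd.DataFrame({
    "mean_amplitude":   [0.2, 1.4, np.nan, 0.9, 2.1, np.nan, 1.1, 0.5],
    "spectral_entropy": [3.1, np.nan, 2.7, 3.5, 2.2, 2.9, np.nan, 3.0],
    "kurtosis":         [1.1, 0.4, 0.8, np.nan, 1.6, 0.3, 0.9, 1.2],
})
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# NaNs are handled natively: each split learns a default direction for missing values.
model = XGBClassifier(n_estimators=20, max_depth=3, eval_metric="logloss")
model.fit(X, y)
print(model.predict_proba(X)[:, 1])
```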

Considerations

  • Requires feature engineering (doesn't learn features from raw data)
  • Can overfit with too many trees/depth
  • Less interpretable than linear models

Common Hyperparameters

from xgboost import XGBClassifier

# y_train is the binary training label array (0 = negative, 1 = positive).
XGBClassifier(
    n_estimators=200,        # Number of trees
    max_depth=6,             # Maximum tree depth
    learning_rate=0.05,      # Step size shrinkage
    subsample=0.8,           # Row subsampling
    colsample_bytree=0.8,    # Column subsampling per tree
    reg_alpha=0.1,           # L1 regularization
    reg_lambda=1.0,          # L2 regularization
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),  # Class imbalance handling
)
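
A hedged usage sketch with a held-out validation split and early stopping, which also addresses the overfitting risk noted under Considerations (the synthetic data and split settings are illustrative; passing early_stopping_rounds to the constructor assumes a reasonably recent xgboost release):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic, imbalanced stand-in for the benchmark's engineered feature matrix.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=10,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=1.0,
    scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
    eval_metric="auc",
    early_stopping_rounds=25,   # stop adding trees once validation AUC stalls
)
model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], verbose=False)
print("Validation ROC AUC:", roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1]))
```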

Key Findings

xgb_tuned_regularization achieved the best results:

  • Best robust score (0.715) with 96.3% stability
  • Highest F1 score among XGBoost variants (0.5172)
  • Heavy regularization prevented overfitting