# XGB Tuned Regularization
**Note on Results:** These results are based on local validation sets provided during the competition phase and do not represent final official leaderboard standings.

This model achieved the highest robust score (0.715) of all 25 models in local validation, with strong cross-dataset generalization.
## Cross-Dataset Performance
| Metric | Dataset A | Dataset B |
|---|---|---|
| ROC AUC | 0.7423 | 0.7705 |
| F1 Score | 0.5172 | 0.5424 |

| Generalization Metric | Value |
|---|---|
| Robust Score | 0.715 (Rank #1 in local validation) |
| Stability Score | 96.3% |
| Min AUC | 0.7423 |
## Generalization Analysis
This model showed consistent performance across both datasets:
- Dataset B AUC (0.7705) was higher than Dataset A (0.7423)
- Stability score of 96.3% indicates low variance between datasets
- F1 score also consistent: 0.5172 (A) vs 0.5424 (B)
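The write-up does not define the robust and stability scores, but the reported numbers are consistent with stability being the ratio of the lower AUC to the higher AUC, and the robust score being the minimum AUC scaled by that stability. A minimal sketch under that assumption (the function name is illustrative, not from the original code):

```python
def generalization_metrics(auc_a: float, auc_b: float) -> dict:
    """Inferred metric definitions: stability = min/max AUC,
    robust score = min AUC * stability. Not official formulas."""
    lo, hi = min(auc_a, auc_b), max(auc_a, auc_b)
    stability = lo / hi      # 0.7423 / 0.7705 -> 0.963 (96.3%)
    robust = lo * stability  # 0.7423 * 0.963 -> 0.715
    return {"min_auc": lo, "stability": stability, "robust_score": robust}

print(generalization_metrics(0.7423, 0.7705))
```

The same formulas also reproduce the 0.538 / 82.4% and 0.538 / 83.8% rows in the comparison table below, which supports this reading.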
## Architecture

### Hyperparameters
```python
from xgboost import XGBClassifier

# scale_pos_weight='auto' is not a value XGBoost accepts; the intent is the
# usual negative/positive class ratio, computed from the training labels.
pos_weight = float((y_train == 0).sum()) / float((y_train == 1).sum())

model = XGBClassifier(
    n_estimators=1500,
    max_depth=5,                  # shallower trees
    learning_rate=0.02,           # slow learning rate
    subsample=0.85,
    colsample_bytree=0.8,
    colsample_bylevel=0.8,
    min_child_weight=15,          # higher minimum leaf weight
    gamma=0.05,
    reg_alpha=0.05,               # L1 regularization
    reg_lambda=0.8,               # L2 regularization
    scale_pos_weight=pos_weight,  # 'auto' in the original config
    objective='binary:logistic',
    eval_metric='auc',
)
```
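A minimal training-and-evaluation sketch that mirrors the cross-dataset table above; the data names (`X_train`, `X_val_a`, and so on) are placeholders, not identifiers from the original repository:

```python
from sklearn.metrics import roc_auc_score, f1_score

# Fit on the training split, then score each held-out dataset separately.
model.fit(X_train, y_train)

for name, (X_val, y_val) in {"Dataset A": (X_val_a, y_val_a),
                             "Dataset B": (X_val_b, y_val_b)}.items():
    proba = model.predict_proba(X_val)[:, 1]
    preds = (proba >= 0.5).astype(int)
    print(f"{name}: ROC AUC={roc_auc_score(y_val, proba):.4f}, "
          f"F1={f1_score(y_val, preds):.4f}")
```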
## Key Design Decisions

### Regularization

The combined penalties, L1 (reg_alpha=0.05), L2 (reg_lambda=0.8), and minimum split loss (gamma=0.05), constrain individual trees, which helps prevent overfitting to either dataset and supports cross-dataset generalization.
### Shallower Trees

Capping max_depth at 5 and requiring min_child_weight=15 produces simpler trees that are less likely to memorize dataset-specific noise, so the model generalizes better across datasets.
### Many Trees with Slow Learning

With n_estimators=1500 and learning_rate=0.02, each tree contributes only a small correction; many small steps yield a smoother ensemble with lower variance than fewer, more aggressive updates, as the sketch below illustrates.
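One way to see this effect is to score a held-out set with progressively larger prefixes of the ensemble; validation AUC should rise gradually rather than in a few large jumps. A hedged sketch, reusing the placeholder validation split from above and assuming an xgboost version whose sklearn API supports `iteration_range` (1.4+):

```python
from sklearn.metrics import roc_auc_score

# Evaluate only the first n_trees boosters, for increasing n_trees.
for n_trees in (100, 500, 1000, 1500):
    proba = model.predict_proba(X_val_a, iteration_range=(0, n_trees))[:, 1]
    print(f"{n_trees:>5} trees: AUC={roc_auc_score(y_val_a, proba):.4f}")
```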
## Usage

```bash
cd xgb_tuned_regularization

# Train, then run inference with the saved model
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib
python main.py --mode infer --data-dir /path/to/data --model-path ./model.joblib
```
## Comparison with Other Models
| Model | Robust Score | Stability | Dataset A | Dataset B |
|---|---|---|---|---|
| xgb_tuned_regularization | 0.715 | 96.3% | 0.7423 | 0.7705 |
| gradient_boost_comprehensive | 0.538 | 82.4% | 0.7930 | 0.6533 |
| meta_stacking_7models | 0.538 | 83.8% | 0.7662 | 0.6422 |
Note: While gradient_boost_comprehensive achieved a higher Dataset A AUC (0.7930), the drop on Dataset B (0.6533) left it with lower stability (82.4%) and a lower robust score (0.538).