# XGB 70 Statistical
XGBoost with comprehensive statistical features and mutual information feature selection.
## Performance
| Metric | Value | Rank |
|---|---|---|
| ROC AUC | 0.6685 | 9th |
| F1 Score | 0.4615 | 7th |
| Accuracy | 0.7228 | 8th |
| Recall | 0.40 | 7th |
| Train Time | 189s | Fast |
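The reported metrics follow the standard scikit-learn definitions. A minimal sketch of how they can be reproduced, using dummy stand-ins for this model's hold-out labels and predictions (illustrative values only; the real arrays come from the model's evaluation split):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score, accuracy_score, recall_score

# Dummy stand-ins for the evaluation split labels and predictions.
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_prob = np.array([0.2, 0.7, 0.4, 0.1, 0.9, 0.3, 0.6, 0.8])  # positive-class probabilities
y_pred = (y_prob >= 0.5).astype(int)                          # hard labels at the 0.5 threshold

print("ROC AUC :", roc_auc_score(y_true, y_prob))
print("F1      :", f1_score(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))
print("Recall  :", recall_score(y_true, y_pred))
```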
## Architecture
### Feature Selection
Uses Mutual Information to select the top 50 most informative features:
```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 50 features with the highest mutual information with the target label.
selector = SelectKBest(score_func=mutual_info_classif, k=50)
X_selected = selector.fit_transform(X, y)
```
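To see which columns survived the selection, the fitted selector's support mask can be mapped back to feature names. A minimal sketch continuing the snippet above, assuming `X` is a pandas DataFrame with named columns:

```python
# Boolean mask over the original columns; True marks the 50 retained features.
selected_mask = selector.get_support()
selected_features = list(X.columns[selected_mask])  # assumes X is a pandas DataFrame
print(selected_features)
```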
### Top Features by Mutual Information
- ks_statistic
- mean_diff_normalized
- std_ratio
- cohens_d
- mann_whitney_u
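The exact definitions of these features live in the project's feature-engineering code. Purely as an illustration, the names correspond to standard two-sample statistics and could be computed along these lines (the sample arrays `a` and `b` and the normalisation choices are assumptions):

```python
import numpy as np
from scipy import stats

# Hypothetical pair of 1-D samples being compared (stand-ins for the real data).
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.3, scale=1.2, size=200)

ks_statistic, _ = stats.ks_2samp(a, b)             # Kolmogorov-Smirnov test statistic
mann_whitney_u, _ = stats.mannwhitneyu(a, b)       # Mann-Whitney U statistic
std_ratio = np.std(a, ddof=1) / np.std(b, ddof=1)  # ratio of sample standard deviations

# Cohen's d with a pooled standard deviation.
pooled_std = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
cohens_d = (np.mean(a) - np.mean(b)) / pooled_std

# One plausible normalisation of the mean difference; the project may define it differently.
mean_diff_normalized = (np.mean(a) - np.mean(b)) / np.std(np.concatenate([a, b]), ddof=1)
```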
### Hyperparameters
```python
XGBClassifier(
    n_estimators=500,
    max_depth=8,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    min_child_weight=5,
    gamma=0.1,
    reg_alpha=0.1,
    reg_lambda=1.0,
    scale_pos_weight='auto',
    objective='binary:logistic',
    eval_metric='auc'
)
```
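`scale_pos_weight='auto'` above is shorthand: XGBoost itself expects a number, and the usual convention is the negative-to-positive class ratio. A minimal sketch under that assumption, fitting on the selected features from the step above:

```python
import numpy as np
from xgboost import XGBClassifier

# Assumption: "auto" means the common negative/positive count ratio.
neg, pos = int(np.sum(y == 0)), int(np.sum(y == 1))

model = XGBClassifier(
    n_estimators=500, max_depth=8, learning_rate=0.05,
    subsample=0.8, colsample_bytree=0.8, min_child_weight=5,
    gamma=0.1, reg_alpha=0.1, reg_lambda=1.0,
    scale_pos_weight=neg / pos,
    objective='binary:logistic', eval_metric='auc',
)
model.fit(X_selected, y)  # X_selected comes from the SelectKBest step
```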
## Key Differences from Tuned Regularization
| Aspect | xgb_70_statistical | xgb_tuned_regularization |
|---|---|---|
| Feature Selection | Mutual Info (50) | None |
| max_depth | 8 | 5 |
| n_estimators | 500 | 200 |
| reg_alpha | 0.1 | 0.5 |
| reg_lambda | 1.0 | 2.0 |
## Usage
```bash
cd xgb_70_statistical
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib
```
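Once training finishes, the saved artifact can be loaded back for inspection or scoring; a minimal sketch, assuming `model.joblib` holds a fitted estimator (or a pipeline wrapping the selector and classifier):

```python
import joblib

# Load whatever main.py persisted at --model-path (assumed to be a fitted estimator).
model = joblib.load("model.joblib")
print(model)
```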
## When to Use
- When you want explicit feature selection
- For understanding which features matter most (see the ranking sketch below)
- As part of feature engineering research
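A minimal sketch for the feature-ranking use case, assuming `X` is a pandas DataFrame of the statistical features and `y` the binary target:

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Fit the selector only to obtain the per-feature mutual information scores.
selector = SelectKBest(score_func=mutual_info_classif, k=50).fit(X, y)

# Rank every original feature by its mutual information with the target.
ranking = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(ranking.head(20))
```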