# XGB 70 Statistical

XGBoost with comprehensive statistical features and mutual information feature selection.

## Performance

| Metric     | Value  | Rank |
|------------|--------|------|
| ROC AUC    | 0.6685 | 9th  |
| F1 Score   | 0.4615 | 7th  |
| Accuracy   | 0.7228 | 8th  |
| Recall     | 0.40   | 7th  |
| Train Time | 189s   | Fast |

## Architecture

```
Input Features (70+) → Mutual Info Selection (top 50) → StandardScaler → XGBoost → Probability
```
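As a minimal sketch, the same stages can be wired together with scikit-learn's `Pipeline`. This is illustrative only: `X` and `y` are assumed to be loaded already, and the repository's actual wiring may differ.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from xgboost import XGBClassifier

# Selection → scaling → gradient-boosted classifier, as in the diagram above.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=mutual_info_classif, k=50)),
    ("scale", StandardScaler()),
    ("model", XGBClassifier(objective="binary:logistic", eval_metric="auc")),
])

pipeline.fit(X, y)
proba = pipeline.predict_proba(X)[:, 1]  # positive-class probability
```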

## Feature Selection

Mutual information scores are used to keep the 50 most informative of the 70+ input features:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Keep the 50 features with the highest mutual information with the label.
selector = SelectKBest(score_func=mutual_info_classif, k=50)
X_selected = selector.fit_transform(X, y)
```
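To see which columns survived selection, `SelectKBest` exposes a boolean mask and the per-feature scores; a short sketch, assuming `X` is a pandas DataFrame:

```python
mask = selector.get_support()      # boolean mask over the input columns
selected = X.columns[mask]         # names of the 50 kept features
mi_scores = selector.scores_[mask] # their mutual information scores
```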

### Top Features by Mutual Information

1. ks_statistic
2. mean_diff_normalized
3. std_ratio
4. cohens_d
5. mann_whitney_u
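These are standard two-sample statistics. Purely for reference, a hedged sketch of how they could be computed with NumPy/SciPy for two sample arrays; the helper name and the exact normalizations are assumptions, and the repository's own feature extraction code is not shown here.

```python
import numpy as np
from scipy import stats

def two_sample_stats(a: np.ndarray, b: np.ndarray) -> dict:
    """Illustrative two-sample statistics (hypothetical helper)."""
    ks_statistic, _ = stats.ks_2samp(a, b)        # max CDF distance
    mann_whitney_u, _ = stats.mannwhitneyu(a, b)  # rank-based U statistic
    pooled_std = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return {
        "ks_statistic": ks_statistic,
        # one plausible normalization; the repo's convention may differ
        "mean_diff_normalized": (a.mean() - b.mean())
                                / np.concatenate([a, b]).std(ddof=1),
        "std_ratio": a.std(ddof=1) / b.std(ddof=1),
        "cohens_d": (a.mean() - b.mean()) / pooled_std,
        "mann_whitney_u": mann_whitney_u,
    }
```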

## Hyperparameters

```python
from xgboost import XGBClassifier

# XGBoost has no 'auto' option for scale_pos_weight; set it to the
# negative/positive class ratio, which is what 'auto' shorthand means here.
scale_pos_weight = (y == 0).sum() / (y == 1).sum()

model = XGBClassifier(
    n_estimators=500,
    max_depth=8,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    min_child_weight=5,
    gamma=0.1,
    reg_alpha=0.1,
    reg_lambda=1.0,
    scale_pos_weight=scale_pos_weight,
    objective='binary:logistic',
    eval_metric='auc',
)
```
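A minimal training sketch using this configuration, assuming `X_selected` and `y` from the feature-selection step above (the split parameters are illustrative):

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X_train, X_val, y_train, y_val = train_test_split(
    X_selected, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)
print("val AUC:", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))
```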

## Key Differences from Tuned Regularization

| Aspect            | xgb_70_statistical | xgb_tuned_regularization |
|-------------------|--------------------|--------------------------|
| Feature Selection | Mutual Info (50)   | None                     |
| max_depth         | 8                  | 5                        |
| n_estimators      | 500                | 200                      |
| reg_alpha         | 0.1                | 0.5                      |
| reg_lambda        | 1.0                | 2.0                      |

## Usage

```bash
cd xgb_70_statistical
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib
```
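The `--model-path` extension suggests the trained model is saved with joblib; a hedged loading sketch, assuming the artifact is a fitted estimator (the exact contents depend on `main.py`, and `X_new` is a hypothetical feature matrix):

```python
import joblib

model = joblib.load("./model.joblib")
proba = model.predict_proba(X_new)[:, 1]  # positive-class probability
```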

## When to Use

- When you want explicit feature selection
- For understanding which features matter most
- As part of feature engineering research