XGB Core 7 Features

A minimal XGBoost model using only 7 carefully selected features. It achieved high cross-dataset stability (98.0%) with the fastest training time (40s) among the models compared.

Cross-Dataset Performance

| Metric   | Dataset A | Dataset B |
|----------|-----------|-----------|
| ROC AUC  | 0.6188    | 0.6315    |
| F1 Score | 0.4675    | 0.4571    |

| Generalization Metric | Value           |
|-----------------------|-----------------|
| Robust Score          | 0.606 (Rank #8) |
| Stability Score       | 98.0%           |
| Min AUC               | 0.6188          |
| Train Time            | 40s (Fastest)   |

Generalization Analysis

This model showed highly consistent performance across datasets:

  • Stability of 98.0% - one of the most stable models
  • Dataset A AUC (0.6188) vs Dataset B AUC (0.6315) - minimal variance
  • F1 scores also stable: 0.4675 (A) vs 0.4571 (B)
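One way to read the stability number: if stability is defined as the ratio of the smaller to the larger cross-dataset AUC (an assumed definition, though it is consistent with the figures above), the reported 98.0% falls out directly:

```python
auc_a, auc_b = 0.6188, 0.6315

# Assumed definition: stability = min AUC / max AUC across datasets
stability = min(auc_a, auc_b) / max(auc_a, auc_b)
print(f"{stability:.1%}")  # → 98.0%
```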

Architecture

7 Core Features → RobustScaler → XGBoost → Probability

The 7 Core Features

| Feature      | Purpose                  |
|--------------|--------------------------|
| mean_diff    | Level shift magnitude    |
| std_ratio    | Volatility change        |
| cohens_d     | Standardized effect size |
| ks_statistic | Distribution difference  |
| welch_t_stat | Mean significance        |
| median_diff  | Robust location shift    |
| iqr_ratio    | Robust spread change     |

These 7 features capture the essential aspects of structural breaks:

  • Location: mean_diff, median_diff, welch_t_stat
  • Scale: std_ratio, iqr_ratio
  • Effect: cohens_d
  • Distribution: ks_statistic
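All 7 features can be computed from the pre-break and post-break segments of a series with NumPy and SciPy. The sketch below is illustrative; the exact definitions used by the model (e.g. pooled-std Cohen's d, sample vs. population std) are assumptions:

```python
import numpy as np
from scipy import stats

def iqr(x):
    """Interquartile range: robust measure of spread."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

def core_features(pre, post):
    """Compute the 7 core break features from two segments (illustrative sketch)."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    pooled_std = np.sqrt((pre.var(ddof=1) + post.var(ddof=1)) / 2)
    return {
        # Location
        "mean_diff": post.mean() - pre.mean(),
        "median_diff": np.median(post) - np.median(pre),
        "welch_t_stat": stats.ttest_ind(post, pre, equal_var=False).statistic,
        # Scale
        "std_ratio": post.std(ddof=1) / pre.std(ddof=1),
        "iqr_ratio": iqr(post) / iqr(pre),
        # Effect size
        "cohens_d": (post.mean() - pre.mean()) / pooled_std,
        # Distribution
        "ks_statistic": stats.ks_2samp(pre, post).statistic,
    }
```

For a pure level shift (e.g. `post = pre + 5`), `mean_diff` and `median_diff` pick up the shift while `std_ratio` and `iqr_ratio` stay at 1.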

Hyperparameters

```python
XGBClassifier(
    n_estimators=200,
    max_depth=6,
    learning_rate=0.05,
    subsample=0.8,
    colsample_bytree=0.8,
    min_child_weight=3,
    gamma=0.1,
    reg_alpha=0.1,
    reg_lambda=1.0
)
```

Usage

```bash
cd xgb_core_7features
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib
```

Comparison with Other Models

| Model                        | Features | Robust Score | Stability | Dataset A | Dataset B |
|------------------------------|----------|--------------|-----------|-----------|-----------|
| xgb_core_7features           | 7        | 0.606        | 98.0%     | 0.6188    | 0.6315    |
| xgb_tuned_regularization     | 70+      | 0.715        | 96.3%     | 0.7423    | 0.7705    |
| gradient_boost_comprehensive | 100+     | 0.538        | 82.4%     | 0.7930    | 0.6533    |

Note: While gradient_boost_comprehensive achieved the highest Dataset A AUC (0.7930), its low stability (82.4%) resulted in a lower robust score (0.538) than this 7-feature model's (0.606).