Gradient Boost Comprehensive

Note on Results

These results are based on local validation sets provided during the competition phase and do not represent final official leaderboard standings.

This model achieved the highest Dataset A AUC (0.7930) in local validation, but its low cross-dataset stability (82.4%) points to significant overfitting.

Cross-Dataset Performance

Metric      Dataset A   Dataset B
ROC AUC     0.7930      0.6533
F1 Score    0.4186      0.3721

Generalization Metric   Value
Robust Score            0.538 (Rank #13)
Stability Score         82.4%
Min AUC                 0.6533
AUC Drop                -17.6%

Generalization Analysis

This model showed significant performance variance between datasets:

  • Dataset A AUC: 0.7930 (Rank #1)
  • Dataset B AUC: 0.6533 (Rank #6)
  • Dropped 5 ranks between datasets
  • Stability of 82.4% indicates overfitting to Dataset A characteristics
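The exact competition formulas for these metrics are not documented here, but one set of assumptions reproduces the reported numbers: stability as the ratio of the minimum to the maximum AUC, AUC drop as the relative decline from the best AUC, and robust score as the minimum AUC discounted by stability. A sketch under those assumptions:

```python
# Sketch of how the generalization metrics may be derived from the two AUCs.
# The formulas below are assumptions, chosen because they reproduce the
# reported values; they are not confirmed by the competition documentation.
auc_a, auc_b = 0.7930, 0.6533

min_auc = min(auc_a, auc_b)
max_auc = max(auc_a, auc_b)

stability = min_auc / max_auc        # assumed: min/max AUC ratio
auc_drop = stability - 1.0           # assumed: relative drop from best AUC
robust_score = min_auc * stability   # assumed: min AUC discounted by stability

print(f"Stability:    {stability:.1%}")     # 82.4%
print(f"AUC drop:     {auc_drop:.1%}")      # -17.6%
print(f"Robust score: {robust_score:.3f}")  # 0.538
```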

Architecture

flowchart TD
    A["Input Features (100+)"] --> B1["Gradient Boosting 40%"]
    A --> B2["Random Forest 35%"]
    A --> B3["Logistic Regression 25%"]

    B1 --> C["Weighted Average"]
    B2 --> C
    B3 --> C

    C --> D["Final Prediction"]

Ensemble Weights

Model                        Weight   Role
GradientBoostingClassifier   40%      Primary predictor
RandomForestClassifier       35%      Diversity, bagging
LogisticRegression           25%      Linear baseline, calibration

Key Features

Comprehensive feature set including:

  • Segment statistics: Mean, std, skewness, kurtosis differences
  • Distribution tests: KS statistic, t-test, Mann-Whitney
  • Effect sizes: Cohen's d, Glass's delta
  • Stability measures: Variance ratio, IQR differences
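A minimal sketch of how features in these categories could be computed for a pair of segments using SciPy; the function name and the exact feature set are assumptions, not the repository's actual feature code:

```python
import numpy as np
from scipy import stats

def segment_features(a: np.ndarray, b: np.ndarray) -> dict:
    """Illustrative pairwise segment features: moment differences,
    distribution tests, effect sizes, and stability measures."""
    feats = {
        # Segment statistics: differences in the first four moments
        "mean_diff": a.mean() - b.mean(),
        "std_diff": a.std(ddof=1) - b.std(ddof=1),
        "skew_diff": stats.skew(a) - stats.skew(b),
        "kurt_diff": stats.kurtosis(a) - stats.kurtosis(b),
        # Distribution tests
        "ks_stat": stats.ks_2samp(a, b).statistic,
        "t_stat": stats.ttest_ind(a, b, equal_var=False).statistic,
        "mw_stat": stats.mannwhitneyu(a, b).statistic,
    }
    # Effect sizes: Cohen's d (pooled std) and Glass's delta (one group's std)
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    feats["cohens_d"] = (a.mean() - b.mean()) / pooled
    feats["glass_delta"] = (a.mean() - b.mean()) / b.std(ddof=1)
    # Stability measures
    feats["var_ratio"] = a.var(ddof=1) / b.var(ddof=1)
    feats["iqr_diff"] = stats.iqr(a) - stats.iqr(b)
    return feats

rng = np.random.default_rng(0)
f = segment_features(rng.normal(0, 1, 500), rng.normal(0.3, 1.2, 500))
```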

Component Configurations

Gradient Boosting

from sklearn.ensemble import GradientBoostingClassifier

GradientBoostingClassifier(
    n_estimators=200,   # 200 boosting stages
    learning_rate=0.1,
    max_depth=6,        # relatively deep trees for boosting
    subsample=0.8,      # stochastic gradient boosting
    random_state=42
)

Random Forest

from sklearn.ensemble import RandomForestClassifier

RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5,
    random_state=42
)

Logistic Regression (Calibrated)

from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

CalibratedClassifierCV(
    LogisticRegression(C=0.1, max_iter=1000),  # strong L2 regularization
    cv=3
)
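The three components above can be combined with the 40/35/25 weights via scikit-learn's soft-voting ensemble, which averages the components' predicted probabilities. This is a sketch of one plausible wiring on synthetic data, not the repository's actual main.py:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression

# Component models, configured as in the sections above
gb = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1,
                                max_depth=6, subsample=0.8, random_state=42)
rf = RandomForestClassifier(n_estimators=200, max_depth=10,
                            min_samples_split=5, random_state=42)
lr = CalibratedClassifierCV(LogisticRegression(C=0.1, max_iter=1000), cv=3)

# Soft voting computes a weighted average of predict_proba outputs,
# matching the 40% / 35% / 25% ensemble weights
ensemble = VotingClassifier(
    estimators=[("gb", gb), ("rf", rf), ("lr", lr)],
    voting="soft",
    weights=[0.40, 0.35, 0.25],
)

# Synthetic stand-in for the real feature matrix (100+ features)
X, y = make_classification(n_samples=400, n_features=20, random_state=42)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X)
```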

Usage

cd gradient_boost_comprehensive
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib

Comparison with Other Models

Model                          Robust Score   Stability   Dataset A   Dataset B
xgb_tuned_regularization       0.715          96.3%       0.7423      0.7705
weighted_dynamic_ensemble      0.664          98.4%       0.6742      0.6849
gradient_boost_comprehensive   0.538          82.4%       0.7930      0.6533

Note: Despite having the highest Dataset A AUC, this model's low stability resulted in a robust score of only 0.538, ranking #13 out of 25 models.