Feature Documentation
Comprehensive documentation of all features used for structural break detection.
Feature Categories
- Statistical Moments: mean, variance, skewness, and kurtosis differences between segments.
- Effect Sizes: Cohen's d, Glass's delta, and Hedges' g for standardized comparisons.
- Hypothesis Tests: Welch's t-test, Kolmogorov-Smirnov, and Mann-Whitney U statistics.
- Distribution Metrics: Wasserstein distance, ECDF distance, and overlap coefficient.
- Spectral Features: FFT-based features such as spectral centroid, bandwidth, and entropy.
- Wavelet Features: multi-resolution decomposition using DWT coefficients.
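As an illustration of the spectral category, the following minimal sketch computes a spectral centroid, bandwidth, and entropy from the FFT power spectrum of one segment. The function name `spectral_features`, the `sample_rate` parameter, and the mean-removal step are assumptions made for this example; the exact definitions used by the models documented here may differ.

```python
import numpy as np

def spectral_features(x, sample_rate=1.0):
    """Spectral centroid, bandwidth, and entropy of a 1-D signal (illustrative definitions)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # assumption: remove the DC component so it does not dominate
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:  # constant signal: no spectral content to describe
        return {"spectral_centroid": 0.0, "spectral_bandwidth": 0.0, "spectral_entropy": 0.0}
    p = power / total  # normalized power spectrum, sums to 1
    centroid = float(np.sum(freqs * p))  # power-weighted mean frequency
    bandwidth = float(np.sqrt(np.sum((freqs - centroid) ** 2 * p)))  # spread around the centroid
    nonzero = p[p > 0]
    entropy = float(-np.sum(nonzero * np.log(nonzero)))  # Shannon entropy of the spectrum
    return {
        "spectral_centroid": centroid,
        "spectral_bandwidth": bandwidth,
        "spectral_entropy": entropy,
    }

# Usage: a pure sine concentrates power in one bin, white noise spreads it out.
t = np.arange(200)
sine_features = spectral_features(np.sin(2 * np.pi * 0.1 * t))
noise_features = spectral_features(np.random.default_rng(0).normal(size=200))
```

A break feature could then be formed as the difference of each value between the two segments, which is one plausible way to detect a frequency change.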
Feature Engineering Methodology
Segment-Based Approach
Each time series is split at a candidate break point into a pre-break segment and a post-break segment.
Features capture the differences between these two segments.
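The split can be sketched as follows. This is a minimal example, not the project's implementation: the helper names `split_at` and `segment_differences`, the post-minus-pre sign convention, and the minimum segment length are all assumptions.

```python
import numpy as np

def split_at(series, break_index):
    """Split a 1-D series into (pre, post) segments at a candidate break index."""
    series = np.asarray(series, dtype=float)
    # Assumption: require at least two points per segment so sample variances exist.
    if not 1 < break_index < series.size - 1:
        raise ValueError("break_index must leave at least two points in each segment")
    return series[:break_index], series[break_index:]

def segment_differences(series, break_index):
    """Two simple difference features between the segments (post relative to pre)."""
    pre, post = split_at(series, break_index)
    return {
        "mean_diff": float(post.mean() - pre.mean()),
        # Becomes inf if the pre-segment is constant; handle that case upstream.
        "std_ratio": float(post.std(ddof=1) / pre.std(ddof=1)),
    }

# Usage: a synthetic series with a level shift of 1.5 at index 300.
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(1.5, 1.0, 200)])
pre, post = split_at(series, 300)
features = segment_differences(series, 300)
```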
Most Discriminative Features
Based on XGBoost feature importance analysis:
| Rank | Feature | Category | Importance |
|---|---|---|---|
| 1 | ks_statistic | Hypothesis Test | Highest |
| 2 | mean_diff_normalized | Statistical Moment | Very High |
| 3 | std_ratio | Statistical Moment | Very High |
| 4 | cohens_d | Effect Size | High |
| 5 | mann_whitney_u | Hypothesis Test | High |
| 6 | welch_t_stat | Hypothesis Test | High |
| 7 | median_diff | Statistical Moment | Medium |
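To make the ranked features concrete, here is a hedged NumPy sketch of how the seven quantities could be computed from a pre-segment and a post-segment. The formulas are standard textbook definitions, but their mapping to these feature names is an assumption. In particular, `mean_diff_normalized` is taken here to be the mean difference divided by the standard deviation of the whole series, and `core_features` is a hypothetical helper. SciPy offers library implementations of the three test statistics (`scipy.stats.ks_2samp`, `mannwhitneyu`, `ttest_ind` with `equal_var=False`).

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two ECDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])  # the ECDFs only change at observed values
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with a_i > b_j, ties counted as one half."""
    diff = a[:, None] - b[None, :]  # O(len(a) * len(b)) memory, fine for short segments
    return float(np.sum(diff > 0) + 0.5 * np.sum(diff == 0))

def core_features(pre, post):
    """Seven segment-comparison features (definitions are assumptions, see lead-in)."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    n1, n2 = pre.size, post.size
    v1, v2 = pre.var(ddof=1), post.var(ddof=1)
    mean_diff = post.mean() - pre.mean()
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    overall_sd = np.concatenate([pre, post]).std(ddof=1)
    return {
        "ks_statistic": ks_statistic(pre, post),
        "mean_diff_normalized": float(mean_diff / overall_sd),  # assumed normalization
        "std_ratio": float(np.sqrt(v2 / v1)),
        "cohens_d": float(mean_diff / pooled_sd),
        "mann_whitney_u": mann_whitney_u(post, pre),
        "welch_t_stat": float(mean_diff / np.sqrt(v1 / n1 + v2 / n2)),
        "median_diff": float(np.median(post) - np.median(pre)),
    }

# Usage on two tiny, fully separated segments.
example = core_features([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0])
```

Zero-variance segments would cause divisions by zero in this sketch; a production pipeline would need to guard against them.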
Feature Count by Model
| Model | Feature Count | Performance |
|---|---|---|
| xgb_core_7features | 7 | 0.6188 AUC |
| xgb_30f_fast_inference | 30 | 0.6282 AUC |
| xgb_70_statistical | 50 (selected from 70) | 0.6685 AUC |
| xgb_tuned_regularization | 70+ | 0.7423 AUC |
| gradient_boost_comprehensive | 100+ | 0.7930 AUC |
| meta_stacking_7models | 200 (selected from 339) | 0.7662 AUC |
Feature-Performance Relationship
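Performance generally improves as features are added, with diminishing returns after roughly 70 features. The comparison is indicative only, because the models in the table also differ in architecture and tuning: the 200-feature stacked model (0.7662 AUC), for example, scores below the 100+ feature model (0.7930 AUC).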
Summary Table
| Category | Key Features | Best Used For |
|---|---|---|
| Statistical Moments | mean_diff, std_ratio | Level shifts, volatility changes |
| Effect Sizes | cohens_d, hedges_g | Scale-invariant detection |
| Hypothesis Tests | ks_statistic, welch_t_stat | Statistical significance |
| Distribution Metrics | wasserstein_distance | Full distributional change |
| Spectral Features | spectral_centroid | Frequency changes |
| Wavelet Features | dwt_energy | Multi-scale changes |
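For the wavelet row, a feature such as `dwt_energy` could be derived from the energy of the detail coefficients at each decomposition level. The sketch below uses a hand-written Haar transform so that it depends only on NumPy. The choice of the Haar wavelet, the per-sample normalization, and the helper names are assumptions, not the documented implementation; a library such as PyWavelets would be the usual choice in practice.

```python
import numpy as np

def haar_dwt_energies(x, levels=3):
    """Energy of the Haar detail coefficients at each level, finest level first."""
    approx = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if approx.size < 2:
            break
        if approx.size % 2:  # drop the last sample so values pair up
            approx = approx[:-1]
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2.0)  # high-pass: local differences
        approx = (even + odd) / np.sqrt(2.0)  # low-pass: local averages, reused next level
        energies.append(float(np.sum(detail ** 2)))
    return energies

def dwt_energy_diff(pre, post, levels=3):
    """Per-level change in detail energy per sample between two segments."""
    e_pre = np.array(haar_dwt_energies(pre, levels)) / len(pre)
    e_post = np.array(haar_dwt_energies(post, levels)) / len(post)
    k = min(e_pre.size, e_post.size)  # short segments may yield fewer levels
    return (e_post[:k] - e_pre[:k]).tolist()

# Usage: an alternating signal carries all of its energy at the finest scale.
alternating = [1.0, -1.0] * 4
energies = haar_dwt_energies(alternating, levels=3)
diff = dwt_energy_diff([0.0] * 8, alternating, levels=3)
```

Dividing by segment length keeps the comparison meaningful when the pre-segment and post-segment have different sizes.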