Feature Documentation

Comprehensive documentation of all features used for structural break detection.

Feature Categories

  • Statistical Moments

    Mean, variance, skewness, kurtosis differences between segments.

  • Effect Sizes

    Cohen's d, Glass's delta, Hedges' g for standardized comparisons.

  • Hypothesis Tests

    Welch's t-test, Kolmogorov-Smirnov, Mann-Whitney U statistics.

  • Distribution Metrics

    Wasserstein distance, ECDF distance, overlap coefficient.

  • Spectral Features

    FFT-based features: spectral centroid, bandwidth, entropy.

  • Wavelet Features

    Multi-resolution decomposition using DWT coefficients.
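
Of the categories above, the effect sizes have compact closed forms, so a short sketch may help make them concrete. The code below uses the textbook definitions of Cohen's d, Glass's delta and Hedges' g; the function name glass_delta, the choice of the pre-segment as the reference group for Glass's delta, and the sign convention (post minus pre) are assumptions for illustration, not taken from the project code.

```python
import math

def _mean(xs):
    return sum(xs) / len(xs)

def _sample_var(xs):
    # Unbiased sample variance (divides by n - 1).
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cohens_d(pre, post):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(pre), len(post)
    pooled_var = ((n_a - 1) * _sample_var(pre)
                  + (n_b - 1) * _sample_var(post)) / (n_a + n_b - 2)
    return (_mean(post) - _mean(pre)) / math.sqrt(pooled_var)

def glass_delta(pre, post):
    """Mean difference divided by the pre-segment standard deviation.

    Treating the pre-segment as the reference group is an assumption.
    """
    return (_mean(post) - _mean(pre)) / math.sqrt(_sample_var(pre))

def hedges_g(pre, post):
    """Cohen's d with the usual small-sample bias correction."""
    correction = 1 - 3 / (4 * (len(pre) + len(post)) - 9)
    return cohens_d(pre, post) * correction

# Toy segments; constant segments would divide by zero and need a guard.
pre = [1.0, 2.0, 3.0, 4.0]
post = [5.0, 6.0, 7.0, 8.0]

d = cohens_d(pre, post)
delta = glass_delta(pre, post)
g = hedges_g(pre, post)
```

Because both toy segments have the same spread, d and delta coincide here; they diverge once the break also changes the variance, which is the case Glass's delta is meant for.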

Feature Engineering Methodology

Segment-Based Approach

Each time series is split at a candidate break point T into a pre-segment and a post-segment:

Pre-segment   |  Post-segment
--------------+---------------
 values[0:T]  |  values[T:end]

Features capture differences between these segments.
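
A minimal sketch of this split, assuming the series is a plain Python list and T is an integer index. The keys mean_diff and std_ratio follow the feature names used in this document; skew_diff and kurt_diff are hypothetical names, and the direction of each comparison (post minus pre, post over pre) is an assumption.

```python
import math

def central_moments(xs):
    """Return mean, population variance, skewness and excess kurtosis."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurt = m4 / m2 ** 2 - 3.0 if m2 > 0 else 0.0
    return mean, m2, skew, kurt

def segment_features(values, T):
    """Split at index T and compare the post-segment with the pre-segment."""
    pre, post = values[:T], values[T:]
    if len(pre) < 2 or len(post) < 2:
        raise ValueError("both segments need at least two points")
    mean_a, var_a, skew_a, kurt_a = central_moments(pre)
    mean_b, var_b, skew_b, kurt_b = central_moments(post)
    std_a, std_b = math.sqrt(var_a), math.sqrt(var_b)
    return {
        "mean_diff": mean_b - mean_a,
        "std_ratio": std_b / std_a if std_a > 0 else float("inf"),
        "skew_diff": skew_b - skew_a,   # hypothetical feature name
        "kurt_diff": kurt_b - kurt_a,   # hypothetical feature name
    }

# Toy series with a level shift and a volatility change at T = 4.
series = [0.0, 1.0, 0.0, 1.0, 10.0, 12.0, 10.0, 12.0]
features = segment_features(series, T=4)
```

For this toy series the level shift shows up in mean_diff and the doubled spread in std_ratio, while the shape features stay at zero because both segments have the same symmetric shape.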

Most Discriminative Features

Based on XGBoost feature importance analysis:

Rank | Feature              | Category           | Importance
-----+----------------------+--------------------+-----------
1    | ks_statistic         | Hypothesis Test    | Highest
2    | mean_diff_normalized | Statistical Moment | Very High
3    | std_ratio            | Statistical Moment | Very High
4    | cohens_d             | Effect Size        | High
5    | mann_whitney_u       | Hypothesis Test    | High
6    | welch_t_stat         | Hypothesis Test    | High
7    | median_diff          | Statistical Moment | Medium
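
The three hypothesis-test features in this ranking can be sketched in plain Python as follows. These are the textbook statistics only (no p-values); the exact definitions used in the project, including which segment comes first and the sign of the t statistic, are assumptions here.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two ECDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:  # the maximum gap is attained at an observed value
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d

def _mean(xs):
    return sum(xs) / len(xs)

def _sample_var(xs):
    m = _mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def welch_t_stat(pre, post):
    """Welch's t: mean difference over the unpooled standard error."""
    se = math.sqrt(_sample_var(pre) / len(pre) + _sample_var(post) / len(post))
    return (_mean(post) - _mean(pre)) / se if se > 0 else 0.0

def mann_whitney_u(a, b):
    """U statistic for sample a: count of pairs with a > b, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Toy segments that do not overlap at all.
pre = [1.0, 2.0, 3.0, 4.0]
post = [5.0, 6.0, 7.0, 8.0]

ks = ks_statistic(pre, post)
t_stat = welch_t_stat(pre, post)
u_stat = mann_whitney_u(pre, post)
```

In practice, `scipy.stats.ks_2samp`, `scipy.stats.ttest_ind` with `equal_var=False`, and `scipy.stats.mannwhitneyu` provide the same statistics along with p-values; the pairwise loop for U above is quadratic and only suitable for short segments.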

Feature Count by Model

Model                        | Feature Count           | Performance (AUC)
-----------------------------+-------------------------+------------------
xgb_core_7features           | 7                       | 0.6188
xgb_30f_fast_inference       | 30                      | 0.6282
xgb_70_statistical           | 50 (selected from 70)   | 0.6685
xgb_tuned_regularization     | 70+                     | 0.7423
gradient_boost_comprehensive | 100+                    | 0.7930
meta_stacking_7models        | 200 (selected from 339) | 0.7662

Feature-Performance Relationship

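Performance generally improves with more features: AUC rises from 0.6188 with 7 features to 0.7930 with 100+ features. The gains are not monotonic, however. The 200-feature stacking ensemble (0.7662 AUC) scores below the 100+ feature model, so returns diminish at the high end of the range. The models also differ in algorithm and tuning, so feature count is not the only variable behind these numbers.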
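
One way to check this trend is to compute the AUC change between consecutive rows of the Feature Count by Model table. The sketch below copies the reported numbers; entering "70+" and "100+" as 70 and 100 is an assumption, so the per-feature figures are rough.

```python
# (model, approximate feature count, AUC) from the Feature Count by Model table.
# "70+" and "100+" are entered as 70 and 100, which is an assumption.
models = [
    ("xgb_core_7features", 7, 0.6188),
    ("xgb_30f_fast_inference", 30, 0.6282),
    ("xgb_70_statistical", 50, 0.6685),
    ("xgb_tuned_regularization", 70, 0.7423),
    ("gradient_boost_comprehensive", 100, 0.7930),
    ("meta_stacking_7models", 200, 0.7662),
]

def marginal_gains(rows):
    """AUC change between consecutive rows, in total and per added feature."""
    gains = []
    for (_, n_prev, auc_prev), (name, n_cur, auc_cur) in zip(rows, rows[1:]):
        delta = auc_cur - auc_prev
        gains.append((name, round(delta, 4), round(delta / (n_cur - n_prev), 5)))
    return gains

gains = marginal_gains(models)
best = max(models, key=lambda row: row[2])

for name, delta, per_feature in gains:
    print(f"{name:30s} {delta:+.4f} AUC  ({per_feature:+.5f} per feature)")
print("best:", best[0])
```

Since the rows also differ in model family and tuning, these deltas describe the reported results rather than a controlled effect of feature count alone.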

Summary Table

Category             | Key Features               | Best Used For
---------------------+----------------------------+---------------------------------
Statistical Moments  | mean_diff, std_ratio       | Level shifts, volatility changes
Effect Sizes         | cohens_d, hedges_g         | Scale-invariant detection
Hypothesis Tests     | ks_statistic, welch_t_stat | Statistical significance
Distribution Metrics | wasserstein_distance       | Full distributional change
Spectral Features    | spectral_centroid          | Frequency changes
Wavelet Features     | dwt_energy                 | Multi-scale changes
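
To illustrate the last two rows, here is a dependency-free sketch of a spectral centroid and a single-level Haar detail energy. It uses a naive DFT rather than an FFT, and how the project actually defines spectral_centroid and dwt_energy (wavelet family, decomposition level, normalization) is not stated here, so haar_detail_energy is a hypothetical stand-in.

```python
import cmath
import math

def spectral_centroid(xs):
    """Magnitude-weighted mean frequency in cycles per sample (naive DFT)."""
    n = len(xs)
    mean = sum(xs) / n
    centered = [x - mean for x in xs]  # remove the DC component
    weighted = total = 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        magnitude = abs(coeff)
        weighted += (k / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0

def haar_detail_energy(xs):
    """Energy of the level-1 Haar detail coefficients (odd tail sample dropped)."""
    pairs = zip(xs[0::2], xs[1::2])
    return sum(((a - b) / math.sqrt(2)) ** 2 for a, b in pairs)

# Toy segments: a slow oscillation before the break, a fast one after it.
pre = [math.sin(2 * math.pi * t / 16) for t in range(32)]
post = [math.sin(2 * math.pi * t / 4) for t in range(32)]

centroid_shift = spectral_centroid(post) - spectral_centroid(pre)
energy_ratio = haar_detail_energy(post) / haar_detail_energy(pre)
```

The centroid moves from 1/16 to 1/4 cycles per sample across the break, and the fine-scale Haar energy grows with it. For real workloads, an FFT routine such as `numpy.fft.rfft` and a wavelet library such as PyWavelets would be the more usual choice.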