Hypothesis Tests¶
Statistical hypothesis tests provide formal frameworks for determining whether observed differences are likely due to chance.
Welch's t-test¶
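Feature Names: welch_t_stat, welch_t_pval, welch_t_log_pval
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
where x̄₁, x̄₂ are the sample means, s₁², s₂² the sample variances, and n₁, n₂ the sample sizes.
Degrees of Freedom (Welch-Satterthwaite)¶
$$ \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} $$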
Properties¶
- Does NOT assume equal variances
- Robust to moderate non-normality when each group has roughly n > 30 observations
Interpretation¶
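- Large |t|: Strong evidence that the two means differ; significance is judged from the p-value, which also depends on the degrees of freedom
- log_pval = -log₁₀(p): Larger values = stronger evidence against H₀ (for example, p = 0.001 gives 3)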
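As a minimal sketch of how the three Welch features could be computed, the block below uses `scipy.stats.ttest_ind` with `equal_var=False`. The helper names `welch_features` and `welch_df`, the `p_floor` clipping, and the synthetic data are assumptions made for illustration, not the benchmark's own code.

```python
import numpy as np
from scipy import stats


def welch_features(a, b, p_floor=1e-300):
    """Return the three Welch features for two 1-D samples.

    p_floor is an assumption of this sketch: it keeps -log10(p) finite
    if the p-value underflows to exactly 0.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # equal_var=False selects Welch's test instead of the pooled-variance t-test
    t_stat, p_val = stats.ttest_ind(a, b, equal_var=False)
    return {
        "welch_t_stat": float(t_stat),
        "welch_t_pval": float(p_val),
        "welch_t_log_pval": float(-np.log10(max(p_val, p_floor))),
    }


def welch_df(a, b):
    """Welch-Satterthwaite degrees of freedom, computed by hand."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # s1^2 / n1
    vb = b.var(ddof=1) / b.size  # s2^2 / n2
    return (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))


# Synthetic example: the second segment has a shifted mean and a larger variance
rng = np.random.default_rng(0)
before = rng.normal(0.0, 1.0, size=200)
after = rng.normal(1.0, 2.0, size=200)

feats = welch_features(before, after)
df = welch_df(before, after)
```

With unequal variances the degrees of freedom fall between min(n₁, n₂) - 1 and n₁ + n₂ - 2, which is why `df` is generally not an integer.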
Used by: All 25 detectors
Kolmogorov-Smirnov Test¶
Feature Names: ks_statistic, ks_pval, ks_log_pval
$$ D = \sup_x |F_1(x) - F_2(x)| $$
where F₁(x) and F₂(x) are the empirical CDFs of the two samples.
Properties¶
- Non-parametric (distribution-free)
- Sensitive to location, scale, AND shape differences
- Most important feature in this benchmark
Interpretation¶
- D ranges from 0 to 1
- D = 0: Identical empirical distributions
- D = 1: No overlap (every value in one sample lies below every value in the other)
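The bounds above can be checked with a small standard-library sketch that computes D as the largest gap between the two empirical CDFs. The function name `ks_statistic` and the toy samples are illustrative assumptions; when a p-value is also needed (as for ks_pval), `scipy.stats.ks_2samp` would be the usual choice.

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Empirical CDFs are step functions, so the largest gap is attained
    # at one of the observed values.
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / n1  # fraction of sample1 that is <= x
        f2 = bisect_right(s2, x) / n2  # fraction of sample2 that is <= x
        d = max(d, abs(f1 - f2))
    return d


identical = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])    # same values
disjoint = ks_statistic([1, 2, 3, 4], [10, 11, 12, 13])  # no overlap
partial = ks_statistic([1, 2, 3, 4], [3, 4, 5, 6])       # half overlapping
```

Here `identical` is 0.0, `disjoint` is 1.0, and `partial` is 0.5.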
Most Discriminative Feature
ks_statistic was the most important feature across all XGBoost models in feature importance analysis.
Used by: All 25 detectors
Mann-Whitney U Test¶
Feature Names: mann_whitney_u, mann_whitney_pval
$$ U = R_1 - \frac{n_1(n_1 + 1)}{2} $$
where R₁ is the sum of ranks of the first sample in the pooled ranking and n₁ is its size.
Properties¶
- Non-parametric
- Tests whether values from one group tend to be larger than values from the other (stochastic ordering)
- Related to Cliff's delta: δ = (2U)/(n₁n₂) - 1
Interpretation¶
- Under null: E[U] = n₁n₂/2
- A large deviation of U from n₁n₂/2 indicates that one group tends to take larger values
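The rank-sum definition of U and its relation to Cliff's delta can be sketched with the standard library alone. The function names below are illustrative assumptions, and U is taken as R₁ - n₁(n₁ + 1)/2, the statistic for the first sample; `scipy.stats.mannwhitneyu` would additionally supply the p-value.

```python
def average_ranks(values):
    """1-based ranks of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """U statistic of the first sample: U = R1 - n1 * (n1 + 1) / 2."""
    n1 = len(sample1)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])  # sum of ranks of the first sample
    return r1 - n1 * (n1 + 1) / 2


def cliffs_delta(sample1, sample2):
    """Cliff's delta from U: delta = 2U / (n1 * n2) - 1."""
    u = mann_whitney_u(sample1, sample2)
    return 2 * u / (len(sample1) * len(sample2)) - 1


u_high = mann_whitney_u([5, 6, 7], [1, 2, 3, 4])    # first sample entirely larger
delta_high = cliffs_delta([5, 6, 7], [1, 2, 3, 4])
delta_low = cliffs_delta([1, 2, 3], [4, 5, 6, 7])   # first sample entirely smaller
u_tied = mann_whitney_u([1, 2], [1, 2])             # equals n1 * n2 / 2
```

When the first sample is entirely larger, U reaches its maximum n₁n₂ = 12 and δ = 1; when it is entirely smaller, δ = -1; identical samples give U = n₁n₂/2 and δ = 0.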
Used by: xgb_70_statistical, hypothesis_testing_pure, meta_stacking_7models
Levene's Test¶
Feature Names: levene_stat, levene_pval
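$$ W = \frac{N - k}{k - 1} \cdot \frac{\sum_{i=1}^{k} n_i (\bar{Z}_{i\cdot} - \bar{Z}_{\cdot\cdot})^2}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (Z_{ij} - \bar{Z}_{i\cdot})^2} $$
where: $$ Z_{ij} = |x_{ij} - \tilde{x}_i| $$
Here the centre of group i is its median (the Brown-Forsythe variant of the test), the barred Z terms are the group mean and the overall mean of the Z values, k is the number of groups (k = 2 here), nᵢ is the size of group i, and N is the total sample size.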
Purpose¶
Tests equality of variances:
- H₀: σ₁² = σ₂²
- H₁: σ₁² ≠ σ₂²
Why Useful¶
Variance breaks (volatility regime changes) are common in financial data.
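A rough illustration of a variance break, assuming synthetic Gaussian data and the median-centred deviations Z defined above: `scipy.stats.levene` with `center="median"` yields both features, and the hand-written `levene_w` (a hypothetical helper, not part of the benchmark) reproduces the statistic from the absolute deviations.

```python
import numpy as np
from scipy import stats


def levene_w(*groups):
    """Levene statistic from absolute deviations about each group's median."""
    z = [np.abs(np.asarray(g, dtype=float) - np.median(g)) for g in groups]
    n = np.array([len(zi) for zi in z])
    k, total = len(z), n.sum()
    z_bar_i = np.array([zi.mean() for zi in z])  # per-group mean of Z
    z_bar = np.concatenate(z).mean()             # overall mean of Z
    between = np.sum(n * (z_bar_i - z_bar) ** 2)
    within = sum(np.sum((zi - m) ** 2) for zi, m in zip(z, z_bar_i))
    return (total - k) / (k - 1) * between / within


# Synthetic regimes: same mean, standard deviation 1 versus 3
rng = np.random.default_rng(42)
calm = rng.normal(0.0, 1.0, size=500)
volatile = rng.normal(0.0, 3.0, size=500)

levene_stat, levene_pval = stats.levene(calm, volatile, center="median")
manual_stat = levene_w(calm, volatile)

# A pure shift in level leaves the deviations unchanged, so no break is flagged
shift_stat, shift_pval = stats.levene(calm, calm + 10.0, center="median")
```

The second call shows why this feature complements a mean-based test: a change in level alone gives a p-value near 1, while the change in spread gives a very small one.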
Used by: meta_stacking_7models, hypothesis_testing_pure