Hypothesis Tests

Statistical hypothesis tests provide formal frameworks for determining whether observed differences are likely due to chance.

Welch's t-test

Feature Names: welch_t_stat, welch_t_pval, welch_t_log_pval

\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

where x̄ᵢ, sᵢ², and nᵢ are the sample mean, sample variance, and size of sample i.

Degrees of Freedom (Welch-Satterthwaite)

\[ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1-1} + \frac{(s_2^2/n_2)^2}{n_2-1}} \]
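The two formulas above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the benchmark's actual feature code; the function name `welch_t` is hypothetical, and the p-value (which needs a t-distribution CDF) is left out.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t, df) for Welch's unequal-variance t-test.

    Hypothetical helper for illustration; each sample needs at least
    two observations and the two samples must not both be constant.
    """
    n1, n2 = len(sample1), len(sample2)
    # s_i^2 / n_i: statistics.variance uses the (n - 1) denominator.
    v1 = variance(sample1) / n1
    v2 = variance(sample2) / n2
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


t, df = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(f"t = {t:.4f}, df = {df:.4f}")
```

Note that df is generally not an integer, and it never exceeds n₁ + n₂ − 2.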

Properties

  • Does NOT assume equal variances
  • Robust to non-normality when each sample has more than about 30 observations

Interpretation

  • Large |t|: Stronger evidence that the two means differ (significance is judged from the p-value at the df above)
  • log_pval = -log₁₀(p): Larger values = stronger evidence
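As a small sketch of the log transform, the snippet below converts a p-value to −log₁₀(p). The function name `neg_log10_p` and the `floor` parameter are assumptions made for illustration: very small p-values can underflow to 0.0 in floating point, so some lower bound is needed to keep the feature finite, but the source does not say how the benchmark handles this case.

```python
from math import log10


def neg_log10_p(p, floor=1e-300):
    """Return -log10(p), clipping p from below so the result stays finite.

    `floor` is an illustrative assumption, not a documented setting.
    """
    return -log10(max(p, floor))


print(neg_log10_p(0.05))   # weak evidence: a little above 1
print(neg_log10_p(1e-12))  # strong evidence: about 12
print(neg_log10_p(0.0))    # underflowed p-value: capped at about 300
```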

Used by: All 25 detectors


Kolmogorov-Smirnov Test

Feature Names: ks_statistic, ks_pval, ks_log_pval

\[ D = \sup_x |F_1(x) - F_2(x)| \]

where F₁(x) and F₂(x) are empirical CDFs.
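A minimal sketch of the statistic, assuming plain Python lists as input (the function name `ks_statistic` is hypothetical). Because both empirical CDFs are right-continuous step functions that change only at observed values, the supremum is found by evaluating |F₁ − F₂| at every pooled data point.

```python
from bisect import bisect_right


def ks_statistic(sample1, sample2):
    """Two-sample KS statistic D = sup_x |F1(x) - F2(x)|."""
    a, b = sorted(sample1), sorted(sample2)
    n1, n2 = len(a), len(b)
    d = 0.0
    for x in a + b:
        # Empirical CDF at x: fraction of observations <= x.
        f1 = bisect_right(a, x) / n1
        f2 = bisect_right(b, x) / n2
        d = max(d, abs(f1 - f2))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```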

Properties

  • Non-parametric (distribution-free)
  • Sensitive to location, scale, AND shape differences
  • Most important feature in this benchmark

Interpretation

  • D ranges from 0 to 1
  • D = 0: Identical distributions
  • D = 1: Complete separation (every value in one sample lies below every value in the other)

Most Discriminative Feature

ks_statistic was the most important feature across all XGBoost models in feature importance analysis.

Used by: All 25 detectors


Mann-Whitney U Test

Feature Names: mann_whitney_u, mann_whitney_pval

\[ U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1 \]

where R₁ is the sum of ranks of the first sample.
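The formula above can be sketched as follows, using mid-ranks (average ranks) for tied values. The function name `mann_whitney_u` is hypothetical and the tie handling is an assumption; the source does not state how ties are ranked.

```python
def mann_whitney_u(sample1, sample2):
    """U = n1*n2 + n1*(n1 + 1)/2 - R1, with mid-ranks for ties."""
    n1, n2 = len(sample1), len(sample2)
    # Pool the observations, remembering which sample each came from.
    pooled = sorted(
        [(value, 0) for value in sample1] + [(value, 1) for value in sample2]
    )
    r1 = 0.0  # sum of ranks of the first sample
    i = 0
    while i < len(pooled):
        # Find the run of tied values pooled[i:j].
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        # Ranks i+1 .. j share their average, (i + 1 + j) / 2.
        mid_rank = (i + 1 + j) / 2
        r1 += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    return n1 * n2 + n1 * (n1 + 1) / 2 - r1


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

With this form of U, the value counts the pairs in which the second sample's observation exceeds the first sample's (ties count one half), so it equals n₁n₂ when the first sample lies entirely below the second.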

Properties

  • Non-parametric
  • Tests for stochastic dominance
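  • Related to Cliff's delta: δ = (2U)/(n₁n₂) - 1; the sign depends on which sample's U is used (with U as defined above, positive δ means the second sample tends to be larger)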
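The link between U and Cliff's delta can be checked by brute-force pairwise comparison. The sketch below uses hypothetical helper names and counts pairs directly: U (in the form given by the formula above) counts pairs where the second sample's value is larger, with ties counted as one half.

```python
def pairwise_u(sample1, sample2):
    """U as a pair count: second-sample value larger (ties count 0.5)."""
    return sum(
        1.0 if y > x else 0.5 if y == x else 0.0
        for x in sample1
        for y in sample2
    )


def cliffs_delta(sample1, sample2):
    """(#pairs with y > x minus #pairs with y < x) / (n1 * n2)."""
    greater = sum(1 for x in sample1 for y in sample2 if y > x)
    less = sum(1 for x in sample1 for y in sample2 if y < x)
    return (greater - less) / (len(sample1) * len(sample2))


s1, s2 = [1, 2, 3], [2, 3, 4]
u = pairwise_u(s1, s2)
delta_from_u = 2 * u / (len(s1) * len(s2)) - 1
print(u, delta_from_u, cliffs_delta(s1, s2))
```

The two delta values agree, and δ = 0 corresponds to U = n₁n₂/2.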

Interpretation

  • Under null: E[U] = n₁n₂/2
  • Large deviation indicates one group tends to be larger

Used by: xgb_70_statistical, hypothesis_testing_pure, meta_stacking_7models


Levene's Test

Feature Names: levene_stat, levene_pval

\[ W = \frac{(N-k)}{(k-1)} \times \frac{\sum_i n_i(\bar{Z}_i - \bar{Z})^2}{\sum_i \sum_j (Z_{ij} - \bar{Z}_i)^2} \]

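where

\[ Z_{ij} = |x_{ij} - \tilde{x}_i| \]

is the absolute deviation of observation j in group i from that group's center x̃ᵢ (the tilde conventionally denotes the median, which gives the Brown-Forsythe variant of the test), Z̄ᵢ is the mean of the Z values in group i, Z̄ is the mean of all Z values, nᵢ is the size of group i, k is the number of groups, and N is the total number of observations.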
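A minimal standard-library sketch of the W statistic follows. It assumes the group center x̃ᵢ is the median; the function name `levene_w` is hypothetical, and the p-value (from an F distribution with k − 1 and N − k degrees of freedom) is omitted.

```python
from statistics import mean, median


def levene_w(*groups):
    """Levene's W using absolute deviations from each group's median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Z_ij: absolute deviation of each observation from its group median.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]                 # per-group mean of Z
    z_bar = sum(sum(zi) for zi in z) / n_total       # grand mean of Z
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Second group has twice the spread of the first.
print(levene_w([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Shifting a group by a constant leaves W unchanged, since only deviations from each group's own center enter the statistic.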

Purpose

Tests equality of variances:

  • H₀: σ₁² = σ₂²
  • H₁: σ₁² ≠ σ₂²

Why Useful

Variance breaks (volatility regime changes) are common in financial data.

Used by: meta_stacking_7models, hypothesis_testing_pure