Statistical Models¶
Pure statistical approaches without machine learning.
Overview¶
| Model | ROC AUC | F1 Score | Train Time |
|---|---|---|---|
| Hypothesis Testing | 0.5394 | 0.4167 | 0s |
| Bayesian BOCPD | 0.5005 | 0.0625 | 183s |
| welch_ttest | 0.4634 | 0.0000 | 0s |
Key Advantage: No Training¶
Training-Free
These models don't require training data. They use statistical theory to detect breaks directly.
Use Cases¶
When to Use Statistical Models¶
- Baseline comparison — Benchmark for ML models
- Interpretability required — p-values and test statistics are understandable
- No training data — Deploy immediately without labeled examples
- Real-time streaming — Low latency, no model loading
When to Avoid¶
- Higher AUC needed — ML models achieved higher AUC and robust scores
- Complex patterns — Statistical tests assume specific distributions
Theory Behind the Tests¶
Hypothesis Testing Framework¶
$$
H_0: \text{No structural break (same distribution)}
$$

$$
H_1: \text{Structural break exists (different distributions)}
$$
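As a concrete illustration of this framework, the sketch below splits a series at a candidate break point and runs Welch's t-test on the two segments. The function name, the fixed split index, and the 0.05 threshold are illustrative assumptions, not part of the benchmarked implementation.

```python
import numpy as np
from scipy import stats

def detect_break(series: np.ndarray, split: int, alpha: float = 0.05) -> bool:
    """Test H0 (no break) against H1 (break) at a candidate split point.

    Sketch only: Welch's t-test compares the means of the pre- and
    post-split segments without assuming equal variances.
    """
    pre, post = series[:split], series[split:]
    t_stat, p_value = stats.ttest_ind(pre, post, equal_var=False)  # Welch's t-test
    return p_value < alpha  # reject H0 -> evidence of a structural break

# Example: a mean shift after index 200 is flagged immediately, no training step
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(0.8, 1.0, 200)])
print(detect_break(series, split=200))
```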
Key Test Statistics¶
| Test | What It Measures |
|---|---|
| Welch's t-test | Mean difference significance |
| Kolmogorov-Smirnov | Maximum CDF difference |
| Mann-Whitney U | Rank-based comparison |
| Levene's test | Variance equality |
| CUSUM | Cumulative deviation from mean |
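The SciPy-based sketch below shows one way to compute the statistics from this table on a pre/post split. The CUSUM value is a hand-rolled cumulative-sum statistic, since SciPy has no direct equivalent, and the aggregation used by the benchmarked models may differ.

```python
import numpy as np
from scipy import stats

def break_statistics(pre: np.ndarray, post: np.ndarray) -> dict:
    """Compute the test statistics from the table above on two segments.

    Sketch only: real pipelines may combine or threshold these values differently.
    """
    series = np.concatenate([pre, post])
    # CUSUM: maximum absolute cumulative deviation from the overall mean (normalized)
    cusum = np.max(np.abs(np.cumsum(series - series.mean()))) / (series.std() * np.sqrt(len(series)))
    return {
        "welch_t": stats.ttest_ind(pre, post, equal_var=False).pvalue,  # mean difference
        "ks": stats.ks_2samp(pre, post).pvalue,                         # max CDF difference
        "mann_whitney": stats.mannwhitneyu(pre, post).pvalue,           # rank-based comparison
        "levene": stats.levene(pre, post).pvalue,                       # variance equality
        "cusum": cusum,                                                 # cumulative deviation
    }
```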
Key Findings¶
Statistical Models as Baseline
hypothesis_testing_pure serves as a baseline that shows the gap between pure statistics and ML approaches.
Tree-based models achieved a higher ROC AUC (0.7423 vs 0.5394) and higher robust scores than the statistical methods.