# Hypothesis Testing Pure
Ensemble of statistical tests with no training required.
## Performance
| Metric | Value | Rank |
|---|---|---|
| ROC AUC | 0.5394 | 20th |
| F1 Score | 0.4167 | 11th |
| Accuracy | 0.4455 | 25th |
| Recall | 0.6667 | 1st |
| Train Time | 0s | Instant |
> **Highest recall:** This model has the highest recall (0.67), catching more breaks than any other model — but at the cost of many false positives.
## Architecture
Five statistical tests combined with weighted voting:
### Component Tests

#### 1. Enhanced t-test (25% weight)
\[
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
\]
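The document doesn't show the implementation; as a minimal sketch, the unequal-variance (Welch) form above maps directly onto SciPy's `ttest_ind`. The data and segment boundary here are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 100)  # pre-break segment (illustrative data)
x2 = rng.normal(0.5, 1.2, 100)  # post-break segment with a hypothetical mean shift

# Welch's t-test: equal_var=False gives the unequal-variance statistic above
t_stat, p_value = stats.ttest_ind(x1, x2, equal_var=False)
```

A small p-value is evidence that the two segments have different means, i.e. a candidate break.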
#### 2. Kolmogorov-Smirnov Test (20% weight)
\[
D = \sup_x |F_1(x) - F_2(x)|
\]
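As a sketch, D is the supremum distance between the two empirical CDFs, which `scipy.stats.ks_2samp` computes directly (the data below are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, 100)  # pre-break segment (illustrative data)
x2 = rng.normal(0.0, 2.0, 100)  # post-break segment with a variance change

# d_stat is D = sup_x |F1(x) - F2(x)| over the empirical CDFs
d_stat, p_value = stats.ks_2samp(x1, x2)
```

Unlike the t-test, this is sensitive to any distributional change, not just a mean shift.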
#### 3. CUSUM Test (15% weight)
\[
S_t = \sum_{i=1}^{t} \frac{x_i - \hat{\mu}}{\hat{\sigma}}
\]
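A minimal sketch of the statistic above: standardize the series with estimated mean and standard deviation, then take the running sum. A break shows up as a large excursion of |S_t| away from zero (the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical series with a mean shift halfway through
x = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(1.0, 1.0, 100)])

mu_hat, sigma_hat = x.mean(), x.std()
# S_t: cumulative sum of standardized deviations from the estimated mean
s = np.cumsum((x - mu_hat) / sigma_hat)
# Evidence for a break: the largest excursion of |S_t|
evidence = np.max(np.abs(s))
```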
#### 4. Likelihood Ratio Test (20% weight)
\[
\Lambda = -2 \log\left(\frac{L_{\text{single}}}{L_{\text{two-segment}}}\right)
\]
Under the null hypothesis of no break, Λ follows a χ²(df) distribution.
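A sketch under Gaussian assumptions: compare the maximized log-likelihood of one segment against two segments split at a candidate break point. The split point, data, and degrees of freedom here are illustrative, not the model's exact choices:

```python
import numpy as np
from scipy import stats

def gauss_loglik(seg):
    """Maximized Gaussian log-likelihood of a segment (MLE mean and variance)."""
    n = len(seg)
    var = seg.var()  # ddof=0, the MLE of the variance
    return -0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(1.0, 1.0, 100)])
k = len(x) // 2  # candidate break point (fixed here for illustration)

ll_single = gauss_loglik(x)                         # one-segment model
ll_two = gauss_loglik(x[:k]) + gauss_loglik(x[k:])  # two-segment model
lam = -2.0 * (ll_single - ll_two)                   # Λ = -2 log(L_single / L_two)

# Two extra free parameters (a second mean and variance) => df = 2
p_value = stats.chi2.sf(lam, df=2)
```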
#### 5. Bayesian Model Comparison (20% weight)
\[
\log BF = -\frac{1}{2}(BIC_{\text{two}} - BIC_{\text{single}})
\]
\[
P(\text{break}|\text{data}) = \frac{1}{1 + \exp(-\log BF - \log(\text{prior odds}))}
\]
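The two formulas above chain together as follows; this is a sketch under Gaussian assumptions, and the parameter counts and flat prior odds are illustrative assumptions:

```python
import numpy as np

def gauss_loglik(seg):
    """Maximized Gaussian log-likelihood of a segment (MLE mean and variance)."""
    n = len(seg)
    var = seg.var()
    return -0.5 * n * (np.log(2.0 * np.pi * var) + 1.0)

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 100),
                    rng.normal(1.0, 1.0, 100)])
n, k = len(x), len(x) // 2

# BIC = -2 log L + p log n, with p = 2 (single) and p = 4 (two-segment)
bic_single = -2.0 * gauss_loglik(x) + 2.0 * np.log(n)
bic_two = (-2.0 * (gauss_loglik(x[:k]) + gauss_loglik(x[k:]))
           + 4.0 * np.log(n))

log_bf = -0.5 * (bic_two - bic_single)  # log Bayes factor (BIC approximation)
prior_odds = 1.0                        # hypothetical 50/50 prior on a break
p_break = 1.0 / (1.0 + np.exp(-(log_bf + np.log(prior_odds))))
```

A lower two-segment BIC yields a positive log BF and pushes P(break|data) toward 1.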
### Final Score Calculation
```python
score = (0.25 * t_score +
         0.20 * ks_score +
         0.15 * cusum_score +
         0.20 * lr_score +
         0.20 * bayes_score)
```
## Advantages
- No training required — Deploy immediately
- Interpretable — Each component has statistical meaning
- Fast — Instant predictions
- Theoretically grounded — Based on established statistical theory
## Limitations
- Lower accuracy — 0.5394 AUC vs 0.7930 for best ML model
- Many false positives — High recall (0.67) but low precision
- Assumes specific distributions — May not capture complex patterns
## Usage

No `--mode train` needed — this model doesn't require training.
## When to Use

**Good for:**
- Baseline comparison
- When interpretability is critical
- No training data available
- Understanding statistical evidence
**Avoid if:**
- Need high accuracy (use ML models)
- False positives are costly
- Complex non-standard patterns