Model Architectures¶
Note on Results
These results are based on local validation sets provided during the competition phase and do not represent final official leaderboard standings.
Comprehensive documentation of all 25 structural break detection models, evaluated on two independent datasets.
Key Finding
Single-dataset rankings are misleading. Models #1 and #2 on Dataset A dropped to #6 and #10 on Dataset B. Use Stability Score and Robust Score for model selection.
Model Families¶
This repository implements five distinct families of detection approaches:
- **Tree-Based Models**: XGBoost and Gradient Boosting variants. Best overall performance and stability.
- **Neural Networks**: MLP ensembles, LSTM, Transformers. Mixed results on univariate data.
- **Ensemble Methods**: stacking and voting ensembles. Some overfit, others stay stable.
- **Reinforcement Learning**: Q-learning and DQN approaches. Near-random performance.
- **Statistical Models**: pure hypothesis tests and Bayesian methods. Variable stability.
Performance by Family (Cross-Dataset)¶
| Family | Best Model | Robust Score | Stability | Dataset A | Dataset B |
|---|---|---|---|---|---|
| Tree-Based | xgb_tuned_regularization | 0.715 | 96.3% | 0.7423 | 0.7705 |
| Ensembles | weighted_dynamic_ensemble | 0.664 | 98.4% | 0.6742 | 0.6849 |
| Neural Networks | mlp_ensemble_deep_features | 0.647 | 95.3% | 0.7122 | 0.6787 |
| Statistical | segment_statistics_only | 0.569 | 95.4% | 0.6249 | 0.5963 |
| Reinforcement Learning | qlearning_rolling_stats | 0.470 | 92.5% | 0.5488 | 0.5078 |
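The Stability and Robust Score columns can be reproduced from the two per-dataset scores. The formulas below are an inference from the table values (they reproduce every row above), not definitions quoted from the source:

```python
def stability(score_a: float, score_b: float) -> float:
    """Cross-dataset stability: ratio of the weaker to the stronger score."""
    return min(score_a, score_b) / max(score_a, score_b)

def robust_score(score_a: float, score_b: float) -> float:
    """Worst-case score discounted by stability: min(a, b)^2 / max(a, b).

    Penalizes models that do well on only one dataset twice: once through
    the min, once through the stability ratio.
    """
    return min(score_a, score_b) ** 2 / max(score_a, score_b)

# Example: xgb_tuned_regularization (Dataset A 0.7423, Dataset B 0.7705)
print(round(stability(0.7423, 0.7705), 3))     # 0.963
print(round(robust_score(0.7423, 0.7705), 3))  # 0.715
```

Under this reading, a model ranks highly only if it scores well on *both* datasets, which is exactly why the former Dataset A leaders fall in the robust ranking.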
Overfitting Alert¶
| Former Top Model | Dataset A Rank | Dataset B Rank | Stability |
|---|---|---|---|
| gradient_boost_comprehensive | #1 | #6 | 82.4% |
| meta_stacking_7models | #2 | #10 | 83.8% |
Common Architecture Pattern¶
All models follow a consistent pipeline:
Feature Extraction¶
Each time series is split at a candidate break point into a pre-segment and a post-segment. Features capture differences between these segments (see Feature Documentation).
Preprocessing¶
```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Standard preprocessing pipeline: replace non-finite values, then scale
X = np.nan_to_num(X, nan=0.0, posinf=1e10, neginf=-1e10)
scaler = RobustScaler()  # median/IQR scaling resists outliers; StandardScaler also works
X_scaled = scaler.fit_transform(X)
```
Probability Output¶
All models output a probability of structural break:
- p ≈ 0: high confidence of no break
- p ≈ 0.5: uncertain
- p ≈ 1: high confidence a break exists
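Turning the probability into a decision is a simple thresholding step. The 0.5 cutoff and the uncertainty band below are common defaults, shown for illustration rather than taken from the source:

```python
def classify(p: float, threshold: float = 0.5, band: float = 0.1) -> str:
    """Map a break probability to a label; p near the threshold is flagged."""
    if abs(p - threshold) < band:
        return "uncertain"
    return "break" if p > threshold else "no break"

print(classify(0.95))  # break
print(classify(0.52))  # uncertain
print(classify(0.05))  # no break
```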
Model Selection Guide¶
```mermaid
flowchart LR
    A[Task] --> B{Constraints?}
    B -->|Best Overall| C[xgb_tuned_regularization<br/>Robust: 0.715]
    B -->|Maximum Stability| D[xgb_selective_spectral<br/>99.7% stable]
    B -->|Speed Critical| E[xgb_core_7features<br/>40s, 98% stable]
    B -->|No Training| F[segment_statistics_only<br/>95.4% stable]
```
Avoid These Models
- gradient_boost_comprehensive: 82.4% stability, overfits
- meta_stacking_7models: 83.8% stability, overfits
- hypothesis_testing_pure: 76.3% stability
- Deep learning (LSTM, Transformer): near-random on univariate data
- RL models: near-random performance