Model Architectures¶
Note on Results
These results are based on local validation sets provided during the competition phase and do not represent final official leaderboard standings.
Comprehensive documentation of all 25 structural break detection models, evaluated on two independent datasets.
Key Finding
Single-dataset rankings are misleading. Models #1 and #2 on Dataset A dropped to #6 and #10 on Dataset B. Use Stability Score and Robust Score for model selection.
Model Families¶
This repository implements five distinct families of detection approaches:
- **Tree-Based Models**: XGBoost and Gradient Boosting variants. Best overall performance and stability.
- **Neural Networks**: MLP ensembles, LSTM, Transformers. Mixed results on univariate data.
- **Ensemble Methods**: stacking and voting ensembles. Some overfit, others stay stable.
- **Reinforcement Learning**: Q-learning and DQN approaches. Near-random performance.
- **Statistical Models**: pure hypothesis tests and Bayesian methods. Variable stability.
Performance by Family (Cross-Dataset)¶
| Family | Best Model | Robust Score | Stability | Dataset A | Dataset B |
|---|---|---|---|---|---|
| Tree-Based | xgb_tuned_regularization | 0.715 | 96.3% | 0.7423 | 0.7705 |
| Ensembles | weighted_dynamic_ensemble | 0.664 | 98.4% | 0.6742 | 0.6849 |
| Neural Networks | mlp_ensemble_deep_features | 0.647 | 95.3% | 0.7122 | 0.6787 |
| Statistical | segment_statistics_only | 0.569 | 95.4% | 0.6249 | 0.5963 |
| Reinforcement Learning | qlearning_rolling_stats | 0.470 | 92.5% | 0.5488 | 0.5078 |
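The Stability and Robust Score columns can be reproduced from the two per-dataset scores. The formulas below are an inference from the table values (they reproduce every row above), not definitions quoted from the source:

```python
def stability(score_a: float, score_b: float) -> float:
    """Cross-dataset stability: ratio of the weaker to the stronger score."""
    return min(score_a, score_b) / max(score_a, score_b)

def robust_score(score_a: float, score_b: float) -> float:
    """Worst-case score discounted by stability: min(a, b)^2 / max(a, b).

    Penalizes models that do well on only one dataset twice: once through
    the min, once through the stability ratio.
    """
    return min(score_a, score_b) ** 2 / max(score_a, score_b)

# Example: xgb_tuned_regularization (Dataset A 0.7423, Dataset B 0.7705)
print(round(stability(0.7423, 0.7705), 3))     # 0.963
print(round(robust_score(0.7423, 0.7705), 3))  # 0.715
```

Under this reading, a model ranks highly only if it scores well on *both* datasets, which is exactly why the former Dataset A leaders fall in the robust ranking.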
Overfitting Alert¶
| Former Top Model | Dataset A Rank | Dataset B Rank | Stability |
|---|---|---|---|
| gradient_boost_comprehensive | #1 | #6 | 82.4% |
| meta_stacking_7models | #2 | #10 | 83.8% |
Common Architecture Pattern¶
All models follow a consistent pipeline:
Feature Extraction¶
Each time series is split at a candidate break point into a pre-segment and a post-segment. Features capture differences between these segments (see Feature Documentation).
Preprocessing¶
```python
import numpy as np
from sklearn.preprocessing import RobustScaler

# Standard preprocessing pipeline: replace non-finite values, then scale
X = np.nan_to_num(X, nan=0.0, posinf=1e10, neginf=-1e10)
scaler = RobustScaler()  # median/IQR scaling resists outliers; StandardScaler also works
X_scaled = scaler.fit_transform(X)
```
Probability Output¶
All models output a probability of structural break:
- p ≈ 0: high confidence of no break
- p ≈ 0.5: uncertain
- p ≈ 1: high confidence a break exists
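Turning the probability into a decision is a simple thresholding step. The 0.5 cutoff and the uncertainty band below are common defaults, shown for illustration rather than taken from the source:

```python
def classify(p: float, threshold: float = 0.5, band: float = 0.1) -> str:
    """Map a break probability to a label; p near the threshold is flagged."""
    if abs(p - threshold) < band:
        return "uncertain"
    return "break" if p > threshold else "no break"

print(classify(0.95))  # break
print(classify(0.52))  # uncertain
print(classify(0.05))  # no break
```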
Model Selection Guide¶
```mermaid
flowchart LR
    A[Task] --> B{Constraints?}
    B -->|Best Overall| C[xgb_tuned_regularization<br/>Robust: 0.715]
    B -->|Maximum Stability| D[xgb_selective_spectral<br/>99.7% stable]
    B -->|Speed Critical| E[xgb_core_7features<br/>40s, 98% stable]
    B -->|No Training| F[segment_statistics_only<br/>95.4% stable]
```
Avoid These Models
- gradient_boost_comprehensive: 82.4% stability, overfits
- meta_stacking_7models: 83.8% stability, overfits
- hypothesis_testing_pure: 76.3% stability
- Deep learning (LSTM, Transformer): near-random on univariate data
- RL models: near-random performance