Model Architectures

Note on Results

These results are based on local validation sets provided during the competition phase and do not represent final official leaderboard standings.

Comprehensive documentation of all 25 structural break detection models, evaluated on two independent datasets.

Key Finding

Single-dataset rankings are misleading. Models #1 and #2 on Dataset A dropped to #6 and #10 on Dataset B. Use Stability Score and Robust Score for model selection.

Model Families

This repository implements five distinct families of detection approaches:

  • Tree-Based Models — XGBoost and Gradient Boosting variants. Best overall performance and stability. (See: XGBoost Models)

  • Neural Networks — MLP ensembles, LSTMs, Transformers. Mixed results on univariate data. (See: Neural Networks)

  • Ensemble Methods — Stacking and voting ensembles. Some overfit; others remain stable. (See: Ensembles)

  • Reinforcement Learning — Q-learning and DQN approaches. Near-random performance. (See: RL Models)

  • Statistical Models — Pure hypothesis tests and Bayesian methods. Variable stability. (See: Statistical)

Performance by Family (Cross-Dataset)

Family                   Best Model                    Robust Score   Stability   Dataset A   Dataset B
Tree-Based               xgb_tuned_regularization      0.715          96.3%       0.7423      0.7705
Ensembles                weighted_dynamic_ensemble     0.664          98.4%       0.6742      0.6849
Neural Networks          mlp_ensemble_deep_features    0.647          95.3%       0.7122      0.6787
Statistical              segment_statistics_only       0.569          95.4%       0.6249      0.5963
Reinforcement Learning   qlearning_rolling_stats       0.470          92.5%       0.5488      0.5078
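The exact definitions of Stability and Robust Score are not stated here, but every row of the table is consistent with stability being the ratio of the worse to the better dataset score, and the robust score being the worse score discounted by that ratio. A sketch under that inference (an assumption, not a documented formula):

```python
# Inferred score definitions -- not stated in these docs, but consistent
# with every row of the table above (assumption).
def stability(score_a: float, score_b: float) -> float:
    """Ratio of the worse to the better dataset score (1.0 = identical)."""
    return min(score_a, score_b) / max(score_a, score_b)

def robust_score(score_a: float, score_b: float) -> float:
    """Worst-case score, further discounted by instability."""
    return min(score_a, score_b) * stability(score_a, score_b)

# Tree-Based row: Dataset A = 0.7423, Dataset B = 0.7705
print(round(stability(0.7423, 0.7705), 3))     # 0.963
print(round(robust_score(0.7423, 0.7705), 3))  # 0.715
```

Under this reading, a model is rewarded both for scoring well on its weaker dataset and for scoring similarly on both.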

Overfitting Alert

Former Top Model               Dataset A Rank   Dataset B Rank   Stability
gradient_boost_comprehensive   #1               #6               82.4%
meta_stacking_7models          #2               #10              83.8%

Common Architecture Pattern

All models follow a consistent pipeline:

Raw Time Series → Feature Extraction → Preprocessing → Model → Probability

Feature Extraction

Each time series is split at a potential break point into pre-segment and post-segment:

Pre-segment   |  Post-segment
--------------+---------------
 values[0:T]  |  values[T:end]

Features capture differences between these segments (see Feature Documentation).
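As an illustrative sketch only (the actual feature set is described in the Feature Documentation; the three segment-difference features below are hypothetical):

```python
import numpy as np

def segment_features(values: np.ndarray, T: int) -> dict:
    """Illustrative pre/post-segment difference features.
    Hypothetical examples, not the repo's actual feature set."""
    pre, post = values[:T], values[T:]
    return {
        "mean_diff": float(np.mean(post) - np.mean(pre)),
        "std_ratio": float(np.std(post) / (np.std(pre) + 1e-12)),
        "median_diff": float(np.median(post) - np.median(pre)),
    }

# Series with a clear level shift at T=100
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
feats = segment_features(series, T=100)
```

A large `mean_diff` relative to the segment noise is exactly the kind of signal the downstream models learn to map to a break probability.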

Preprocessing

# Standard preprocessing pipeline
import numpy as np
from sklearn.preprocessing import RobustScaler

# Replace NaN/inf values before scaling
X = np.nan_to_num(X, nan=0, posinf=1e10, neginf=-1e10)
scaler = RobustScaler()  # or StandardScaler; RobustScaler is less sensitive to outliers
X_scaled = scaler.fit_transform(X)

Probability Output

All models output a probability of structural break:

  • p ≈ 0: High confidence no break
  • p ≈ 0.5: Uncertain
  • p ≈ 1: High confidence break exists
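Putting the pipeline together, a minimal end-to-end sketch. The toy feature matrix and generic `GradientBoostingClassifier` below stand in for the repo's real feature extraction and tuned XGBoost models (assumptions for illustration, not the actual implementation):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import RobustScaler

# Toy data: each row is a feature vector for one series split at a
# candidate break point; label 1 = break. Stand-ins for the repo's
# real features and models (assumption).
rng = np.random.default_rng(42)
n = 200
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.1 * rng.normal(size=n) > 0).astype(int)

# Preprocessing, as in the pipeline above
X = np.nan_to_num(X, nan=0, posinf=1e10, neginf=-1e10)
X_scaled = RobustScaler().fit_transform(X)

# Fit and emit a break probability per series
model = GradientBoostingClassifier().fit(X_scaled, y)
p = model.predict_proba(X_scaled)[:, 1]  # in [0, 1]; ~0.5 means uncertain
```

Every model family in the repository exposes this same contract: features in, a single break probability out.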

Model Selection Guide

flowchart LR
    A[Task] --> B{Constraints?}
    B -->|Best Overall| C[xgb_tuned_regularization<br/>Robust: 0.715]
    B -->|Maximum Stability| D[xgb_selective_spectral<br/>99.7% stable]
    B -->|Speed Critical| E[xgb_core_7features<br/>40s, 98% stable]
    B -->|No Training| F[segment_statistics_only<br/>95.4% stable]

Avoid These Models

  • gradient_boost_comprehensive: 82.4% stability, overfits
  • meta_stacking_7models: 83.8% stability, overfits
  • hypothesis_testing_pure: 76.3% stability
  • Deep learning (LSTM, Transformer): Near-random on univariate data
  • RL models: Near-random performance