Quick Start

Get up and running with structural break detection in minutes.

Important: Cross-Dataset Validation

Models were evaluated on two independent datasets. Some models that ranked highly on Dataset A generalized poorly to Dataset B, so prefer models with a high Stability score over models that merely rank well on a single dataset.
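The page does not define how Stability or Robust Score are computed. The sketch below is an inferred reading, not the project's documented formula: it assumes Stability is the ratio of the weaker AUC to the stronger one, and Robust Score is the weaker AUC multiplied by that ratio. Both function names are hypothetical. The AUC pairs are the ones reported on this page for xgb_tuned_regularization and xgb_selective_spectral.

```python
# ASSUMPTION: these definitions are inferred from the numbers reported on
# this page; they are not taken from the project's source code.

def stability(auc_a: float, auc_b: float) -> float:
    """Ratio of the weaker AUC to the stronger one (1.0 means identical)."""
    return min(auc_a, auc_b) / max(auc_a, auc_b)


def robust_score(auc_a: float, auc_b: float) -> float:
    """Weaker AUC, discounted by how far the two datasets disagree."""
    return min(auc_a, auc_b) * stability(auc_a, auc_b)


# (Dataset A AUC, Dataset B AUC) as reported on this page.
reported = {
    "xgb_tuned_regularization": (0.7423, 0.7705),
    "xgb_selective_spectral": (0.6451, 0.6471),
}

for name, (auc_a, auc_b) in reported.items():
    print(
        f"{name}: stability={stability(auc_a, auc_b):.1%}, "
        f"robust={robust_score(auc_a, auc_b):.3f}"
    )
```

Under these assumed definitions the sketch reproduces the reported 96.3% / 0.715 and 99.7% / 0.643 figures, which suggests the reading is plausible, but the benchmark code remains the authority.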

Training a Detector

Each detector follows a consistent interface:

cd <detector_folder>
python main.py --mode train --data-dir /path/to/data --model-path ./model.joblib

Example: Train the Top Performer

cd xgb_tuned_regularization
python main.py --mode train \
    --data-dir /path/to/your/data \
    --model-path ./model.joblib

Running Inference

python main.py --mode infer \
    --data-dir /path/to/data \
    --model-path ./model.joblib
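If you want to script the train and infer steps rather than type them, a thin wrapper along the following lines may help. It is only a sketch: build_command and train_then_infer are hypothetical helper names, and the only flags used are the three shown above (--mode, --data-dir, --model-path). It assumes main.py exits with a non-zero status on failure.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(mode: str, data_dir: str,
                  model_path: Optional[str] = None) -> List[str]:
    """Assemble the main.py invocation for one detector (hypothetical helper)."""
    if mode not in ("train", "infer"):
        raise ValueError(f"unsupported mode: {mode!r}")
    cmd = [sys.executable, "main.py", "--mode", mode, "--data-dir", data_dir]
    if model_path is not None:
        cmd += ["--model-path", model_path]
    return cmd


def train_then_infer(detector_dir: str, data_dir: str,
                     model_path: str = "./model.joblib") -> None:
    """Run training, then inference, from inside the detector folder."""
    for mode in ("train", "infer"):
        # cwd plays the role of `cd <detector_folder>`;
        # check=True raises CalledProcessError if main.py fails.
        subprocess.run(
            build_command(mode, data_dir, model_path),
            cwd=detector_dir,
            check=True,
        )
```

Calling train_then_infer("xgb_tuned_regularization", "/path/to/data") would then run the same two commands shown above, in order.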

Top Performers by Category

Best Robust Score | Maximum Stability | Speed Critical | No Training Needed

Best Robust Score: xgb_tuned_regularization

cd xgb_tuned_regularization
python main.py --mode train --data-dir /data --model-path ./model.joblib

Metric	Value
Robust Score	0.715
Stability	96.3%
Dataset A AUC	0.7423
Dataset B AUC	0.7705
Training	60-185s

Maximum Stability: xgb_selective_spectral

cd xgb_selective_spectral
python main.py --mode train --data-dir /data --model-path ./model.joblib

Metric	Value
Robust Score	0.643
Stability	99.7%
Dataset A AUC	0.6451
Dataset B AUC	0.6471
Training	~78s

Speed Critical: xgb_core_7features

cd xgb_core_7features
python main.py --mode train --data-dir /data --model-path ./model.joblib

Metric	Value
Robust Score	0.606
Stability	98.0%
Features	Only 7
Training	40s

No Training Needed: segment_statistics_only

cd segment_statistics_only
python main.py --mode infer --data-dir /data

Metric	Value
Robust Score	0.569
Stability	95.4%
Training	~43s

Avoid hypothesis_testing_pure

Despite requiring no ML training, hypothesis_testing_pure has only 76.3% stability. Use segment_statistics_only instead.

Models to Avoid

Overfitting Models

These models performed well on Dataset A but failed to generalize:

Model	Dataset A Rank	Dataset B Rank	Stability
gradient_boost_comprehensive	#1	#6	82.4%
meta_stacking_7models	#2	#10	83.8%
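One way to apply the stability guidance is to shortlist models by their Stability value before looking at any other metric. The sketch below uses the stability figures reported on this page. The 95% cut-off and the function name stable_models are illustrative assumptions, chosen only because every recommended model on this page sits above that value.

```python
from typing import Dict, List

# Stability percentages as reported on this page.
STABILITY: Dict[str, float] = {
    "xgb_tuned_regularization": 96.3,
    "xgb_selective_spectral": 99.7,
    "xgb_core_7features": 98.0,
    "segment_statistics_only": 95.4,
    "hypothesis_testing_pure": 76.3,
    "gradient_boost_comprehensive": 82.4,
    "meta_stacking_7models": 83.8,
}


def stable_models(scores: Dict[str, float], threshold: float = 95.0) -> List[str]:
    """Return models at or above the threshold, most stable first.

    ASSUMPTION: the 95.0 default is an illustrative cut-off, not a value
    prescribed by the benchmark.
    """
    keep = [name for name, score in scores.items() if score >= threshold]
    return sorted(keep, key=lambda name: scores[name], reverse=True)


for name in stable_models(STABILITY):
    print(f"{name}: {STABILITY[name]}%")
```

With this cut-off, both overfitting models in the table above and hypothesis_testing_pure are excluded.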

Underperforming Models

These models perform close to random chance (AUC near 0.50) on both datasets:

hierarchical_transformer (0.49-0.54 AUC)
wavelet_lstm (0.50-0.52 AUC)
All RL models (0.46-0.55 AUC)

Data Format

Organize your data as time series that can be split at potential break points. Each detector's features.py extracts features from the raw time series.

Feature Extraction

All detectors automatically extract features from time series data. You don't need to precompute features.
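To give a feel for what extracting features around a potential break point can mean, here is a minimal standalone sketch. It is not the contents of any detector's features.py: the function name segment_features, the chosen statistics, and the sample series are all assumptions made for illustration.

```python
from statistics import fmean, pstdev
from typing import Dict, Sequence


def segment_features(series: Sequence[float], break_index: int) -> Dict[str, float]:
    """Compare simple statistics before and after a candidate break point.

    ASSUMPTION: illustrative features only; the real detectors may use
    entirely different ones.
    """
    if not 2 <= break_index <= len(series) - 2:
        raise ValueError("each segment needs at least two observations")

    before, after = series[:break_index], series[break_index:]
    std_before, std_after = pstdev(before), pstdev(after)

    return {
        # Shift in level across the candidate break.
        "mean_diff": fmean(after) - fmean(before),
        # Change in spread; infinite if the first segment is constant.
        "std_ratio": std_after / std_before if std_before > 0 else float("inf"),
    }


# Hypothetical series with a level shift at index 4.
series = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
features = segment_features(series, break_index=4)
print(features)
```

A large mean_diff with a std_ratio close to 1 points to a shift in level rather than in volatility, which is the kind of signal a downstream classifier could learn from.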

Next Steps

Run full benchmarks to compare all models
Explore model architectures to understand how each works
Learn about features used for detection
View full results with cross-dataset analysis