# Benchmarking

Run comprehensive benchmarks to compare detector performance.

## Quick Benchmark (Top 5 Detectors)

The quick benchmark runs only the top-performing detectors for faster evaluation; a minimal sketch is shown below.
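The project's actual entry point is not shown here, so the sketch below is only an illustration: it times a fit/score loop over a stand-in detector registry built from scikit-learn estimators and synthetic data. Substitute the project's real detector factories and dataset split.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Stand-in registry (assumption): in the real project this would map names such
# as "xgb_tuned" or "weighted_ensemble" to the project's own detector factories.
DETECTORS = {
    "xgb_tuned": lambda: GradientBoostingClassifier(random_state=0),
    "weighted_ensemble": lambda: LogisticRegression(max_iter=1000),
}

# Stand-in data (assumption): replace with the project's structural-break dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

results = []
for name, factory in DETECTORS.items():
    model = factory()

    t0 = time.perf_counter()
    model.fit(X_train, y_train)
    train_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    scores = model.predict_proba(X_test)[:, 1]  # predicted break probability
    eval_time = time.perf_counter() - t0

    results.append({
        "detector": name,
        "train_time": round(train_time, 3),
        "eval_time": round(eval_time, 3),
        "roc_auc": round(roc_auc_score(y_test, scores), 4),
    })

print(results)
```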
## Full Benchmark (All 25 Detectors)

**Time Required:** The full benchmark takes several hours; `meta_stacking_7models` alone requires ~9 hours.
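For a run measured in hours it is worth writing each result as soon as a detector finishes and recording failures in the `status` column instead of aborting. The sketch below is one way to do that; it assumes the same stand-in `DETECTORS` registry and train/test split as the quick-benchmark example above.

```python
import csv
import time

from sklearn.metrics import roc_auc_score

FIELDS = ["detector", "train_time", "eval_time", "status", "roc_auc"]

with open("benchmark_results.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    for name, factory in DETECTORS.items():
        row = {"detector": name, "status": "success"}
        try:
            model = factory()

            t0 = time.perf_counter()
            model.fit(X_train, y_train)
            row["train_time"] = time.perf_counter() - t0

            t0 = time.perf_counter()
            scores = model.predict_proba(X_test)[:, 1]
            row["eval_time"] = time.perf_counter() - t0
            row["roc_auc"] = roc_auc_score(y_test, scores)
        except Exception:
            # Record the failure and keep going rather than losing the whole run.
            row["status"] = "error"
        writer.writerow(row)
        fh.flush()  # persist each row immediately
```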
## Output Format
Results are saved as CSV with the following columns:
| Column | Description |
|---|---|
| `detector` | Model name |
| `train_time` | Training time in seconds |
| `eval_time` | Inference time in seconds |
| `status` | `success` / `error` |
| `TP`, `FP`, `TN`, `FN` | Confusion matrix counts |
| `accuracy` | Overall accuracy |
| `recall` | True positive rate |
| `f1_score` | Harmonic mean of precision and recall |
| `roc_auc` | Area under the ROC curve |
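Assuming the results were written to a file such as `benchmark_results.csv` (the exact path depends on how you ran the benchmark), a quick way to rank the successful detectors with pandas:

```python
import pandas as pd

results = pd.read_csv("benchmark_results.csv")  # adjust to your output path

# Columns follow the documented output format above; drop any your run omitted.
ranked = (
    results[results["status"] == "success"]
    .sort_values("roc_auc", ascending=False)
    .loc[:, ["detector", "roc_auc", "f1_score", "train_time"]]
)
print(ranked.to_string(index=False))
```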
## Understanding the Metrics

### Primary Metric: ROC AUC
ROC AUC measures the model's ability to distinguish between classes across all thresholds. Higher is better.
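ROC AUC is computed from continuous scores (e.g. predicted break probabilities) rather than thresholded labels. A minimal scikit-learn example with made-up values:

```python
from sklearn.metrics import roc_auc_score

# y_true: 1 = structural break, 0 = no break; y_score: predicted break probability
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.35, 0.80, 0.62, 0.20, 0.55]

# 1.0 here, because every break is scored above every non-break
print(roc_auc_score(y_true, y_score))
```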
### Secondary Metric: F1 Score

F1 balances precision and recall, which is important for imbalanced problems such as structural break detection, where breaks are much rarer than non-breaks.
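Unlike ROC AUC, F1 is computed on labels after a decision threshold has been applied. A minimal example with made-up predictions:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]  # predictions after thresholding

# precision = 3/4, recall = 3/4, so F1 = 2 * P * R / (P + R) = 0.75
print(precision_score(y_true, y_pred), recall_score(y_true, y_pred), f1_score(y_true, y_pred))
```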
**Metric Selection:**

- Use ROC AUC for comparing overall discriminative power
- Use F1 Score when false positives are costly (trading applications)
- Use Recall when missing breaks is unacceptable
## Benchmark Results Summary
| Tier | Models | ROC AUC Range | Notes |
|---|---|---|---|
| Top | `xgb_tuned`, `weighted_ensemble`, `quad_ensemble` | 0.64-0.77 | Best robust scores |
| Good | `mlp_ensemble`, `xgb` variants | 0.58-0.68 | Stable performance |
| Moderate | KNN models, `segment_statistics` | 0.48-0.60 | Variable stability |
| Poor | Deep learning, RL models | 0.46-0.55 | Near-random performance |
## Custom Benchmarking

To benchmark only a subset of models, restrict the run to the detectors you need, as sketched below.
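The exact interface depends on how the project exposes its benchmark runner; assuming the stand-in `DETECTORS` registry from the sketches above, one approach is to filter the registry by name and reuse the same loop:

```python
# Hypothetical subset: replace with the detector names you want to compare.
SUBSET = ["xgb_tuned", "weighted_ensemble", "segment_statistics"]

selected = {name: DETECTORS[name] for name in SUBSET if name in DETECTORS}

# Run the same benchmarking loop as above, iterating over `selected`
# instead of the full DETECTORS registry.
```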