Benchmarking

Run comprehensive benchmarks to compare detector performance.

Quick Benchmark (Top 5 Detectors)

python quick_benchmark.py --data-dir /path/to/data

This runs only the five top-performing detectors for a faster evaluation.

Full Benchmark (All 25 Detectors)

python run_all_experiments.py --data-dir /path/to/data --output results.csv

Time Required

The full benchmark takes several hours; the meta_stacking_7models detector alone requires roughly 9 hours. To avoid the slowest detectors, run a subset as described under Custom Benchmarking below.

Output Format

Results are saved as CSV with the following columns:

| Column | Description |
| --- | --- |
| detector | Model name |
| train_time | Training time in seconds |
| eval_time | Inference time in seconds |
| status | success / error |
| TP, FP, TN, FN | Confusion matrix counts |
| accuracy | Overall accuracy |
| recall | True positive rate |
| f1_score | Harmonic mean of precision and recall |
| roc_auc | Area under the ROC curve |
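
For downstream analysis, the CSV loads directly into pandas. A minimal sketch, assuming the column names above ('results.csv' is the path passed to --output):

import pandas as pd

# Load the benchmark output and keep only runs that completed
df = pd.read_csv('results.csv')
df = df[df['status'] == 'success']

# Precision is not a stored column, but it can be derived from the counts
df['precision'] = df['TP'] / (df['TP'] + df['FP'])

print(df[['detector', 'roc_auc', 'f1_score', 'precision']])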

Understanding the Metrics

Primary Metric: ROC AUC

ROC AUC measures the model's ability to distinguish between classes across all thresholds. Higher is better.

  • 0.5 = Random guessing
  • 0.7+ = Good discrimination
  • 0.8+ = Excellent discrimination
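
This is the quantity scikit-learn computes with roc_auc_score; a minimal sketch with made-up labels and scores:

from sklearn.metrics import roc_auc_score

# y_true: binary labels (1 = structural break); y_score: detector scores
y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

print(roc_auc_score(y_true, y_score))  # 0.5 = random, 1.0 = perfect ranking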

Secondary Metric: F1 Score

F1 balances precision and recall, which is important for imbalanced datasets such as structural break detection.
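
To make the relationship to the CSV columns concrete, here is a small sketch computing F1 from hypothetical confusion-matrix counts:

# Hypothetical counts, matching the TP/FP/FN columns of the results CSV
TP, FP, FN = 40, 10, 20

precision = TP / (TP + FP)  # fraction of predicted breaks that are real
recall = TP / (TP + FN)     # fraction of real breaks that are detected
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")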

Metric Selection

  • Use ROC AUC for comparing overall discriminative power
  • Use F1 Score when false positives are costly (e.g., trading applications)
  • Use Recall when missing a break is unacceptable
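
In practice this means ranking the results by whichever metric fits the application; a sketch, again assuming the CSV layout described above:

import pandas as pd

df = pd.read_csv('results.csv')

# Swap 'roc_auc' for 'f1_score' or 'recall' depending on which errors
# matter most in your application
ranked = df.sort_values('roc_auc', ascending=False)
print(ranked[['detector', 'roc_auc', 'f1_score', 'recall']].head())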

Benchmark Results Summary

| Tier | Models | ROC AUC Range | Notes |
| --- | --- | --- | --- |
| Top | xgb_tuned, weighted_ensemble, quad_ensemble | 0.64-0.77 | Best robust scores |
| Good | mlp_ensemble, xgb variants | 0.58-0.68 | Stable performance |
| Moderate | KNN models, segment_statistics | 0.48-0.60 | Variable stability |
| Poor | Deep learning, RL models | 0.46-0.55 | Near-random performance |

Custom Benchmarking

To benchmark a subset of models:

from run_all_experiments import run_experiments

# Detector names as they appear in the 'detector' column of the results CSV
detectors = [
    'xgb_tuned_regularization',
    'gradient_boost_comprehensive',
    'meta_stacking_7models'
]

# Run only the listed detectors and write their results to the given CSV
results = run_experiments(
    data_dir='/path/to/data',
    detectors=detectors,
    output_path='custom_results.csv'
)
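
Because output_path is set, the same results are also written to custom_results.csv, so the pandas snippets shown earlier apply to custom runs unchanged.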