FILE · Live model calibration

Live model calibration. Bad days public.

Every ML prediction we make is settled after its horizon closes. Below are the actual hit rate, Brier score, and calibration gap per model, misses included. Public, no login; the figures refresh whenever a filter changes.


What we measure (and what we don't)

  • Hit rate = share of settled predictions where predicted_direction matched the realized direction over the prediction horizon (typically 5 trading days). Baseline = 50% (coin flip).
  • Brier score = mean squared error between stated confidence and outcome (1 = hit, 0 = miss). 0 is perfect; 0.25 is what always stating 50% confidence would score. Lower is better.
  • Calibration gap = |avg_confidence − hit_rate|. Near zero = stated confidence matches reality. > 0.10 = materially miscalibrated (overconfident when avg_confidence exceeds hit_rate, underconfident when it falls below).
  • Confidence bands = predictions bucketed by stated confidence. For each bucket, the bar shows actual hit rate vs band midpoint (dashed line = perfect calibration).
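
The four definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tracker's actual code: the Settled record, the function names, and the band edges are hypothetical, and confidence is assumed to be the stated probability (0 to 1) that predicted_direction is correct.

```python
from dataclasses import dataclass


@dataclass
class Settled:
    """One settled prediction (hypothetical shape, for illustration only)."""
    confidence: float  # assumed: stated probability that the direction call is right
    hit: bool          # True if predicted_direction matched the realized direction


def hit_rate(rows):
    """Share of settled predictions that were hits. Assumes rows is non-empty."""
    return sum(r.hit for r in rows) / len(rows)


def brier_score(rows):
    """Mean squared error between confidence and outcome (1 = hit, 0 = miss)."""
    return sum((r.confidence - (1.0 if r.hit else 0.0)) ** 2 for r in rows) / len(rows)


def calibration_gap(rows):
    """|avg_confidence - hit_rate|; the sign (over vs under) is lost in the abs."""
    avg_confidence = sum(r.confidence for r in rows) / len(rows)
    return abs(avg_confidence - hit_rate(rows))


def confidence_bands(rows, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Bucket rows by stated confidence and compare hit rate to the band midpoint.

    The edges are an assumption; the page does not state its bucket boundaries.
    Each band is [lo, hi), except the last, which also includes its upper edge.
    Empty bands are skipped.
    """
    bands = []
    for lo, hi in zip(edges, edges[1:]):
        is_last = hi == edges[-1]
        bucket = [
            r for r in rows
            if lo <= r.confidence < hi or (is_last and r.confidence == hi)
        ]
        if bucket:
            bands.append({
                "band": (lo, hi),
                "midpoint": (lo + hi) / 2,  # the "perfect calibration" reference
                "n": len(bucket),
                "hit_rate": hit_rate(bucket),
            })
    return bands


rows = [Settled(0.6, True), Settled(0.6, False), Settled(0.8, True), Settled(0.8, True)]
print(hit_rate(rows), brier_score(rows), calibration_gap(rows))
for band in confidence_bands(rows):
    print(band)
```

In this toy sample the 0.6 band hits half the time (below its midpoint, so overconfident) while the 0.8 band hits every time (underconfident). The overall calibration gap is small because the two errors partly cancel, which is why the per-band view is worth reading alongside the single number.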

What this does NOT measure: slippage, fees, taxes, fill quality, position-sizing impact. Calibration shows only whether the signal direction was right. Net P&L after execution is a separate question: for that, see the backtester (5 bps slippage by default). The raw data behind this page is available at /api/calibration for reproducibility.
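
A small worked example shows why a hit is not the same as a profit. The function below is a hypothetical sketch, not the backtester's logic: it takes the 5 bps slippage default mentioned above and assumes it is charged once on entry and once on exit, ignoring fees and taxes.

```python
def net_return_bps(gross_bps: int, slippage_bps_per_side: int = 5, sides: int = 2) -> int:
    """Return in basis points after slippage.

    gross_bps is the move in the predicted direction (positive = the call was right).
    Charging slippage on both entry and exit (sides=2) is an assumption.
    """
    return gross_bps - slippage_bps_per_side * sides


# A correct direction call on an 8 bps move still loses money after costs:
print(net_return_bps(8))   # -2
# A 30 bps move in the predicted direction keeps most of its edge:
print(net_return_bps(30))  # 20
```

Both predictions count as hits on this page; only the second would be profitable under these assumptions.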

Settlement cadence: the accuracy tracker (src/modules/ml/accuracy_tracker.js) settles rows ~6 hours after the prediction horizon closes. Fresh predictions don't appear here until the horizon has elapsed plus that 6h buffer.
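
The timing rule can be expressed as a simple check. This is a sketch only: the names are hypothetical, the real logic lives in accuracy_tracker.js and is not shown here, and the buffer is taken as exactly 6 hours although the text says "~6 hours".

```python
from datetime import datetime, timedelta, timezone

# Assumed exact value; the source only says roughly 6 hours.
SETTLEMENT_BUFFER = timedelta(hours=6)


def earliest_settlement(horizon_close: datetime) -> datetime:
    """Earliest time a prediction can be settled: horizon close plus the buffer."""
    return horizon_close + SETTLEMENT_BUFFER


def is_settleable(horizon_close: datetime, now: datetime) -> bool:
    """True once the horizon has elapsed and the buffer has passed."""
    return now >= earliest_settlement(horizon_close)


close = datetime(2024, 1, 8, 21, 0, tzinfo=timezone.utc)
print(earliest_settlement(close))                       # 2024-01-09 03:00:00+00:00
print(is_settleable(close, close + timedelta(hours=5)))  # False
print(is_settleable(close, close + timedelta(hours=6)))  # True
```

Computing horizon_close itself (typically 5 trading days after the prediction) needs a trading calendar and is left out of this sketch.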