2026-04-06 · 4 min read

How We Track Prediction Accuracy with Brier Scores

Every intelligence platform makes forecasts. We actually track whether ours come true — using the same calibration methods as professional forecasting tournaments.

The Accountability Gap

Most intelligence platforms make predictions constantly. "Oil will rise." "The Fed will cut rates." "Tensions will escalate." But almost none of them track whether those predictions came true.

This creates a perverse incentive: make bold predictions that attract attention, then quietly move on when they're wrong. The reader has no way to evaluate which sources are actually reliable.

How Brier Scoring Works

VORENTH uses Brier scores to measure prediction accuracy. The formula is simple:

Brier Score = (probability - outcome)²

Where outcome is 1 (the forecast was correct), 0.5 (partially correct), or 0 (incorrect). Each resolved forecast gets its own score, and scores are averaged to produce the aggregate record.

A perfect Brier score is 0.00 (you assigned 100% probability and it happened). The worst possible score is 1.00 (you assigned 100% probability and it didn't happen). Random guessing averages around 0.25.
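The arithmetic above fits in a few lines. Here's a minimal Python sketch of per-forecast scoring and the average across resolved forecasts (function names are illustrative, not VORENTH's actual code):

```python
def brier_score(probability: float, outcome: float) -> float:
    """Squared error between an assigned probability and the resolved outcome.

    outcome is 1.0 (correct), 0.5 (partial), or 0.0 (incorrect),
    matching the resolution scheme described above.
    """
    return (probability - outcome) ** 2


def mean_brier(resolved: list[tuple[float, float]]) -> float:
    """Average Brier score over (probability, outcome) pairs."""
    return sum(brier_score(p, o) for p, o in resolved) / len(resolved)


# A confident correct call scores near 0; a confident miss scores near 1.
print(round(brier_score(0.90, 1.0), 2))  # 0.01 -- strong, correct forecast
print(round(brier_score(0.90, 0.0), 2))  # 0.81 -- strong, wrong forecast
print(brier_score(0.50, 0.0))            # 0.25 -- the coin-flip baseline
```

Note that the penalty is quadratic: being confidently wrong (0.81) costs far more than the coin-flip baseline (0.25), which is exactly what discourages bold-but-baseless calls.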

What Makes Our System Different

1. Forecast vs. Signal Separation

VORENTH distinguishes between scored forecasts (discrete, falsifiable claims tracked against outcomes) and analytical signals (directional intelligence that informs but isn't scored). Only forecasts appear in the accuracy record — this prevents vague directional calls from polluting calibration data.
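One way to make that separation concrete is to give forecasts and signals distinct types, so only one of them can ever enter the scoring pipeline. A sketch (field names are assumptions for illustration, not VORENTH's schema):

```python
from dataclasses import dataclass


@dataclass
class Forecast:
    """A discrete, falsifiable claim -- tracked and scored against its outcome."""
    claim: str          # e.g. a specific, time-bounded statement
    probability: float  # assigned probability, 0..1
    resolve_by: str     # ISO date on which the claim is adjudicated


@dataclass
class Signal:
    """Directional intelligence -- informs analysis but is never scored."""
    statement: str
    direction: str      # "bullish" / "bearish" / "neutral"
```

Because the accuracy record only accepts `Forecast` objects, a vague directional `Signal` can't slip into the calibration data by construction.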

2. Automatic Extraction

Every intelligence report generates 1-3 high-quality forecasts and 1-4 analytical signals. Each forecast must be specific, time-bounded, and independently verifiable — no vague claims like "markets will react."

3. Market-Calibrated Probabilities

Forecast probabilities are anchored against live prediction market data (Polymarket). If the market says 60% and VORENTH says 75%, the system requires specific evidence to justify the deviation. This prevents overconfidence and probability clustering.
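The anchoring check described above can be sketched as a simple deviation gate. The 10-point tolerance here is an assumed value for illustration, not VORENTH's actual threshold:

```python
def requires_justification(model_prob: float, market_prob: float,
                           tolerance: float = 0.10) -> bool:
    """Flag a forecast whose probability deviates from the live market
    anchor by more than `tolerance`; flagged forecasts must cite
    specific supporting evidence before they are accepted."""
    return abs(model_prob - market_prob) > tolerance


# Market at 60%, system at 75%: a 15-point gap trips the evidence check.
print(requires_justification(0.75, 0.60))  # True
print(requires_justification(0.62, 0.60))  # False
```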

4. Category-Aware Confidence

The system adjusts confidence based on where news-based analysis has genuine predictive edge. Geopolitical and policy predictions get stronger conviction. Market price predictions are treated more conservatively — because quant models generally beat narrative analysis on specific price targets.
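One simple way to implement category-aware confidence is to shrink raw probabilities toward 50%, more aggressively in categories where narrative analysis has less edge. The categories and shrinkage values below are assumptions for illustration, not VORENTH's actual parameters:

```python
# Illustrative per-category shrinkage factors (assumed values).
CATEGORY_SHRINKAGE = {
    "geopolitics": 0.10,   # light shrinkage: genuine predictive edge
    "policy": 0.15,
    "market_price": 0.40,  # heavy shrinkage: quant models usually win here
}


def adjust_confidence(probability: float, category: str) -> float:
    """Pull a raw probability toward 0.5 by the category's shrinkage factor."""
    s = CATEGORY_SHRINKAGE.get(category, 0.25)
    return 0.5 + (probability - 0.5) * (1 - s)


# The same raw 80% conviction lands differently by category.
print(round(adjust_confidence(0.80, "geopolitics"), 2))   # 0.77
print(round(adjust_confidence(0.80, "market_price"), 2))  # 0.68
```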

5. Evidence-Based Resolution

When a forecast reaches its target date, our system fetches real market data from Polygon and news from GDELT, then uses AI to adjudicate the outcome based solely on evidence — not general knowledge.

6. Calibration Feedback Loop

Historical accuracy data feeds back into the system. If we've been overconfident in a category (predicting 70% when outcomes occur 50% of the time), future probabilities are automatically adjusted downward using base rate anchoring.
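The base-rate anchoring described above can be sketched as a correction term derived from the gap between stated probabilities and realized outcomes. The correction weight is an assumed parameter, not VORENTH's actual value:

```python
def recalibrate(probability: float, stated_avg: float, realized_rate: float,
                weight: float = 0.5) -> float:
    """Shift a new forecast toward the category's historical base rate.

    stated_avg:    average probability previously assigned in this category
    realized_rate: fraction of those forecasts that actually resolved true
    weight:        how strongly history corrects the new forecast (assumption)
    """
    bias = stated_avg - realized_rate          # +0.20 means overconfident
    return min(max(probability - weight * bias, 0.0), 1.0)


# Category history: forecasts averaged 70% but only 50% came true,
# so a fresh 70% call is pulled down toward the observed base rate.
print(round(recalibrate(0.70, stated_avg=0.70, realized_rate=0.50), 2))  # 0.6
```

With no historical bias (`stated_avg == realized_rate`), the forecast passes through unchanged, so well-calibrated categories are never penalized.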

7. Public Track Record

Our track record page shows aggregate forecast accuracy — calibration curves, Brier scores by category, and individual resolved forecasts. Full transparency.

Why This Matters

When you're making decisions based on intelligence analysis — whether it's portfolio allocation, policy recommendations, or risk assessment — you need to know how reliable the source is.

A system that tracks its own accuracy and self-corrects isn't just more honest. Over time, it becomes genuinely better at forecasting.

Ready to see it in action?

3 free intelligence queries per day. No credit card required.

Start analyzing