Understanding the Overestimation Index
Δ shows when self-ratings and reviewer scores drift apart.
Formula
Δ = self-rating − scored performance
Capture self-ratings immediately after each run and subtract the reviewer score to reveal confidence gaps.
Thresholds
- OK (<5) · Healthy calibration.
- Watch (5–15) · Coach soon, gather more evidence.
- High (>15) · Freeze runs and reset expectations.
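The formula and threshold bands above can be sketched in a few lines. This is a minimal illustration, not an official API: the function names and the handling of exact boundary values (5 and 15) are assumptions.

```python
def overestimation_delta(self_rating: float, reviewer_score: float) -> float:
    """Overestimation Index: Δ = self-rating − scored performance."""
    return self_rating - reviewer_score

def band(delta: float) -> str:
    """Map Δ onto the threshold bands; boundary handling here is an assumption."""
    if delta < 5:
        return "OK"       # Healthy calibration.
    if delta <= 15:
        return "Watch"    # Coach soon, gather more evidence.
    return "High"         # Freeze runs and reset expectations.

print(band(overestimation_delta(7, 6)))  # → OK
```

A positive Δ means the person rated themselves above the reviewer's score; a negative Δ means underconfidence.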
Δ quick reference
Track Δ per run so confidence stays aligned with reviewer scores.
- PM self-rates 7, reviewer 6 → Δ = +1 (OK).
- SWE self-rates 9, reviewer 5 → Δ = +4 (just under Watch; coach before the gap widens).
How to improve
- Use timed, double-run tasks (manual vs. AI) to expose real lift.
- Log micro-TLX after every run to surface fatigue.
- Document prompt tweaks and reviewer time so comparisons stay honest.
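One way to keep the habits above honest is a per-run log that captures the self-rating, reviewer score, micro-TLX, reviewer time, and prompt tweaks in one record. A minimal sketch; the `RunLog` name and every field name are illustrative assumptions, not a fixed schema.

```python
from dataclasses import dataclass, field

@dataclass
class RunLog:
    """One record per run; field names are illustrative, not a fixed schema."""
    task: str
    mode: str                     # "manual" or "ai"
    self_rating: float            # captured immediately after the run
    reviewer_score: float
    micro_tlx: int                # post-run workload rating
    reviewer_minutes: float       # verification time spent by the reviewer
    prompt_tweaks: list[str] = field(default_factory=list)

    @property
    def delta(self) -> float:
        """Δ = self-rating − scored performance."""
        return self.self_rating - self.reviewer_score

run = RunLog("refactor-auth", "ai", 9, 5, 14, 30, ["added output format hint"])
print(run.delta)  # → 4
```

Logging both modes of a double-run task as two `RunLog` records makes the manual-vs-AI lift directly comparable.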
Calibrate
Run a Fair Trial: counterbalance order, fix the timebox, and keep the rubric identical.
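Counterbalancing order means alternating which condition each participant sees first, so order effects cancel out across the group. A minimal sketch; the alternating even/odd assignment rule and the function name are assumptions.

```python
def fair_trial_orders(participants: list[str]) -> dict[str, tuple[str, str]]:
    """Alternate manual-first and AI-first so order effects cancel across the group."""
    orders = [("manual", "ai"), ("ai", "manual")]
    return {p: orders[i % 2] for i, p in enumerate(participants)}

print(fair_trial_orders(["p1", "p2", "p3"]))
# p1 runs manual first, p2 runs AI first, p3 runs manual first
```

The timebox and rubric stay fixed across both runs; only the order varies.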
Constrain
Use documented prompt scaffolds and note every tweak so variance stays observable.
Cross-check
Compare reviewer verification time + defects against the AI Ethics and workload guides.
Next Steps
- Try the Interactive Demo · Experience a real evaluation with sample tasks.
- SWE Quickstart Guide · Role-specific guide for your first week.
Or explore our methodology to understand the science behind the measurements.
Open Beta
Help steer the Open Beta with real Δ and TLX tiles.
Run the analyzer demo, share methodology notes with your team, and send us benchmarks so the release ships with proof, not hype.