Signal to watch
Overestimation creeps in when self-ratings rely on perceived speed instead of scored outputs. Track both, and surface variances weekly.
“Our senior ICs swore the AI copy was ‘good enough’—until we graphed reviewer rework hours.”
- ✓ Log self-ratings immediately after each run.
- ✓ Compare them to reviewer scores or TLX notes.
- ✓ Tag every run where the self-rating and reviewer score differ by more than ±5.
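The checklist above reduces to a small scoring loop. A minimal sketch, assuming a 0–100 rating scale; the record fields and the ±5 threshold mirror the steps above but the names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Run:
    run_id: str
    self_rating: int      # IC's 0-100 rating, logged right after the run
    reviewer_score: int   # reviewer's 0-100 score for the same output

def overestimation_delta(run: Run) -> int:
    """Positive delta means the IC rated the output higher than the reviewer did."""
    return run.self_rating - run.reviewer_score

def flag_runs(runs: list[Run], threshold: int = 5) -> list[str]:
    """Tag every run whose |delta| exceeds the threshold."""
    return [r.run_id for r in runs if abs(overestimation_delta(r)) > threshold]

runs = [
    Run("r1", self_rating=90, reviewer_score=70),  # overestimated by 20 -> flagged
    Run("r2", self_rating=60, reviewer_score=62),  # within tolerance
]
print(flag_runs(runs))  # → ['r1']
```

Running the flagging weekly, rather than per run, keeps the review overhead low while still surfacing variance trends.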
Apply this now
Practice prompt
Ask teams to journal one sentence on why they rated their output highly before seeing the reviewer score.
Try this now
Plot manual vs. AI self-ratings against the Overestimation Δ to show who needs coaching.
Common pitfall
Coaching only low performers. High-literacy ICs often overestimate the most—coach them on evidence-first storytelling.
Key takeaways
- Track the Overestimation Index weekly; aim to keep variance within ±5.
- Pair micro-TLX data with scored output to surface silent fatigue.
- Constrain prompts and require peer cross-checks on high-risk deliverables.
See it in action
Drop this into a measured run—demo it, then tie it back to your methodology.
See also
Pair this play with related resources, methodology notes, or quickstarts.
Next Steps
Ready to measure your AI impact? Start with a quick demo to see your Overestimation Δ and cognitive load metrics.