Overestimation Δ snapshot
Δ = self-rating − reviewer score. Keep cohort deltas within ±5 points to show that confidence tracks accuracy.
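The Δ check above can be sketched in a few lines. This is a minimal illustration, not the Analyzer's implementation; the function names and the sample scores are assumptions.

```python
# Minimal sketch of the overestimation Δ check: Δ = self-rating − reviewer score,
# with a cohort-level guard that every Δ stays inside a ±5 band.
# Function names and sample scores are illustrative, not the product's API.

def overestimation_delta(self_rating: float, reviewer_score: float) -> float:
    """Δ = self-rating − reviewer score."""
    return self_rating - reviewer_score

def cohort_within_band(deltas: list[float], band: float = 5.0) -> bool:
    """True when every participant's Δ stays inside ±band."""
    return all(abs(d) <= band for d in deltas)

runs = [(78, 74), (65, 68), (82, 80)]        # (self-rating, reviewer score)
deltas = [overestimation_delta(s, r) for s, r in runs]
print(deltas)                                 # → [4, -3, 2]
print(cohort_within_band(deltas))             # → True
```

A cohort that returns False here is the signal to investigate before quoting the confidence ≈ accuracy claim.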
Validity
What the scores mean—and what they do not.
What this means: You can trust the numbers—they're scientifically validated, statistically reliable, and your data remains secure. Compare against benchmarks without exposing sensitive information.
Before you run your evaluation:
What to expect
Quick reference so you can explain the metrics before anyone clicks into the Analyzer.
Δ = self-rating − reviewer score. Keep cohort deltas within ±5 points to show that confidence tracks accuracy.
Log mental demand + frustration immediately after each run; it takes under 15 seconds but surfaces fatigue trends.
Counterbalance run order, lock the rubric, and compare manual vs. AI time so the story survives scrutiny.
Review checklist
Aggregated deltas, TLX trends, and Cronbach's α—never raw prompts or personal data.
Individual runs or annotations never leave your workspace.
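The Cronbach's α named in the review checklist is a standard internal-consistency statistic. The sketch below shows the textbook formula on illustrative rubric scores; it is not the Analyzer's implementation, and the sample numbers are assumptions.

```python
# Cronbach's α = k/(k−1) · (1 − Σ item variances / variance of totals),
# where k is the number of rubric items. Illustrative data only.
from statistics import pvariance

def cronbach_alpha(items: list[list[float]]) -> float:
    """α for a list of item-score columns (one list per rubric item,
    same respondents in the same order in every column)."""
    k = len(items)
    item_vars = sum(pvariance(col) for col in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total
    return k / (k - 1) * (1 - item_vars / pvariance(totals))

# Three rubric items scored for four reviewers (made-up numbers).
items = [
    [4, 3, 5, 4],
    [4, 4, 5, 3],
    [5, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))  # → 0.6
```

Higher α means reviewers' item scores move together; conventions vary, but values below roughly 0.7 usually prompt a look at the rubric before trusting the aggregate.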
Next steps
Want formulas and audit steps? Review the full methodology.
Know the limits before you present: statistical limitations, measurement validity boundaries, and privacy model constraints each bound what the scores can claim.
Key point: These measurements are diagnostic tools, not performance evaluations. Use them to identify improvement opportunities, not to rank individuals. Review the evidence standards for details.
Bring these along when Legal or Compliance wants a quick briefing.
What data leaves my workspace?
Only anonymized deltas, TLX trends, and reviewer medians so execs can compare cohorts without raw runs.
How are reviewer notes protected?
Reviewer evidence stays scoped to your workspace with access controls and export logging.
How do I brief leadership on validity?
Pair this page with Methodology, include Δ + TLX tiles, and link the interpretation guide for instant context.
Open Beta
Run the analyzer demo, share methodology notes with your team, and send us benchmarks so the release ships with proof—not hype.