What Judgment Score measures
Your Judgment Score reflects how well you evaluate AI outputs—not how well AI performs for you. High judgment = you catch errors, calibrate predictions, and assess quality accurately. Low judgment = you trust AI too much or too little.
Score Components
1. Prediction Accuracy (40% of score)
What it measures: How close your pre-evaluation quality predictions are to actual scores.
How it's calculated:
- Before reviewing AI output, you predict quality (1-10)
- After a full evaluation, you assign the actual score on the same scale
- Prediction Δ = the gap between your prediction and the actual score
- Lower Δ = higher component score
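The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the product's actual scoring code: the function name `prediction_deltas` and the sample numbers are hypothetical, and the source does not say how Δ is converted into a component score, so the sketch stops at the deltas themselves.

```python
from statistics import mean

def prediction_deltas(predicted, actual):
    """Return (mean absolute delta, mean signed delta) for paired 1-10 scores.

    Assumption: delta is taken as predicted minus actual, so a positive
    signed value means you tend to overestimate quality.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    signed = [p - a for p, a in zip(predicted, actual)]
    mean_abs = mean([abs(d) for d in signed])   # size of the typical miss
    mean_signed = mean(signed)                  # direction of the typical miss
    return mean_abs, mean_signed

# Illustrative values only: four predictions and the scores given afterwards.
mean_abs, mean_signed = prediction_deltas([8, 7, 9, 6], [6, 7, 5, 7])
print(f"mean |delta| = {mean_abs}, mean signed delta = {mean_signed}")
```

Keeping the signed value alongside the absolute one makes it easier to see whether the misses lean toward over- or under-estimation.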
How to improve:
- Practice prediction before every AI output review
- Track patterns: Do you over- or under-estimate?
- Calibrate by task type (you may be accurate on code, not on strategy)
2. Bias Detection (30% of score)
What it measures: How reliably you identify bias in AI outputs.
How it's calculated:
- You review outputs with known biases (seeded during assessment)
- Detection rate = biases you caught / biases present
- Higher detection rate = higher component score
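As a rough sketch of the detection-rate formula above (biases caught divided by biases present), the following Python uses sets of bias labels. The function name and the labels are hypothetical, chosen only for illustration.

```python
def detection_rate(seeded, flagged):
    """Fraction of seeded biases that the reviewer flagged.

    Assumption: biases are identified by label, and anything flagged that
    was not seeded is simply ignored, as in the formula in the text.
    """
    seeded, flagged = set(seeded), set(flagged)
    if not seeded:
        raise ValueError("at least one seeded bias is required")
    caught = seeded & flagged
    return len(caught) / len(seeded)

# Illustrative labels only: four biases seeded, two of them caught.
rate = detection_rate(
    seeded={"anchoring", "omission", "framing", "selection"},
    flagged={"anchoring", "framing", "recency"},
)
print(rate)
```

Note that this formula does not penalize false positives; whether the real assessment does is not stated in this guide.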
How to improve:
- Use the 7-point bias checklist
- Practice on outputs outside your expertise (bias is easier to spot when you're not anchored)
- Run adversarial prompts (red-teaming guide)
3. Quality Assessment (30% of score)
What it measures: How consistently your quality scores align with expert benchmarks.
How it's calculated:
- You score AI outputs on standard criteria
- Your scores are compared to expert-validated scores
- Closer alignment = higher component score
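One simple way to check your own alignment against an expert-scored example is the mean absolute gap per criterion. The sketch below assumes that measure; the guide does not specify how alignment is actually computed, and the function name and criterion names are hypothetical.

```python
def alignment_gap(your_scores, expert_scores):
    """Mean absolute gap across the criteria present in both score sheets.

    Assumption: both arguments map criterion name -> numeric score on the
    same scale. A smaller gap means closer alignment.
    """
    shared = your_scores.keys() & expert_scores.keys()
    if not shared:
        raise ValueError("no criteria in common")
    return sum(abs(your_scores[c] - expert_scores[c]) for c in shared) / len(shared)

# Illustrative values only.
gap = alignment_gap(
    {"accuracy": 8, "clarity": 6, "completeness": 7},
    {"accuracy": 6, "clarity": 7, "completeness": 7},
)
print(gap)
```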
How to improve:
- Lock rubrics before evaluation (rubrics guide)
- Calibrate with peers (score same output independently, compare)
- Review expert-scored examples to calibrate your standards
Understanding Your Score
| Score Range | Level | What It Means |
|-------------|-------|---------------|
| 80-100 | Expert | You reliably catch issues stakeholders would find |
| 60-79 | Proficient | Good judgment with occasional blind spots |
| 40-59 | Developing | Consistent patterns to work on |
| 20-39 | Novice | Building foundational evaluation skills |
| 0-19 | Baseline | Starting point, with room for rapid growth |
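To make the weighting and the score ranges concrete, here is a minimal Python sketch. It assumes each component is itself reported on a 0-100 scale and that the overall score is a plain weighted sum using the 40/30/30 split; neither detail is confirmed by this guide, and all names are hypothetical.

```python
# Weights taken from the component headings in this guide.
WEIGHTS = {
    "prediction_accuracy": 0.40,
    "bias_detection": 0.30,
    "quality_assessment": 0.30,
}

# Lower bound of each range in the table, highest first.
LEVELS = [(80, "Expert"), (60, "Proficient"), (40, "Developing"),
          (20, "Novice"), (0, "Baseline")]

def judgment_score(components):
    """Weighted sum of component scores (assumed to be 0-100 each)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)

def level_for(score):
    """Map a 0-100 score to its level; fractional scores round down a band."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for floor, label in LEVELS:
        if score >= floor:
            return label

# Illustrative component scores only.
score = judgment_score(
    {"prediction_accuracy": 70, "bias_detection": 50, "quality_assessment": 60}
)
print(round(score, 1), level_for(score))
```

With these sample values the weakest component is Bias Detection, which is the kind of breakdown the improvement strategies are keyed to.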
Improvement Strategies by Starting Score
If you score 60-79: Focus on Bias Detection
You're likely accurate on obvious issues. Work on:
- Subtle bias patterns (anchoring, omission)
- Adversarial thinking (what's the counter-argument?)
- Domain-specific blind spots
If you score 40-59: Focus on Prediction Accuracy
You're probably over- or under-estimating consistently. Work on:
- Logging predictions before every evaluation
- Identifying your systematic error direction
- Calibrating by task type
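A prediction log makes the error direction visible per task type. The sketch below is one way to summarize such a log, assuming each entry records a task type, your predicted score, and the actual score; the function name, the log format, and the sample entries are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def bias_by_task_type(log):
    """Mean signed error (predicted - actual) per task type.

    Positive values mean you tend to overestimate on that task type,
    negative values mean you tend to underestimate.
    """
    errors = defaultdict(list)
    for task_type, predicted, actual in log:
        errors[task_type].append(predicted - actual)
    return {task_type: mean(values) for task_type, values in errors.items()}

# Illustrative log entries only: (task type, predicted, actual).
log = [
    ("code", 7, 7),
    ("code", 8, 7),
    ("strategy", 9, 6),
    ("strategy", 8, 5),
]
for task_type, bias in bias_by_task_type(log).items():
    print(f"{task_type}: {bias:+.1f}")
```

In this made-up log the predictions are close on code but run about three points high on strategy, which is the kind of pattern worth correcting first.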
If you score under 40: Focus on Quality Assessment
Build your evaluation foundation. Work on:
- Understanding what "good" looks like for different outputs
- Using structured rubrics instead of gut feeling
- Reviewing expert-scored examples
- ✓ Know your current Judgment Score and component breakdown
- ✓ Identify your lowest component: that's your fastest lever
- ✓ Practice weekly on Judgment Pack or any assessment
- ✓ Track your score trend over 4 weeks and expect 15-20% improvement
- ✓ Calibrate with peers monthly to check blind spots
Weekly Practice Protocol
Time commitment: 20-30 minutes per week
- Monday: Run one Judgment Pack (~15 min)
- Wednesday: Review your previous scores; identify one pattern
- Friday: Apply one technique from this guide to real work
- End of month: Compare scores week 1 vs week 4
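For the end-of-month comparison, a small helper can report the change both in points and as a percentage of the week 1 score. This guide does not say which of the two its "15-20%" figure refers to, so the sketch shows both; the function name and the sample scores are hypothetical.

```python
def score_change(week1, week4):
    """Return (change in points, change as a percentage of the week 1 score).

    The percentage is None when week1 is 0, since a relative change is
    undefined in that case.
    """
    points = week4 - week1
    percent = points * 100 / week1 if week1 else None
    return points, percent

# Illustrative scores only.
points, percent = score_change(week1=50, week4=58)
print(f"{points:+} points, {percent:+.1f}% relative to week 1")
```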
"My Judgment Score went from 52 to 71 in six weeks. The biggest jump came when I started logging predictions: I was consistently overestimating by +3."
Related Resources
- Judgment Pack — assessment that builds judgment skills
- How to Read Your Results — interpret your scores
Apply this now
Practice prompt
Run a Judgment Pack this week and note your component scores.
Try this now
Identify your lowest component score and pick one technique to practice.
Common pitfall
Trying to improve everything at once—focus on one component per month.
Key takeaways
- Judgment Score = Prediction Accuracy (40%) + Bias Detection (30%) + Quality Assessment (30%)
- Identify your lowest component: that's where improvement is fastest
- Weekly practice moves scores 15-20% in 4 weeks
Next Steps
Ready to measure your AI impact? Start with a quick demo to see your Overestimation Δ and cognitive load metrics.