Executive Brief
2 min read

How to Read Δ & TLX Tiles

Executive guide to interpreting Overestimation Delta and Task Load Index results in 3 minutes

The One-Page Executive Summary

When your team presents AI evaluation results, two metrics deserve most of your attention:

1. Overestimation Delta (Δ)

What it measures: The gap between what teams think AI accomplished versus what it actually delivered.

  • Green (under 5%): Team accurately estimates AI capabilities
  • Yellow (5-15%): Mild overconfidence, needs calibration
  • Red (over 15%): Significant overestimation, risk of quality issues

What to ask: "What specific tasks showed the highest Δ, and what's our mitigation plan?"
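The bands above can be sketched as a small lookup. This is a minimal illustration, not the dashboard's actual logic: the function name `delta_band` is hypothetical, and placing exactly 5% and exactly 15% in the yellow band is an assumption. Negative values (underestimation) are not covered by the bands, so the sketch rejects them.

```python
def delta_band(delta_pct: float) -> str:
    """Map an Overestimation Delta (in percent) to a colour band.

    Assumption: 5 and 15 both count as yellow; the brief only
    states that red starts above 15%.
    """
    if delta_pct < 0:
        # Underestimation is outside the bands described in the brief.
        raise ValueError("delta_pct must be non-negative")
    if delta_pct < 5:
        return "green"
    if delta_pct <= 15:
        return "yellow"
    return "red"


if __name__ == "__main__":
    for value in (3, 8, 20):
        print(f"Δ = {value}% -> {delta_band(value)}")
```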

2. Task Load Index (TLX)

What it measures: Perceived workload across six dimensions (mental demand, physical demand, temporal demand, performance, effort, frustration).

  • Low (under 40): Task feels manageable, team can sustain pace
  • Medium (40-70): Increased effort required, monitor for fatigue
  • High (over 70): Unsustainable workload, burnout risk

What to ask: "Which TLX dimensions spiked, and how are we addressing them?"
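To make the single TLX number and the "which dimensions spiked" question concrete, here is a hedged sketch. It assumes the overall score is the unweighted mean of the six dimension ratings on a 0-100 scale (the "Raw TLX" variant); the brief does not say how the score is aggregated, and the original NASA-TLX procedure uses weighted ratings instead. The function names and the example ratings are hypothetical.

```python
from statistics import mean

DIMENSIONS = ("mental", "physical", "temporal",
              "performance", "effort", "frustration")


def overall_tlx(ratings: dict) -> float:
    """Unweighted mean of the six ratings (assumed aggregation)."""
    missing = [d for d in DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"missing TLX dimensions: {missing}")
    return mean(ratings[d] for d in DIMENSIONS)


def tlx_band(score: float) -> str:
    """Map a 0-100 score to a band; 40 and 70 count as medium (assumption)."""
    if score < 40:
        return "low"
    if score <= 70:
        return "medium"
    return "high"


def highest_dimensions(ratings: dict, top_n: int = 2) -> list:
    """Return the top_n dimensions with the highest ratings."""
    return sorted(DIMENSIONS, key=lambda d: ratings[d], reverse=True)[:top_n]


# Hypothetical ratings for one AI-assisted session.
example = {"mental": 70, "physical": 10, "temporal": 65,
           "performance": 40, "effort": 60, "frustration": 67}

if __name__ == "__main__":
    score = overall_tlx(example)
    print(score, tlx_band(score), highest_dimensions(example))
```

A medium overall score can hide one or two dimensions near the high band, which is why the per-dimension breakdown is worth requesting.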

Reading the Tiles

When you see evaluation tiles in presentations or dashboards:

┌─────────────────────┐  ┌─────────────────────┐
│ Manual Baseline     │  │ AI-Assisted         │
│ Time: 45 min        │  │ Time: 28 min        │
│ Quality: 92%        │  │ Quality: 87%        │
│ TLX: 35             │  │ TLX: 52             │
└─────────────────────┘  └─────────────────────┘
                ↓
         Δ = +8% overestimation
         (Claimed 40% faster, delivered 38%)
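The figures in the two tiles can be checked with simple arithmetic. The sketch below recomputes the delivered speedup and the changes in quality and TLX from the tile values; the helper name `percent_faster` is hypothetical. The brief does not give the formula behind the Δ figure, so Δ itself is deliberately left out.

```python
def percent_faster(baseline_min: float, assisted_min: float) -> float:
    """Time saved relative to the manual baseline, in percent."""
    return (baseline_min - assisted_min) / baseline_min * 100


# Values copied from the example tiles.
manual = {"time_min": 45, "quality_pct": 92, "tlx": 35}
assisted = {"time_min": 28, "quality_pct": 87, "tlx": 52}

delivered_speedup = percent_faster(manual["time_min"], assisted["time_min"])
quality_change_pts = assisted["quality_pct"] - manual["quality_pct"]
tlx_change = assisted["tlx"] - manual["tlx"]

if __name__ == "__main__":
    print(f"Delivered speedup: {delivered_speedup:.1f}%")
    print(f"Quality change: {quality_change_pts} points")
    print(f"TLX change: {tlx_change:+d}")
```

Read together, the tiles show a time saving of roughly 38% bought at the cost of 5 quality points and a 17-point rise in workload.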

Key Questions for Your Team

  1. Efficiency vs Quality Trade-off: "Is the time savings worth the quality drop?"
  2. Sustainability Check: "Can the team maintain this TLX level long-term?"
  3. Evidence Basis: "How many task attempts support these numbers?"

Action Triggers

Green Light (Continue):

  • Δ < 5% AND Quality maintained AND TLX < 60

Yellow Light (Monitor):

  • Δ 5-15% OR Quality drop up to 10% OR TLX 60-70

Red Light (Intervene):

  • Δ > 15% OR Quality drop > 10% OR TLX > 70
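The three triggers can be combined into one decision function. This is a sketch under stated assumptions rather than a definitive rule set: red conditions are checked first, "quality maintained" is read as no drop at all, the quality drop is measured in percentage points, and any result that is neither red nor green is treated as yellow. The function name `action_light` is hypothetical.

```python
def action_light(delta_pct: float, quality_drop_pts: float, tlx: float) -> str:
    """Return 'green', 'yellow', or 'red' for one evaluation result.

    Assumptions: red takes priority; 'quality maintained' means
    quality_drop_pts <= 0; everything not red or green is yellow.
    """
    if delta_pct > 15 or quality_drop_pts > 10 or tlx > 70:
        return "red"
    if delta_pct < 5 and quality_drop_pts <= 0 and tlx < 60:
        return "green"
    return "yellow"


if __name__ == "__main__":
    # Example tile: Δ = 8%, quality 92% -> 87% (5-point drop), TLX 52.
    print(action_light(delta_pct=8, quality_drop_pts=5, tlx=52))
```

Checking red first means a single red condition overrides otherwise healthy numbers, which matches the OR logic in the red trigger.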

Next Steps

After reviewing tiles, direct your team to:

  1. If results are good: Scale gradually with weekly Δ/TLX monitoring
  2. If results are mixed: Run targeted experiments on problem areas
  3. If results are poor: Revert to manual process and retrain
