Executive Brief
4 min read
Evidence: B

Executive AI Starter Kit

AI productivity claims are everywhere. Evidence is not. This kit gives you the three metrics that matter, the questions that expose delusions, and a 30-day plan to build measurement capability—before overconfidence becomes organizational debt.

Moderate Evidence: Based on NASA-TLX validation studies and organizational psychology research on overconfidence bias

The Three Metrics That Matter

Evidence Level B: Metrics derived from NASA-TLX (validated since 1988) and calibration research in judgment psychology.

1. Overestimation Delta (Δ)

What it is: The gap between what teams claim AI delivered versus what it actually delivered.

Why it matters: Teams overestimate AI productivity gains by 20-40% on average. Without measurement, overconfidence compounds sprint over sprint.

Healthy range: Δ under 10%. Above 15% signals systematic overconfidence requiring intervention.
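As a rough illustration (the function name and the percentage-point convention are assumptions for this sketch, not part of the kit), Δ can be computed as the difference between claimed and measured time savings:

```python
def overestimation_delta(claimed_savings_pct: float, actual_savings_pct: float) -> float:
    """Gap between claimed and measured AI time savings, in percentage points."""
    return claimed_savings_pct - actual_savings_pct

# A team claims 35% time savings; measurement shows 22%.
delta = overestimation_delta(claimed_savings_pct=35.0, actual_savings_pct=22.0)
# delta is 13.0 points: past the 10% healthy range, nearing the 15% intervention line
```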

2. Micro-TLX Score

What it is: A 2-slider workload check (mental demand + frustration) after AI-assisted tasks.

Why it matters: AI can save time while increasing cognitive load. If TLX climbs while time drops, you're trading visible efficiency for invisible burnout.

Healthy range: TLX under 50. Above 65 indicates unsustainable cognitive burden.
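A minimal way to score the two sliders, assuming each runs 0-100 and the micro score is their simple average (the averaging choice is an assumption; full NASA-TLX uses six weighted dimensions):

```python
def micro_tlx(mental_demand: float, frustration: float) -> float:
    """Average of two 0-100 slider ratings collected immediately after a task."""
    for value in (mental_demand, frustration):
        if not 0 <= value <= 100:
            raise ValueError("slider values must be between 0 and 100")
    return (mental_demand + frustration) / 2

score = micro_tlx(mental_demand=70, frustration=64)
# score is 67.0: above the 65 threshold for unsustainable cognitive burden
```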

3. Time-to-Passed-Review

What it is: Elapsed time from task start to approval by a human reviewer.

Why it matters: AI-generated work often requires more revision cycles. Fast generation plus slow review equals no real savings.

Healthy range: Same or better than manual baseline. Longer review times signal quality debt.
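Tracking this metric only requires two timestamps per task; the helper below is an assumed sketch (wall-clock hours, no business-hours adjustment):

```python
from datetime import datetime

def hours_to_passed_review(task_start: datetime, review_passed: datetime) -> float:
    """Elapsed wall-clock hours from task start to human reviewer approval."""
    return (review_passed - task_start).total_seconds() / 3600

elapsed = hours_to_passed_review(
    datetime(2025, 3, 3, 9, 0),    # task started Monday 09:00
    datetime(2025, 3, 4, 14, 30),  # review passed Tuesday 14:30
)
# 29.5 hours: compare against the manual baseline for the same task
```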


Where Delusions Hide

  • "We're 3x faster" — Ask: Faster at generation or at passed review? Count revisions.
  • "Quality is the same" — Ask: Who measured? When? Against what rubric?
  • "Everyone loves it" — Ask: What's the TLX score? Enthusiasm ≠ sustainability.
  • "It works for everything" — Ask: Which tasks show negative Δ? There are always some.
  • "We don't need to track anymore" — Ask: When did you last measure? Drift happens fast.

30-Day Measurement Plan

Week 1: Baseline

  • [ ] Select 3 representative tasks (1 routine, 1 complex, 1 novel)
  • [ ] Run each task manually; record time and quality score
  • [ ] Collect TLX after each task
  • [ ] Document the rubric you used for quality

Deliverable: Baseline data for 3 tasks with time, quality, and TLX.

Week 2: AI Trials

  • [ ] Run same 3 tasks with AI assistance
  • [ ] Record claimed vs. actual time savings
  • [ ] Collect TLX immediately after each task
  • [ ] Calculate Δ for each task

Deliverable: Side-by-side comparison with Δ calculated.
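The side-by-side comparison can be a small script over the Week 1 and Week 2 numbers. The figures below are placeholders, not real data; "actual savings" is measured against the manual baseline, and Δ is claimed minus actual:

```python
baseline_hours = {"routine": 4.0, "complex": 9.0, "novel": 6.0}     # Week 1, manual
ai_hours       = {"routine": 2.5, "complex": 8.0, "novel": 6.5}     # Week 2, AI-assisted
claimed_pct    = {"routine": 50.0, "complex": 30.0, "novel": 20.0}  # team-reported savings

comparison = {}
for task in baseline_hours:
    actual_pct = 100 * (baseline_hours[task] - ai_hours[task]) / baseline_hours[task]
    comparison[task] = (claimed_pct[task], actual_pct, claimed_pct[task] - actual_pct)
    print(f"{task:8s} claimed {claimed_pct[task]:.0f}%  actual {actual_pct:.1f}%  "
          f"Δ {claimed_pct[task] - actual_pct:+.1f}")
```

Note how the novel task can come out slower than baseline (negative actual savings), which inflates its Δ sharply.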

Week 3: Analysis

  • [ ] Review which tasks showed a healthy Δ (under 10%)
  • [ ] Identify tasks where TLX increased despite time savings
  • [ ] Flag tasks where review cycles increased
  • [ ] Draft recommendations: expand, restrict, or retrain

Deliverable: Task classification (green/yellow/red) with rationale.
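The classification can reuse the thresholds stated earlier in this kit; how the three metrics combine into one color is an assumption here, shown as one plausible sketch:

```python
def classify_task(delta_pct: float, tlx: float, review_hours_vs_baseline: float) -> str:
    """Green/yellow/red from the kit's thresholds (combination logic assumed)."""
    if delta_pct > 15 or tlx > 65:
        return "red"     # systematic overconfidence or unsustainable load: restrict or retrain
    if delta_pct > 10 or tlx > 50 or review_hours_vs_baseline > 0:
        return "yellow"  # one metric drifting: watch closely
    return "green"       # expand

classify_task(delta_pct=8, tlx=42, review_hours_vs_baseline=-0.5)  # "green"
```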

Week 4: Institutionalize

  • [ ] Add Δ tracking to sprint retrospectives
  • [ ] Create dashboard for ongoing TLX monitoring
  • [ ] Set thresholds for intervention triggers
  • [ ] Schedule monthly calibration review

Deliverable: Measurement system operational with clear owners.
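Intervention triggers can be encoded directly from the thresholds this kit gives, so a retro dashboard can flag them automatically; the helper itself is an assumed sketch:

```python
# Intervention thresholds from this kit; metric key names are assumptions.
THRESHOLDS = {"delta_pct": 15.0, "tlx": 65.0}

def intervention_triggers(metrics: dict) -> list:
    """Names of metrics at or past the intervention line, for a retro dashboard."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0.0) >= limit]

intervention_triggers({"delta_pct": 16.2, "tlx": 58.0})  # ["delta_pct"]
```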


Key Questions for Leadership

Use these in your next AI review meeting:

  1. "What's our average Δ across tracked tasks?"

    • If they can't answer, measurement isn't happening.
  2. "Which tasks show negative ROI when we include review time?"

    • Forces honest accounting of the full cycle.
  3. "What's the TLX trend over the last month?"

    • Catches burnout before it becomes turnover.
  4. "When did we last update our quality rubrics?"

    • Rubrics drift; AI output changes. Both need recalibration.

This kit aligns with the AI CogniFit Methodology and Validity Framework. All recommended metrics have been tested across multiple organizations.
