Team Tool
3 min read

AI Evaluation Team Retro Template

Copy-paste agenda for reviewing AI tool performance with your team

60-Minute Team Retro Agenda

Use this template after completing AI evaluations to review results and plan improvements.

Pre-Meeting Prep (Send 24h before)

Email your team:

Subject: AI Evaluation Retro - [Date]

Team,

Please review the attached Δ (overestimation delta) and TLX (NASA Task Load Index workload) results before our retro.

Come prepared to discuss:
1. Your biggest surprise from the results
2. One process improvement idea
3. Tasks you'd prioritize for AI assistance

Meeting link: [Link]
Dashboard: [Link to results]

Thanks!

Meeting Agenda

1. Check-in (5 min)

Round-robin: "In one word, how do you feel about our AI evaluation results?"

2. Data Review (15 min)

Screen share the dashboard and review:

  • Overall Metrics

    • Average Overestimation Δ: ____%
    • Average TLX score: ____
    • Time saved: ____ hours
    • Quality maintained: Yes/No
  • Task Breakdown

    Task 1: [Name]
    - Manual: ___ min, Quality: ___%
    - AI: ___ min, Quality: ___%
    - Δ: ___%, TLX: ___
    
    Task 2: [Name]
    - Manual: ___ min, Quality: ___%
    - AI: ___ min, Quality: ___%
    - Δ: ___%, TLX: ___
    

3. What Worked (10 min)

Facilitate discussion:

  • Which tasks showed genuine time savings?
  • Where did quality improve or stay stable?
  • What surprised us positively?

Capture on shared board:

  • 🟢 Keep doing: _____________
  • 🟢 Keep doing: _____________
  • 🟢 Keep doing: _____________

4. What Didn't Work (10 min)

Facilitate discussion:

  • Where did we overestimate AI capabilities?
  • Which tasks increased cognitive load (TLX)?
  • What quality issues emerged?

Capture on shared board:

  • 🔴 Stop doing: _____________
  • 🔴 Stop doing: _____________
  • 🔴 Stop doing: _____________

5. Action Planning (15 min)

For each problem area, define:

  1. High Δ Tasks (>15%)

    • Action: Redefine success criteria
    • Owner: _______
    • Due: _______
  2. High TLX Tasks (>70)

    • Action: Simplify or provide training
    • Owner: _______
    • Due: _______
  3. Quality Drops (>5%)

    • Action: Add review checkpoints
    • Owner: _______
    • Due: _______
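The three triage rules above (Δ > 15%, TLX > 70, quality drop > 5%) are mechanical enough to script. Below is a minimal sketch assuming each task is a dict with `delta`, `tlx`, and `quality_drop` keys — the data shape and field names are hypothetical; only the thresholds and action labels come from the agenda.

```python
# Sketch of the Action Planning triage rules. Thresholds and action text
# mirror the agenda; the task dict shape is an assumption.

ACTIONS = [
    ("delta",        lambda v: v > 15, "Redefine success criteria"),
    ("tlx",          lambda v: v > 70, "Simplify or provide training"),
    ("quality_drop", lambda v: v > 5,  "Add review checkpoints"),
]

def triage(tasks):
    """Return (task name, suggested action) pairs for every tripped rule."""
    flagged = []
    for task in tasks:
        for key, trips, action in ACTIONS:
            if trips(task[key]):
                flagged.append((task["name"], action))
    return flagged

tasks = [
    {"name": "Summarize tickets", "delta": 22, "tlx": 40, "quality_drop": 1},
    {"name": "Code review",       "delta": 8,  "tlx": 75, "quality_drop": 6},
]
print(triage(tasks))
```

Each flagged pair still needs a human-assigned owner and due date; the script only decides which tasks make it onto the board.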

6. Experiment Design (5 min)

Next sprint experiments:

Experiment 1: [Description]
- Hypothesis:
- Success metric:
- Owner:

Experiment 2: [Description]
- Hypothesis:
- Success metric:
- Owner:

Follow-Up Actions

Immediately After Meeting

Send summary email:

Subject: Retro Summary - AI Evaluation [Date]

Team,

Thanks for the productive discussion. Key takeaways:

WINS:
• [Win 1]
• [Win 2]

IMPROVEMENTS:
• [Action 1] - Owner: [Name] - Due: [Date]
• [Action 2] - Owner: [Name] - Due: [Date]

NEXT EXPERIMENTS:
• [Experiment 1]
• [Experiment 2]

Next evaluation: [Date]

Dashboard: [Link]
Recording: [Link]

Weekly Check-ins

Add 5 min to standup:

  • "Any AI tool friction this week?"
  • "TLX feeling sustainable?"
  • "Noticing any quality issues?"

Monthly Review

  • Compare month-over-month Δ trends
  • Review TLX patterns for burnout signals
  • Celebrate improvements


Signs of Success

After 3 retros, you should see:

  • ✅ Δ trending down (team calibrating expectations)
  • ✅ TLX stabilizing (sustainable workload)
  • ✅ Quality maintained or improving
  • ✅ Clear task segmentation (AI-suitable vs manual)