literacy · assessment
AI Literacy Assessment: From Gut Feel to Measured Δ
Instrument AI literacy in under a week so execs see real Overestimation Δ and TLX tiles instead of anecdotes.
Stop asking 'how confident are we with AI?' and start showing a baseline Δ chart plus the workload pulses that explain it.
4 min · Feb 15, 2025
productivity · benchmarks
AI Productivity Benchmarks: From Claims to Quickstarts
Show exactly where AI gives a 2× lift, and hand readers the PM & SWE quickstarts to recreate it.
Benchmarks without a path to action are wallpaper. Pair every metric with a quickstart so a team can rerun it this week.
4 min · Feb 15, 2025
calibration · delta · judgment
Calibration Over Confidence: Understanding Overestimation Delta
Overconfidence is costly. Track Delta and close the gap with prediction practice.
Overestimating costs more than underestimating. Track your Delta and close the gap with prediction exercises.
5 min · Feb 20, 2025
judgment · evaluation · system-2
The Evaluator's Edge: From Generation to Judgment
Most gains come from better evaluation, not faster generation. Learn to predict, test, and calibrate.
Evaluating beats producing: the biggest gains come from better evaluation, not faster generation.
4 min · Feb 20, 2025
methodology · productivity · fair-trial
Fair Trial: How to Compare Manual vs. AI Without Fooling Yourself
Hold the timing and quality criteria constant; log Delta-time, Delta-quality, and micro-TLX.
Without controls, every AI demo is theater. Fair Trial methodology turns anecdotes into evidence.
5 min · Feb 20, 2025
workload · tlx
NASA-TLX for Knowledge Work: Two Sliders, Real Decisions
Capture workload without slowing the team. Two sliders after every run, tied straight to Δ and demo tiles.
If your TLX process takes more than 15 seconds, nobody will log it. Compress it and wire it to your Analyzer evidence.
4 min · Feb 15, 2025
caselet · pm · decision-intelligence
Caselet · Innovation PM turns shaky AI deck into Δ proof
How a venture PM used Δ + TLX tiles to move an exec board from skepticism to pilot funding in three weeks.
When the COO rejected their AI pilot, the team stopped pitching hypotheticals and started pasting Analyzer tiles into every memo.
4 min · Feb 16, 2025
overestimation · coaching
Reverse Dunning-Kruger: Coaching High-Literacy AI Teams
Your most confident AI users often outrun the evidence. Harness Δ to keep them honest.
Overconfidence isn’t an ego problem—it is a measurement problem. Track Δ at the edge where high performers live.
4 min · Feb 15, 2025
caselet · swe · code-review
Caselet · SWE lead trims review debt without hiding TLX
A platform engineering squad used Analyzer tiles to prove that AI-assisted code review cut rework by 23% while keeping TLX in the safe band.
They stopped bragging about throughput and started sharing Δ, reviewer minutes, and TLX pulses in every retro.
4 min · Feb 16, 2025