Human AI Deception
Benchmark Arena
A living benchmark measuring how well humans detect AI errors, hallucinations, and reasoning flaws. Contribute your judgments and see how you compare.
Human Performance
Track how well humans detect AI mistakes across diverse scenarios
Error Patterns
Identify common blind spots where humans struggle to catch AI failures
Skill Growth
Watch your judgment improve over time with structured practice
The Human AI Deception Benchmark measures human ability to detect AI failures across five skill pillars:
AI Literacy
Understanding how AI systems work and why they fail
Logic & Reasoning
Spotting flawed arguments and reasoning chains
Risk & Safety
Assessing when AI outputs are safe to use
Authenticity Detection
Distinguishing human-written from AI-generated content
Calibration
Knowing how well your confidence matches your actual accuracy
We also measure response time and confidence calibration to understand how people make judgment calls under pressure.
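As one illustrative way to see what confidence calibration means in practice, the minimal sketch below scores a set of self-reported confidences against judgment correctness using a Brier score and a simple overconfidence gap. The data, field layout, and scoring choices are assumptions for this example, not the benchmark's actual method.

# Minimal sketch (assumed scoring, not the benchmark's actual method):
# measures confidence calibration from self-reported confidences and correctness.

from statistics import mean

# Hypothetical data: (confidence in [0, 1], was the judgment correct?)
judgments = [
    (0.9, True),
    (0.8, False),
    (0.6, True),
    (0.95, True),
    (0.5, False),
]

# Brier score: mean squared gap between confidence and outcome (lower is better).
brier = mean((conf - (1.0 if correct else 0.0)) ** 2 for conf, correct in judgments)

# Overconfidence gap: average confidence minus actual accuracy.
# A well-calibrated participant has a gap near zero; positive suggests overconfidence.
gap = mean(conf for conf, _ in judgments) - mean(
    1.0 if correct else 0.0 for _, correct in judgments
)

print(f"Brier score: {brier:.3f}")
print(f"Overconfidence gap: {gap:+.3f}")

A participant who reports 90% confidence but is right only 60% of the time would show a large positive gap, which is the kind of pattern the calibration pillar is meant to surface.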
Ready to Contribute?
Your participation helps build the most comprehensive benchmark of human judgment of AI. Start your journey today.