Research Quality & Standards
How we grade research confidence for AI evaluation
Making Informed Decisions with Graded Research
Not all research is created equal. Our A/B/C grading system (A for Strong, B for Moderate, C for Limited) shows the confidence level behind every claim, metric, and recommendation we provide, so you can match research quality to decision risk.
Strong Research
Gold standard for strategic decisions
Strategic commitments, major investments, policy decisions
Moderate Research
Industry-validated for pilots
Pilot programs, controlled rollouts, team experiments
Limited Research
Emerging insights for exploration
Exploration, hypothesis generation, early research
How to Use Research Levels Safely
Best practices for decision-making
Best Practices
- Match research level to decision risk: A for strategic, B for pilots, C for exploration
- Check publication date: AI research older than 12 months may be outdated
- Consider your context: Validate that findings match your industry and scale
- Seek converging findings from multiple sources
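The checks above can be sketched as a small helper. This is a minimal illustration, not part of the grading system itself: the function name `review_evidence`, the decision labels, and the "at least two independent sources" threshold for convergence are assumptions made for the example. The level-to-decision mapping and the 12-month cutoff come from the list above.

```python
from datetime import date

# Minimum level suggested for each decision type (from the first best practice).
# The decision labels themselves are hypothetical names for this sketch.
MIN_LEVEL_FOR_DECISION = {"strategic": "A", "pilot": "B", "exploration": "C"}
LEVEL_RANK = {"A": 3, "B": 2, "C": 1}
MAX_AGE_MONTHS = 12  # from the publication-date guideline


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def review_evidence(level: str, decision: str, published: date,
                    today: date, independent_sources: int) -> list[str]:
    """Return a list of warnings; an empty list means no check was tripped."""
    warnings = []
    required = MIN_LEVEL_FOR_DECISION[decision]
    if LEVEL_RANK[level] < LEVEL_RANK[required]:
        warnings.append(
            f"Level {level} is below the Level {required} suggested "
            f"for {decision} decisions"
        )
    if months_between(published, today) > MAX_AGE_MONTHS:
        warnings.append("Published more than 12 months ago; may be outdated")
    # Assumption: "converging findings" is approximated as 2+ sources.
    if independent_sources < 2:
        warnings.append("Single source; seek converging findings")
    return warnings
```

The context check (industry and scale) is left out on purpose, since it depends on judgment that a simple rule cannot capture.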
Common Mistakes to Avoid
- ✕ Using Level C research alone for major investments
- ✕ Ignoring contradictory findings from different sources
- ✕ Generalizing from single-vendor case studies
- ✕ Applying old findings to rapidly evolving AI capabilities
How We Assign Research Levels
Our scientific grading methodology
Our research grading follows established scientific standards adapted for AI evaluation contexts. Each claim receives a grade based on these five key factors:
Study Design
Randomized controlled trials (RCTs) and systematic reviews earn higher grades
Sample Size
Larger, more representative samples increase confidence
Reproducibility
Findings replicated across contexts receive higher ratings
Quantitative Rigor
Statistical significance and effect sizes matter
Recency
Recent research (especially for AI) carries more weight
Research Level Definitions
Strong Research
Gold standard for strategic decisions
Qualifying Criteria
- Randomized controlled trials with 50+ participants
- Systematic reviews with consistent results
- Peer-reviewed research with replication
- Statistical significance (p < 0.05)
Moderate Research
Industry-validated for pilots
Qualifying Criteria
- Controlled studies with 20-50 participants
- Industry benchmarks from reputable sources
- Case studies with quantitative outcomes
- Expert consensus from domain authorities
Limited Research
Emerging insights for exploration
Qualifying Criteria
- Pilot studies with <20 participants
- Self-reported outcomes without controls
- Anecdotal findings from practitioners
- Theoretical frameworks awaiting validation
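As a rough illustration of how the participant thresholds in these definitions separate the three levels, the sketch below maps a study's design and sample size to a level. It covers only the study-design and sample-size criteria and ignores the others (replication, peer review, statistical significance, source reputation). The function name and the design labels are hypothetical. Two choices are assumptions: a randomized trial with exactly 50 participants is treated as Strong, because the listed ranges overlap at 50, and a controlled but non-randomized study with 50 or more participants is treated as Moderate, because the definitions do not cover that case.

```python
def level_for_study(design: str, participants: int) -> str:
    """Map study design and sample size to a research level.

    design: "rct", "controlled", or "uncontrolled" (hypothetical labels).
    """
    if design not in {"rct", "controlled", "uncontrolled"}:
        raise ValueError(f"unknown design: {design!r}")
    # No controls, or fewer than 20 participants: Limited.
    if design == "uncontrolled" or participants < 20:
        return "Limited"
    # Randomized controlled trials with 50+ participants: Strong.
    if design == "rct" and participants >= 50:
        return "Strong"
    # Remaining controlled studies with 20 or more participants: Moderate.
    return "Moderate"
```

For example, `level_for_study("rct", 120)` returns `"Strong"`, while `level_for_study("uncontrolled", 500)` returns `"Limited"` regardless of sample size.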