Research Quality & Standards

How we grade research confidence for AI evaluation

Research Standards

Making Informed Decisions with Graded Research

Not all research is created equal. Our A/B/C grading system shows the confidence level behind every claim, metric, and recommendation we provide, so you can match research quality to decision risk.

[Figure: Research Quality Pyramid showing A, B, and C levels]
Scientific Framework
Research A

Strong Research

Gold standard for strategic decisions

Strategic commitments, major investments, policy decisions

Research B

Moderate Research

Industry-validated for pilots

Pilot programs, controlled rollouts, team experiments

Research C

Limited Research

Emerging insights for exploration

Exploration, hypothesis generation, early research

How to Use Research Levels Safely

Best practices for decision-making

Best Practices

  • Match research level to decision risk: A for strategic, B for pilots, C for exploration
  • Check publication date: AI research more than 12 months old may be outdated
  • Consider your context: Validate that findings match your industry and scale
  • Seek converging findings from multiple sources
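
The first two practices above can be sketched as a simple pre-decision check. This is a minimal illustration, not part of the grading system itself: the names `MIN_GRADE`, `months_between`, and `check_evidence`, and the decision labels `"strategic"`, `"pilot"`, and `"exploration"`, are hypothetical, and the 12-month cutoff is taken directly from the list above.

```python
from datetime import date

# Hypothetical mapping: the minimum grade suggested for each decision type
# (A for strategic, B for pilots, C for exploration).
MIN_GRADE = {"strategic": "A", "pilot": "B", "exploration": "C"}
GRADE_RANK = {"A": 3, "B": 2, "C": 1}


def months_between(published: date, today: date) -> int:
    """Whole calendar months from publication to today (days are ignored)."""
    return (today.year - published.year) * 12 + (today.month - published.month)


def check_evidence(grade: str, decision: str, published: date, today: date) -> list[str]:
    """Return a list of warnings; an empty list means neither check fired."""
    warnings = []
    required = MIN_GRADE[decision]
    if GRADE_RANK[grade] < GRADE_RANK[required]:
        warnings.append(
            f"Level {grade} research is below Level {required}, "
            f"the level suggested for {decision} decisions"
        )
    if months_between(published, today) > 12:
        warnings.append("Research is more than 12 months old and may be outdated")
    return warnings


# Example: Level C evidence from January 2023, used for a strategic decision in June 2024.
for warning in check_evidence("C", "strategic", date(2023, 1, 15), date(2024, 6, 1)):
    print(warning)
```

The remaining practices (checking context fit and seeking converging findings) depend on judgment and are not captured by this sketch.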

Common Mistakes to Avoid

  • Using Level C research alone for major investments
  • Ignoring contradictory findings from different sources
  • Generalizing from single-vendor case studies
  • Applying old findings to rapidly evolving AI capabilities

How We Assign Research Levels

Our scientific grading methodology

Our research grading follows established scientific standards adapted for AI evaluation contexts. Each claim receives a grade based on these five key factors:

Study Design

RCTs and systematic reviews earn higher grades

Sample Size

Larger, more representative samples increase confidence

Reproducibility

Findings replicated across contexts receive higher ratings

Quantitative Rigor

Statistical significance and effect sizes matter

Recency

Recent research (especially for AI) carries more weight

Research Level Definitions

Research A

Strong Research

Gold standard for strategic decisions

Strategic commitments

Qualifying Criteria

  • Randomized controlled trials with 50+ participants
  • Systematic reviews with consistent results
  • Peer-reviewed research with replication
  • Statistical significance (p < 0.05)

Example Studies

Hart & Staveland (1988): NASA-TLX Development
Brynjolfsson et al. (2023): NBER study of 5,000+ agents
Dell'Acqua et al. (2023): BCG/Harvard study with 758 consultants

Limitations: May not generalize to all contexts or populations

Research B

Moderate Research

Industry-validated for pilots

Pilot programs

Qualifying Criteria

  • Controlled studies with 20-50 participants
  • Industry benchmarks from reputable sources
  • Case studies with quantitative outcomes
  • Expert consensus from domain authorities

Example Studies

McKinsey (2024): AI State Survey of 1,363 organizations
Microsoft (2024): Work Trend Index, 31,000 workers
Google Cloud (2024): 2,500 IT decision makers

Limitations: Results may vary based on organizational context

Research C

Limited Research

Emerging insights for exploration

Exploration

Qualifying Criteria

  • Pilot studies with <20 participants
  • Self-reported outcomes without controls
  • Anecdotal findings from practitioners
  • Theoretical frameworks awaiting validation

Example Studies

Initial Overestimation Delta measurements (15 users)
Early adopter case studies from 3 startups
Practitioner frameworks pending validation

Limitations: Treat as directional guidance only
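
The sample-size criteria in the three definitions can be sketched as a small classifier. This is only a rough illustration under stated assumptions: the function name `grade_study` and the design labels `"rct"`, `"controlled"`, and `"pilot"` are hypothetical, and the sketch covers only study design and participant count, not replication, peer review, statistical significance, or recency. Because the published ranges meet at 50 participants ("50+" for A, "20-50" for B), the sketch assumes a randomized trial with exactly 50 participants counts as Level A.

```python
def grade_study(design: str, participants: int) -> str:
    """Assign a provisional level from study design and sample size only.

    Assumptions (not stated in the definitions above):
    - a randomized trial with 20-49 participants is treated as Level B
    - a non-randomized controlled study is capped at Level B
    - anything else, including any study under 20 participants, is Level C
    """
    if design == "rct" and participants >= 50:
        return "A"
    if design in ("rct", "controlled") and participants >= 20:
        return "B"
    return "C"


print(grade_study("rct", 120))        # randomized trial, 50+ participants
print(grade_study("controlled", 30))  # controlled study, 20-50 participants
print(grade_study("pilot", 15))       # pilot study, fewer than 20 participants
```

A provisional level from this sketch would still need to be checked against the other qualifying criteria before being used.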