New website analyzing AI companies’ model evals

Zach Stein-Perlman — 2025-05-26 — LessWrong

Summary

Announces a website analyzing AI companies’ dangerous capability evaluations and provides a technical critique of recent model cards from OpenAI, Anthropic, DeepMind, Meta, and xAI, identifying methodological problems in elicitation, interpretation of results, and threshold-setting.

Source