METR
METR (Model Evaluation and Threat Research) is an AI evaluation organization, formerly known as ARC Evals, that began as the evaluations arm of the Alignment Research Center (ARC) founded by paul-christiano and later spun out as an independent organization. METR focuses on assessing the capabilities and risks of frontier AI models, particularly evaluating whether models have reached dangerous capability thresholds.
Mission and Approach
METR’s core mission is to develop and conduct rigorous capability evaluations for advanced AI systems. These evaluations go beyond standard benchmarks to test specifically for capabilities that could enable catastrophic harm, including autonomous replication, deception, resource acquisition, and the ability to resist human oversight.
The organization’s work is grounded in an insight articulated by ajeya-cotra and others: behavioral evaluation alone is insufficient for detecting deceptive-alignment, because a deceptively aligned model would, by design, pass behavioral tests. METR’s evaluation methodology therefore aims to probe deeper, testing for capabilities a model would need in order to execute dangerous plans even if it were strategically concealing its abilities during standard testing.
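To make the distinction concrete, the sketch below shows one way a capability evaluation of this kind might be structured: the model is wrapped in scaffolding, given repeated attempts at each task, and credited with a capability if any attempt succeeds. This is an illustrative reconstruction, not METR's actual harness; the Agent protocol, the task fields, and the attempt budget are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Protocol

class Agent(Protocol):
    """Any model wrapped in scaffolding (tools, memory, retries) that can attempt a task."""
    def attempt(self, instructions: str) -> str: ...

@dataclass
class CapabilityTask:
    """One probe for a component of a dangerous capability,
    e.g. provisioning a cloud server as a step toward autonomous replication."""
    name: str
    instructions: str
    passed: Callable[[str], bool]  # grades the agent's transcript
    attempts: int = 10             # elicitation budget: retry with fresh context

def evaluate(agent: Agent, tasks: list[CapabilityTask]) -> dict[str, bool]:
    """A task counts as passed if ANY attempt succeeds: the goal is to measure
    what the model *can* do under strong elicitation, not its average or
    default behavior, which a strategically deceptive model could sandbag."""
    return {
        task.name: any(task.passed(agent.attempt(task.instructions))
                       for _ in range(task.attempts))
        for task in tasks
    }
```

Scoring the maximum over attempts, rather than typical behavior, is the design choice that matters here: it shifts the question from "what does the model do?" to "what is the model capable of?", which is the question a capability threshold depends on.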
Relationship to ARC
METR originated as the evaluations team within ARC, paul-christiano’s organization. The spin-off reflects a growing recognition that capability evaluation is a distinct discipline requiring its own institutional focus, separate from (though complementary to) the alignment research that ARC continues to pursue.
Key People
- ajeya-cotra: Researcher at METR and former lead of technical AI safety grantmaking at open-philanthropy. Her work on deceptive-alignment and transformative-ai timelines directly informs METR’s evaluation priorities.
Role in the Safety Ecosystem
METR’s evaluations serve a critical function in safety frameworks like anthropic’s responsible-scaling-policy and similar policies at openai and deepmind. These frameworks depend on reliable capability assessments to determine when models have crossed dangerous thresholds requiring additional precautions. Without rigorous independent evaluation, the “if-then” commitments in responsible scaling policies lack a trustworthy trigger mechanism.
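As an illustration of such a trigger mechanism, the sketch below encodes if-then commitments as data and maps evaluation scores onto required precautions. The structure, field names, and thresholds are hypothetical and not drawn from any published scaling policy.

```python
from dataclasses import dataclass

@dataclass
class IfThenCommitment:
    """One 'if-then' clause: if an evaluation crosses its threshold,
    a named precaution must be in place before further scaling."""
    capability: str    # e.g. "autonomous replication"
    threshold: float   # fraction of eval tasks passed that trips the trigger
    precaution: str    # e.g. "enhanced security and external audit"

def required_precautions(eval_scores: dict[str, float],
                         commitments: list[IfThenCommitment]) -> list[str]:
    """Map independent evaluation results onto policy obligations. The eval
    scores act as the trigger; if they are unreliable, the commitments
    have nothing trustworthy to fire on."""
    return [c.precaution
            for c in commitments
            if eval_scores.get(c.capability, 0.0) >= c.threshold]

# Hypothetical usage: in practice the scores would come from a
# third-party evaluator such as METR, not the developer itself.
policy = [IfThenCommitment("autonomous replication", 0.5,
                           "pause scaling pending external audit")]
print(required_precautions({"autonomous replication": 0.6}, policy))
```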
METR exemplifies the view that third-party evaluation of AI capabilities is a public good, analogous to safety testing in pharmaceuticals or aviation.