Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals

Marius Hobbhahn — 2025-07-03 — Apollo Research — LessWrong

Summary

Apollo Research empirically tested whether its precursor evaluations for scheming, which measure agentic self-reasoning and theory of mind, predicted the results of its direct in-context scheming evaluations. It found low-to-medium predictive power, which is insufficient for high-stakes deployment decisions.

Key Result

Precursor evaluations showed low-to-medium predictive power for scheming capabilities. The easy versions were somewhat useful predictors, while the hard versions were neutral or even actively misleading. This indicates that precursor evaluations are not reliable enough to be used in frontier safety policies.
