Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals

Marius Hobbhahn — 2025-07-03 — Apollo Research — LessWrong

Summary

Apollo Research empirically tested whether its precursor evaluations for scheming (which measure agentic self-reasoning and theory of mind) predicted results on its direct in-context scheming evaluations. It found low to medium predictive power, which is insufficient for high-stakes deployment decisions.

Key Result

Precursor evaluations showed low to medium predictive power for scheming capabilities. Easy versions were somewhat useful, but hard versions were neutral or actively misleading, so the precursor evaluations are not reliable enough for use in frontier safety policies.

Source

Link: https://lesswrong.com/posts/9tqpPP4FwSnv9AWsi/research-note-our-scheming-precursor-evals-had-limited
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- capability-evals — Evals
