AI Testing Should Account for Sophisticated Strategic Behaviour
Vojtech Kovarik, Eric Olav Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer — 2025-08-19 — arXiv
Summary
Position paper arguing that AI evaluations must account for sophisticated strategic behavior, such as alignment faking and sandbagging, if they are to remain informative about how a system will behave in deployment. The paper uses game-theoretic analysis to formalize evaluation design and to scrutinize safety cases.
Source
- Link: https://arxiv.org/abs/2508.14927
- Listed in the Shallow Review of Technical AI Safety 2025 under 2 agendas:
- theory-for-aligning-multiple-ais — Multi-agent first
- other-evals — Evals