AI Testing Should Account for Sophisticated Strategic Behaviour

Vojtech Kovarik, Eric Olav Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer — 2025-08-19 — arXiv

Summary

Position paper arguing that AI evaluations must account for sophisticated strategic behavior, such as alignment faking and sandbagging, if they are to remain informative about how a system behaves in deployment. It uses game-theoretic analysis to formalize evaluation design and to scrutinize safety cases.

Source