AI Testing Should Account for Sophisticated Strategic Behaviour

Vojtech Kovarik, Eric Olav Chen, Sami Petersen, Alexis Ghersengorin, Vincent Conitzer — 2025-08-19 — arXiv

Summary

Position paper arguing that AI evaluations must account for sophisticated strategic behavior, such as alignment faking and sandbagging, if they are to remain informative about how models behave in deployment. It uses game-theoretic analysis to formalize evaluation design and to scrutinize safety cases.

Source

Link: https://arxiv.org/abs/2508.14927
Listed in the Shallow Review of Technical AI Safety 2025 under 2 agendas:
- theory-for-aligning-multiple-ais — Multi-agent first
- other-evals — Evals
