Sandbagging in a Simple Survival Bandit Problem
Joel Dyer, Daniel Jarne Ornia, Nicholas Bishop, Anisoara Calinescu, Michael Wooldridge — 2025-09-30
Source
- Link: https://arxiv.org/pdf/2509.26239
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - sandbagging-evals — Evals