Early Signs of Steganographic Capabilities in Frontier LLMs
Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner — 2025-07-03 — arXiv
Summary
Evaluates steganographic capabilities in frontier LLMs, testing their ability to encode hidden messages and perform encoded reasoning to evade monitoring systems.
Key Result
Current LLMs cannot reliably encode messages under standard affordances but show nascent capabilities with additional affordances like unmonitored scratchpads, and exhibit early signs of basic encoded reasoning though they rarely hide reasoning subtly enough to fool monitors.
Source
- Link: https://arxiv.org/abs/2507.02737
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- steganography-evals — Evals