Early Signs of Steganographic Capabilities in Frontier LLMs

Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner — 2025-07-03 — arXiv

Summary

Evaluates steganographic capabilities in frontier LLMs, testing their ability to encode hidden messages and perform encoded reasoning to evade monitoring systems.

Key Result

Current LLMs cannot reliably encode hidden messages under standard affordances, but they show nascent capabilities when given additional affordances such as an unmonitored scratchpad. They also exhibit early signs of basic encoded reasoning, though they rarely hide that reasoning subtly enough to fool a monitor.

Source