Early Signs of Steganographic Capabilities in Frontier LLMs

Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S. Zimmermann, David Lindner — 2025-07-03 — arXiv

Summary

Evaluates steganographic capabilities in frontier LLMs, testing their ability to encode hidden messages and perform encoded reasoning to evade monitoring systems.

Key Result

Current LLMs cannot reliably encode messages under standard affordances but show nascent capabilities with additional affordances like unmonitored scratchpads, and exhibit early signs of basic encoded reasoning though they rarely hide reasoning subtly enough to fool monitors.

Source

Link: https://arxiv.org/abs/2507.02737
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- steganography-evals — Evals

steganography-evals

AI Safety Compendium

Explorer

Early Signs of Steganographic Capabilities in Frontier LLMs

Early Signs of Steganographic Capabilities in Frontier LLMs

Summary

Key Result

Source

Graph View

Graph view

Table of Contents

AI Safety Compendium

Explorer

Early Signs of Steganographic Capabilities in Frontier LLMs

Early Signs of Steganographic Capabilities in Frontier LLMs

Summary

Key Result

Source

Related Pages

Graph View

Graph view

Table of Contents