Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases

2025 — Anthropic — Anthropic Alignment Science Blog

Summary

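Tests whether Claude 3.7 Sonnet uses encoded reasoning (steganography) in its chain-of-thought scratchpads. The scratchpads are paraphrased, which keeps their meaning but strips information that could be hidden in their syntax. The test then measures whether distilling the paraphrased scratchpads (fine-tuning a model on them) degrades performance relative to distilling the originals.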

Key Result

Pass@1 accuracy on MATH is nearly identical whether the model is fine-tuned on paraphrased or on original scratchpads. This suggests Claude 3.7 Sonnet does not rely on encoded reasoning in its scratchpads for math problems.
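Pass@1 is the probability that a single sampled answer to a problem is correct, averaged over problems. The sketch below shows one way to compute it and compare the two fine-tuning conditions. It assumes several samples per problem and uses the standard unbiased pass@k estimator, which reduces to c/n when k = 1. This is not necessarily how the post computed its numbers. The per-problem counts are hypothetical and are not results from the post.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem with n samples, c of them correct."""
    if n - c < k:
        # Every size-k subset of samples contains at least one correct answer.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; each entry is (n_samples, n_correct)."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Hypothetical (samples, correct) counts per problem, NOT data from the post.
original = [(4, 4), (4, 2), (4, 0), (4, 3)]
paraphrased = [(4, 4), (4, 2), (4, 0), (4, 2)]

acc_orig = mean_pass_at_1(original)
acc_para = mean_pass_at_1(paraphrased)
# A gap near zero is the kind of outcome that argues against encoded reasoning.
print(f"original={acc_orig:.4f} paraphrased={acc_para:.4f} gap={acc_orig - acc_para:+.4f}")
```

With these made-up counts the script prints `original=0.5625 paraphrased=0.5000 gap=+0.0625`. In a real comparison the gap would need to be judged against sampling noise (for example, with a confidence interval over problems) before concluding that the two conditions match.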

Source