The Soul Document

Richard-Weiss — 2024-11-27 — GitHub Gist

Summary

An extraction of what appear to be Claude 4.5 Opus’s system instructions. The recovered text sets out detailed behavioral guidelines, safety principles, and value specifications. The write-up also describes the consensus-based methodology used to recover this memorized content through repeated API calls.

Key Result

Claude 4.5 Opus has memorized extensive system instructions (~20,000 words) covering helpfulness, honesty, harm avoidance, and oversight preservation, which can be extracted through consensus-based prefill completion.
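
To make "consensus-based" concrete, here is a minimal sketch of the general idea: sample several completions for the same prefix, keep a chunk only when enough samples agree on it, then append it and repeat. This is an illustration under assumptions, not the author's actual code. The names `consensus`, `extend`, `sample_fn`, and `min_share` are hypothetical, and `sample_fn` stands in for whatever API call returns several completions for a prefilled prefix.

```python
from collections import Counter
from typing import Callable, Optional


def consensus(samples: list[str], min_share: float = 0.5) -> Optional[str]:
    """Return the most common sample if it makes up at least min_share of all samples."""
    if not samples:
        return None
    text, count = Counter(samples).most_common(1)[0]
    return text if count / len(samples) >= min_share else None


def extend(prefix: str,
           sample_fn: Callable[[str], list[str]],
           max_steps: int = 10) -> str:
    """Grow prefix chunk by chunk, keeping only chunks the samples agree on.

    sample_fn is a hypothetical stand-in for repeated API calls that
    each continue the same prefilled prefix.
    """
    for _ in range(max_steps):
        chunk = consensus(sample_fn(prefix))
        if not chunk:  # no agreement, or an empty chunk: stop extending
            break
        prefix += chunk
    return prefix
```

With a stubbed `sample_fn` the loop can be tried without any network access. The agreement threshold is a design choice: a higher `min_share` accepts less text but makes it less likely that a one-off sample is mistaken for memorized content.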

Source