A Safety Case for a Deployed LLM: Corrigibility as a Singular Target

Ram Potham — 2025-06-24 — ODYSSEY 2025 Conference

Summary

Presents a detailed safety case for deploying a highly capable LLM trained with the Corrigibility-as-Singular-Target (CAST) strategy via Prover-Estimator debate. The case rests on four claims: the deployment specification is adequate, error rates are bounded, the impact of errors is mitigated, and these properties remain stable over a defined lifetime.
