Giving AIs safe motivations
Joe Carlsmith — 2025-08-18 — Anthropic
Source
- Link: https://joecarlsmith.com/2025/08/18/giving-ais-safe-motivations#4-5-step-4-good-instructions
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- model-specs-and-constitutions — Black-box safety (understand and control current model behaviour) / Model psychology