Open Character Training: Shaping the Persona of AI Assistants through Constitutional AI
Sharan Maiya, Henning Bartsch, Nathan Lambert, Evan Hubinger — 2025-11-03 — Anthropic
Source
- Link: https://arxiv.org/pdf/2511.01689
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- character-training-and-persona-steering — Black-box safety (understand and control current model behaviour) / Model psychology