AI companies should be safety-testing the most capable versions of their models

Steven Adler — 2025-03-26 — LessWrong

Summary

Argues that AI companies should evaluate task-specific fine-tuned versions of their models to understand worst-case dangerous capabilities. Critiques current evaluation practices, which may underestimate risks, and examines OpenAI’s commitment to this approach.

Source