Evaluating the Goal-Directedness of Large Language Models
Tom Everitt, Cristina Garbacea, Alexis Bellot, Jonathan Richens, Henry Papadatos, Siméon Campos, … (+1 more) — 2025-04-16 — Google DeepMind, OpenAI, Anthropic — arXiv
Summary
Proposes a framework for measuring goal-directedness in LLMs, defined as the extent to which models use their capabilities in pursuit of given goals, and evaluates frontier models from Google DeepMind, OpenAI, and Anthropic on tasks requiring information gathering, cognitive effort, and plan execution.
Key Result
Goal-directedness is relatively consistent across tasks and distinct from task performance, is only moderately sensitive to motivational prompts, and most models fall short of being fully goal-directed.
Source
- Link: https://arxiv.org/abs/2504.11844
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - capability-evals — Evals