Evaluating the Goal-Directedness of Large Language Models
Tom Everitt, Cristina Garbacea, Alexis Bellot, Jonathan Richens, Henry Papadatos, Siméon Campos, … (+1 more) — 2025-04-16 — Google DeepMind, OpenAI, Anthropic — arXiv
Summary
Proposes a framework for measuring goal-directedness in LLMs, defined as the extent to which models use their capabilities in pursuit of given goals, and evaluates frontier models from Google DeepMind, OpenAI, and Anthropic on tasks requiring information gathering, cognitive effort, and plan execution.
Key Result
Goal-directedness is relatively consistent across tasks and distinct from task performance, is only moderately sensitive to motivational prompts, and most models fall short of being fully goal-directed.
Source
- Link: https://arxiv.org/abs/2504.11844
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - capability-evals — Evals