SafePlanBench: evaluating a Guaranteed Safe AI Approach for LLM-based Agents

Agustín Martinez Suñé, Tan Zhi Xuan — PIBBSS, MIT, University of Oxford — Manifund

Summary

Develops SafePlanBench, a benchmark for evaluating LLM-based agents on safe planning. The approach uses symbolic planning in PDDL (Planning Domain Definition Language) to enforce safety constraints, and the benchmark tests whether LLMs can translate natural language into formal specifications that guarantee safety.
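
To make the idea of enforcing safety constraints through symbolic planning more concrete, here is a minimal sketch in Python. It is not the SafePlanBench implementation and it is not PDDL. It assumes a simplified STRIPS-style model in which a state is a set of true facts, and a plan counts as safe only if no state it visits satisfies an unsafe condition. The toy domain and all names (`Action`, `apply_action`, `plan_is_safe`, the stove facts) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable

State = FrozenSet[str]  # a state is the set of facts that currently hold


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action (hypothetical, simplified stand-in for a PDDL action)."""
    name: str
    preconditions: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str]


def apply_action(state: State, action: Action) -> State:
    """Return the successor state, or raise if a precondition does not hold."""
    if not action.preconditions <= state:
        raise ValueError(f"precondition of {action.name} not satisfied")
    return (state - action.del_effects) | action.add_effects


def plan_is_safe(initial: State, plan: Iterable[Action],
                 unsafe: Callable[[State], bool]) -> bool:
    """Check that no state visited by the plan satisfies the unsafe condition."""
    state = initial
    if unsafe(state):
        return False
    for action in plan:
        state = apply_action(state, action)
        if unsafe(state):
            return False
    return True


# Hypothetical toy domain: the stove must never be on while the agent is away.
def unsafe(state: State) -> bool:
    return "stove-on" in state and "agent-away" in state


turn_on = Action("turn-on-stove", frozenset({"agent-home"}),
                 frozenset({"stove-on"}), frozenset())
turn_off = Action("turn-off-stove", frozenset({"stove-on"}),
                  frozenset(), frozenset({"stove-on"}))
leave = Action("leave", frozenset({"agent-home"}),
               frozenset({"agent-away"}), frozenset({"agent-home"}))

initial: State = frozenset({"agent-home"})
safe_plan = [turn_on, turn_off, leave]
unsafe_plan = [turn_on, leave]

print(plan_is_safe(initial, safe_plan, unsafe))
print(plan_is_safe(initial, unsafe_plan, unsafe))
```

In this sketch the safety guarantee comes from the symbolic check rather than from the model that proposed the plan. The guarantee is therefore only as good as the formal specification, which is why the translation step from natural language matters.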

Source

Link: https://manifund.org/projects/safeplanbench-evaluating-a-guaranteed-safe-ai-approach-for-llm-based-agents
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- guaranteed-safe-ai — Safety by construction
