OpenAgentSafety: A Comprehensive Framework for Evaluating Real-World AI Agent Safety
Sanidhya Vijayvargiya, Aditya Bharat Soni, Xuhui Zhou, Zora Zhiruo Wang, Nouha Dziri, Graham Neubig, … (+1 more) — 2025-07-08 — Carnegie Mellon University — arXiv
Summary
Introduces OpenAgentSafety, a comprehensive evaluation framework for testing AI agent safety across eight risk categories using real tools (web browsers, code execution, file systems, bash shells, messaging platforms) with over 350 multi-turn, multi-user tasks spanning benign and adversarial scenarios.
Key Result
Empirical testing of five prominent LLMs reveals unsafe behavior in 51.2% to 72.7% of safety-vulnerable tasks, with Claude-Sonnet-3.7 at the safer end (51.2%) and o3-mini at the other (72.7%).
Source
- Link: https://arxiv.org/abs/2507.06134
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- other-evals — Evals