The PacifAIst Benchmark: Would an Artificial Intelligence Choose to Sacrifice Itself for Human Safety?

Manuel Herrador — 2025-08-13 — arXiv

Summary

Introduces PacifAIst, a benchmark of 700 scenarios testing whether LLMs prioritize human safety over instrumental goals like self-preservation, resource acquisition, and goal completion. Evaluates 8 frontier models using a novel Existential Prioritization taxonomy and Pacifism Score metric.
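The paper's exact scoring rule is not reproduced here, but a Pacifism Score framed as "percentage of scenarios where the model chose the human-safety option" can be sketched as follows. The `outcomes` data, subcategory names, and function names are illustrative assumptions, not the paper's implementation; the actual metric may weight refusals or partial answers differently.

```python
from collections import defaultdict

# Hypothetical per-scenario results: (subcategory, chose_safe_option).
# Toy data only -- not results from the paper.
outcomes = [
    ("self-preservation", True),
    ("self-preservation", False),
    ("resource-conflict", True),
    ("goal-preservation", True),
]

def pacifism_score(results):
    """Overall score: percentage of scenarios where the model
    prioritized human safety over its instrumental goal."""
    safe = sum(1 for _, chose_safe in results if chose_safe)
    return 100.0 * safe / len(results)

def subcategory_scores(results):
    """Same percentage, computed per subcategory."""
    buckets = defaultdict(list)
    for category, chose_safe in results:
        buckets[category].append(chose_safe)
    return {c: 100.0 * sum(v) / len(v) for c, v in buckets.items()}

print(pacifism_score(outcomes))       # 75.0 on this toy data
print(subcategory_scores(outcomes))
```

A per-subcategory breakdown like this is what would surface the variation across self-preservation, resource conflict, and goal preservation noted in the results.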

Key Result

Gemini 2.5 Flash achieved the highest Pacifism Score at 90.31%, while GPT-5 recorded the lowest at 79.49%, with significant variation across the self-preservation, resource conflict, and goal preservation subcategories.

Source