Generative Value Conflicts Reveal LLM Priorities

Andy Liu, Kshitish Ghate, Mona Diab, Daniel Fried, Atoosa Kasirzadeh, Max Kleiman-Weiner — 2025-09-29 — arXiv

Summary

Introduces ConflictScope, an automatic pipeline for evaluating how LLMs prioritize different values: it generates scenarios in which values conflict, then analyzes model responses to elicit a ranking over those values.
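
To make the idea of eliciting a ranking from conflict scenarios concrete, here is a minimal sketch. It assumes each scenario pits exactly two values against each other and that the model's response has already been labeled with the value it favored. The function name `rank_values`, the toy data, and the simple win-rate aggregation are all illustrative assumptions; this summary does not describe how ConflictScope actually aggregates responses.

```python
from collections import defaultdict


def rank_values(outcomes):
    """Rank values by win rate over pairwise conflicts.

    outcomes: iterable of (favored_value, disfavored_value) pairs,
    one pair per scenario. Hypothetical input format, not the paper's.
    """
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for favored, disfavored in outcomes:
        wins[favored] += 1
        appearances[favored] += 1
        appearances[disfavored] += 1
    win_rate = {v: wins[v] / appearances[v] for v in appearances}
    # Highest win rate first; ties broken alphabetically for determinism.
    return sorted(win_rate, key=lambda v: (-win_rate[v], v))


# Toy, made-up outcomes purely for illustration (not results from the paper).
toy_outcomes = [
    ("user autonomy", "harmlessness"),
    ("user autonomy", "harmlessness"),
    ("harmlessness", "helpfulness"),
    ("user autonomy", "helpfulness"),
    ("harmlessness", "user autonomy"),
]
ranking = rank_values(toy_outcomes)
print(ranking)
```

A win rate is the simplest possible aggregation; a pairwise-comparison model would be a natural alternative when values appear in unequal numbers of scenarios.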

Key Result

In open-ended settings, models shift away from supporting protective values (such as harmlessness) and toward personal values (such as user autonomy). Including detailed value orderings in the system prompt improves alignment with a target ranking by 14%.

Source

Link: https://arxiv.org/abs/2509.25369
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- capability-evals — Evals
