2404.10636 - What are human values, and how do we align AI to them?
Oliver Klingefjord, Ryan Lowe, Joe Edelman — 2024-04-17 — arXiv
Summary
Proposes Moral Graph Elicitation (MGE), a process that uses large language models to interview people about their values in specific contexts and reconciles the diverse views into an alignment target the authors call a moral graph. Tests the approach with 500 Americans on divisive prompts, finding high participant satisfaction and the natural emergence of expert perspectives.
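The moral graph described above can be pictured as a directed graph whose nodes are elicited values and whose edges record participant judgments that one value is wiser than another for a given context. The sketch below is a minimal, hypothetical illustration of that structure (class and method names are my own, not from the paper), with a toy ranking that surfaces the most-endorsed values per context:

```python
from collections import defaultdict

class MoralGraph:
    """Toy sketch of a moral graph: values as nodes, 'wiser than'
    judgments as weighted, context-tagged directed edges."""

    def __init__(self):
        self.values = {}               # value_id -> short description
        self.edges = defaultdict(int)  # (from_value, to_value, context) -> votes

    def add_value(self, value_id, description):
        self.values[value_id] = description

    def endorse_transition(self, from_value, to_value, context):
        # One participant judged `to_value` wiser than `from_value`
        # in this context.
        self.edges[(from_value, to_value, context)] += 1

    def top_values(self, context, k=3):
        # Rank values by net incoming minus outgoing 'wiser than'
        # votes within the given context (a simple stand-in for the
        # paper's actual reconciliation procedure).
        score = defaultdict(int)
        for (src, dst, ctx), votes in self.edges.items():
            if ctx == context:
                score[dst] += votes
                score[src] -= votes
        return sorted(score, key=score.get, reverse=True)[:k]

g = MoralGraph()
g.add_value("a", "avoid conflict")
g.add_value("b", "honest engagement")
g.add_value("c", "mutual understanding")
g.endorse_transition("a", "b", "giving advice")
g.endorse_transition("a", "b", "giving advice")
g.endorse_transition("a", "b", "giving advice")
g.endorse_transition("b", "c", "giving advice")
print(g.top_values("giving advice", k=2))
```

The net-vote ranking here is deliberately simplistic; it only illustrates how "wisdom upgrade" edges can let broadly endorsed values rise to the top without anyone pre-defining who counts as an expert.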
Key Result
89.1% of participants felt well represented by the MGE process and 89% thought the final moral graph was fair, with expert values rising to the top without pre-defining who counts as an expert.
Source
- Link: https://arxiv.org/abs/2404.10636
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- aligning-to-context — Multi-agent first
- Editorial blurb (verbatim):
[2404.10636 - What are human values, and how do we align AI to them?](https://arxiv.org/abs/2404.10636)