2404.10636 - What are human values, and how do we align AI to them?
Oliver Klingefjord, Ryan Lowe, Joe Edelman — 2024-04-17 — arXiv
Summary
Proposes Moral Graph Elicitation (MGE), a process that uses large language models to interview people about their values in specific contexts and reconciles the diverse views into an alignment target the authors call a moral graph. Tests the approach with 500 Americans on divisive prompts, finding high participant satisfaction and the natural emergence of expert perspectives.
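The moral graph described above can be pictured as a directed graph whose nodes are elicited values and whose edges record participant judgments that one value is wiser than another for a given context. The sketch below is a minimal, hypothetical illustration of that structure (class and method names are my own, not from the paper), with a toy ranking that surfaces the most-endorsed values per context:

```python
from collections import defaultdict

class MoralGraph:
    """Toy sketch of a moral graph: values as nodes, 'wiser than'
    judgments as weighted, context-tagged directed edges."""

    def __init__(self):
        self.values = {}               # value_id -> short description
        self.edges = defaultdict(int)  # (from_value, to_value, context) -> votes

    def add_value(self, value_id, description):
        self.values[value_id] = description

    def endorse_transition(self, from_value, to_value, context):
        # One participant judged `to_value` wiser than `from_value`
        # in this context.
        self.edges[(from_value, to_value, context)] += 1

    def top_values(self, context, k=3):
        # Rank values by net incoming minus outgoing 'wiser than'
        # votes within the given context (a simple stand-in for the
        # paper's actual reconciliation procedure).
        score = defaultdict(int)
        for (src, dst, ctx), votes in self.edges.items():
            if ctx == context:
                score[dst] += votes
                score[src] -= votes
        return sorted(score, key=score.get, reverse=True)[:k]

g = MoralGraph()
g.add_value("a", "avoid conflict")
g.add_value("b", "honest engagement")
g.add_value("c", "mutual understanding")
g.endorse_transition("a", "b", "giving advice")
g.endorse_transition("a", "b", "giving advice")
g.endorse_transition("a", "b", "giving advice")
g.endorse_transition("b", "c", "giving advice")
print(g.top_values("giving advice", k=2))
```

The net-vote ranking here is deliberately simplistic; it only illustrates how "wisdom upgrade" edges can let broadly endorsed values rise to the top without anyone pre-defining who counts as an expert.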
Key Result
89.1% of participants felt well represented by the MGE process and 89% thought the final moral graph was fair, with expert values rising to the top without pre-defining who counts as an expert.
Source
- Link: https://arxiv.org/abs/2404.10636
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- aligning-to-context — Multi-agent first
- Editorial blurb (verbatim):
[2404.10636 - What are human values, and how do we align AI to them?](https://arxiv.org/abs/2404.10636)