Challenges and Future Directions of Data-Centric AI Alignment
Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, … (+1 more) — 2025-05-01 — arXiv
Summary
Position paper advocating for data-centric AI alignment, identifying six sources of unreliability in human feedback through qualitative analysis of 160 Anthropic-HH samples, and proposing seven research directions for improving feedback collection, data cleaning, and verification processes.
Key Result
Found that 25% of re-annotated samples contradicted the original labels and another 25% were marked ‘both are bad’; a Fleiss’ kappa of 0.46 indicated only moderate inter-annotator agreement, and qualitative analysis surfaced six distinct sources of unreliability.
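For readers unfamiliar with the agreement statistic cited above, a minimal pure-Python sketch of Fleiss’ kappa follows. The input table and example data are hypothetical (rows are items, columns are label categories, cell values are how many annotators chose that label); the paper’s actual annotation data is not reproduced here.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for an items x categories count table.

    Each row must sum to the same number of raters n.
    Returns (p_bar - p_e) / (1 - p_e), where p_bar is the mean
    per-item agreement and p_e the chance agreement.
    """
    n_items = len(table)
    n_raters = sum(table[0])  # ratings per item (assumed constant)

    # Marginal proportion of ratings falling in each category.
    n_categories = len(table[0])
    totals = [sum(row[j] for row in table) for j in range(n_categories)]
    p_j = [t / (n_items * n_raters) for t in totals]

    # Per-item agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]

    p_bar = sum(p_i) / n_items        # observed agreement
    p_e = sum(p * p for p in p_j)     # expected agreement by chance
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 4 items, 3 annotators, 2 labels.
# Perfect agreement on every item yields kappa = 1.0.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
print(fleiss_kappa(perfect))  # → 1.0
```

A value of 0.46, as reported in the paper, falls in the range conventionally described as moderate agreement (roughly 0.41 to 0.60 on the Landis–Koch scale).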
Source
- Link: https://arxiv.org/html/2410.01957v2
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- data-quality-for-alignment — Black-box safety (understand and control current model behaviour) / Better data