Challenges and Future Directions of Data-Centric AI Alignment
Min-Hsuan Yeh, Jeffrey Wang, Xuefeng Du, Seongheon Park, Leitian Tao, Shawn Im, … (+1 more) — 2025-05-01 — arXiv
Summary
Position paper advocating for data-centric AI alignment, identifying six sources of unreliability in human feedback through qualitative analysis of 160 Anthropic-HH samples, and proposing seven research directions for improving feedback collection, data cleaning, and verification processes.
Key Result
Found that 25% of re-annotated samples contradicted the original labels and another 25% were marked ‘both are bad’; a Fleiss’ kappa of 0.46 indicated only moderate inter-annotator agreement, and qualitative analysis surfaced six distinct sources of unreliability.
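For readers unfamiliar with the agreement statistic cited above, a minimal pure-Python sketch of Fleiss’ kappa follows. The input table and example data are hypothetical (rows are items, columns are label categories, cell values are how many annotators chose that label); the paper’s actual annotation data is not reproduced here.

```python
def fleiss_kappa(table):
    """Fleiss' kappa for an items x categories count table.

    Each row must sum to the same number of raters n.
    Returns (p_bar - p_e) / (1 - p_e), where p_bar is the mean
    per-item agreement and p_e the chance agreement.
    """
    n_items = len(table)
    n_raters = sum(table[0])  # ratings per item (assumed constant)

    # Marginal proportion of ratings falling in each category.
    n_categories = len(table[0])
    totals = [sum(row[j] for row in table) for j in range(n_categories)]
    p_j = [t / (n_items * n_raters) for t in totals]

    # Per-item agreement: fraction of rater pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ]

    p_bar = sum(p_i) / n_items        # observed agreement
    p_e = sum(p * p for p in p_j)     # expected agreement by chance
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical example: 4 items, 3 annotators, 2 labels.
# Perfect agreement on every item yields kappa = 1.0.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
print(fleiss_kappa(perfect))  # → 1.0
```

A value of 0.46, as reported in the paper, falls in the range conventionally described as moderate agreement (roughly 0.41 to 0.60 on the Landis–Koch scale).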
Source
- Link: https://arxiv.org/html/2410.01957v2
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- data-quality-for-alignment — Black-box safety (understand and control current model behaviour) / Better data