Iterative Label Refinement Matters More than Preference Optimization under Weak Supervision

Yaowen Ye, Cassidy Laidlaw, Jacob Steinhardt — 2025-01-14 — UC Berkeley — arXiv

Summary

Proposes Iterative Label Refinement (ILR) as an alternative to RLHF for aligning language models under unreliable supervision. Instead of using comparison feedback to directly train the model (as in preference optimization), ILR uses it to improve the quality of the SFT training data: demonstrations are replaced with model-generated alternatives when feedback judges them better, and the model is then retrained via SFT on the refined dataset. Demonstrates superior performance over DPO on math, coding, and safe instruction-following tasks.
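
A minimal sketch of one ILR refinement round, under stated assumptions: the helpers `propose` (a model-generated alternative label) and `prefer` (the comparison-feedback oracle, which may itself be unreliable) are hypothetical names for illustration, not the paper's API.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (prompt, label) SFT pair


def ilr_round(
    dataset: List[Example],
    propose: Callable[[str], str],            # model's alternative label for a prompt (assumed helper)
    prefer: Callable[[str, str, str], bool],  # comparison feedback: is the candidate better? (assumed helper)
) -> List[Example]:
    """One round of Iterative Label Refinement (sketch).

    Feedback is spent on improving the SFT labels, not on directly
    updating the model's weights as in DPO/RLHF.
    """
    refined = []
    for prompt, label in dataset:
        candidate = propose(prompt)
        # Keep whichever label the (possibly unreliable) comparison prefers.
        refined.append((prompt, candidate if prefer(prompt, candidate, label) else label))
    return refined


# Outer loop (sketch): retrain via SFT on the refined labels each round,
# then use the retrained model to propose the next round's alternatives.
```

The design contrast with DPO is that each round ends with ordinary SFT on cleaner data, rather than with a preference-optimization update to the current model.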

Key Result

SFT+ILR outperforms SFT+DPO on math, coding, and safe instruction-following tasks with unreliable supervision, indicating that when human feedback is unreliable, it is better spent improving the training data than continually training the model.

Source