Debate Helps Weak-to-Strong Generalization

Hao Lang, Fei Huang, Yongbin Li — 2025-01-21 — arXiv

Summary

Empirically tests whether debate can improve weak-to-strong generalization. Strong models first assist weak models during training through debate, and the enhanced weak models are then used to supervise the strong models. Experiments use OpenAI’s weak-to-strong NLP benchmarks.

Key Result

Debate enables weak models to extract trustworthy information from strong models for supervision, and ensembles of weak models better exploit debate arguments, leading to improved alignment on weak-to-strong benchmarks.
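
As a rough illustration of the ensemble idea only, the sketch below averages the per-example probabilities of several weak models into one soft label that could then supervise a strong model. The function names, the binary-label setup, and the plain mean as the combination rule are all assumptions made for this example; the summary does not say how the authors combine their weak models.

```python
from statistics import fmean


def ensemble_soft_labels(weak_predictions):
    """Average P(label = 1) across weak models, per example.

    weak_predictions: one inner list per weak model, each holding that
    model's probability for every example (hypothetical input format).
    """
    if not weak_predictions:
        raise ValueError("need at least one weak model")
    n_examples = len(weak_predictions[0])
    if any(len(p) != n_examples for p in weak_predictions):
        raise ValueError("all weak models must score the same examples")
    # Plain mean over models; an assumed combination rule, not the paper's.
    return [
        fmean(model[i] for model in weak_predictions)
        for i in range(n_examples)
    ]


def to_hard_labels(soft_labels, threshold=0.5):
    """Turn averaged probabilities into 0/1 labels for supervision."""
    return [1 if p >= threshold else 0 for p in soft_labels]
```

Averaging is one simple way to damp the errors of any single weak model; whether the labels are passed on as soft probabilities or thresholded hard labels is a separate design choice.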

Source