Understanding the Capabilities and Limitations of Weak-to-Strong Generalization

Wei Yao, Wenkai Yang, Ziqiao Wang, Yankai Lin, Yong Liu — 2025-03-08 — ICLR 2025 Workshop SSI-FM

Summary

Provides a theoretical analysis of weak-to-strong generalization. In the classification setting, it establishes upper and lower bounds on the generalization error and calibration error; in the regression setting, it extends prior work to a KL divergence loss. The theory is validated experimentally.
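
As a rough illustration of two quantities named in the summary, the sketch below computes a discrete KL divergence and a binned expected calibration error (ECE). Everything here is an assumption made for illustration: the summary does not give the paper's exact definitions, so the standard textbook forms are used, and the function names, the argument order of the KL divergence, the bin count, and the toy numbers are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Assumption: which argument plays the weak teacher and which the strong
    student is not specified in the summary; callers choose the order.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: size-weighted mean of |accuracy - confidence|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


if __name__ == "__main__":
    # Hypothetical toy predictions, not data from the paper.
    weak_probs = [0.7, 0.3]
    strong_probs = [0.9, 0.1]
    print("KL(weak || strong):", kl_divergence(weak_probs, strong_probs))
    print("ECE:", expected_calibration_error([0.9, 0.9, 0.6], [True, False, True]))
```

A model whose stated confidence matches its accuracy in every bin has an ECE of zero, and the KL divergence is zero only when the two distributions agree.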

Key Result

Derives bounds showing that weak-to-strong generalization is limited primarily by the weak model’s generalization error, and that the strong model's optimization must be carefully balanced to avoid over-reliance on weak supervision signals.

Source

Link: https://openreview.net/forum?id=RwYdLgj1S6
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- weak-to-strong-generalization — Make AI solve it
