Scalable Oversight for Superhuman AI via Recursive Self-Critiquing

Xueru Wen, Jie Lou, Xinyu Lu, Junjie Yang, Yanjiang Liu, Yaojie Lu, … (+2 more) — 2025-02-07 — arXiv

Summary

Investigates recursive self-critiquing as a method for scalable oversight of superhuman AI. Through Human-AI and AI-AI experiments, the paper tests two hypotheses: that critiquing a critique is easier than critiquing a response directly, and that this relationship holds recursively.
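
As a rough illustration of the recursive structure only (not the paper's actual protocol), the chain of response, critique, and critique-of-critique could be built as in the sketch below. The names `build_critique_chain` and `stub_critic` are hypothetical, and the stub merely labels each critique level; in a real setup it would be replaced by a human or model evaluator.

```python
from typing import Callable, List

# Hypothetical interface: a critic sees the question and the chain so far,
# and returns a critique of the most recent item in the chain.
Critic = Callable[[str, List[str]], str]


def build_critique_chain(
    question: str, response: str, critic: Critic, depth: int
) -> List[str]:
    """Return [response, critique, critique-of-critique, ...] with `depth` critique levels."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    chain = [response]
    for _ in range(depth):
        # Each new critique targets the last item produced so far.
        chain.append(critic(question, chain))
    return chain


def stub_critic(question: str, chain: List[str]) -> str:
    # Placeholder critic (an assumption for illustration): it only records
    # the critique order, where order 1 critiques the original response.
    order = len(chain)
    return f"order-{order} critique of: {chain[-1]}"


chain = build_critique_chain("What is 2 + 2?", "5", stub_critic, depth=2)
for level, item in enumerate(chain):
    print(level, item)
```

Under this framing, an overseer would judge only the last item in the chain, which is the step the paper hypothesizes to be easier than evaluating the original response directly.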

Key Result

The results indicate that recursive self-critiquing offers a promising and tractable supervision pathway when direct evaluation of superhuman AI outputs is infeasible.

Source