Prover-Estimator Debate: A New Scalable Oversight Protocol

Jonah Brown-Cohen, Geoffrey Irving — 2025-06-17 — UK AISI — LessWrong / AI Alignment Forum

Summary

Presents a novel debate protocol in which a prover decomposes claims and an estimator assigns probabilities to them. The authors give formal proofs that honest behavior is incentivized at equilibrium even when the debaters have similar compute, which addresses the obfuscated arguments problem in scalable oversight.
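
To make the two roles concrete, here is a minimal Python sketch of a single round: a prover splits a claim into parts and an estimator attaches a probability to each part. This is only an illustration of the division of labor described above, not the protocol from the paper. The names `Claim`, `debate_round`, `toy_prover`, and `toy_estimator`, the arithmetic example, and the fixed 0.9 estimate are all assumptions made for this sketch. The paper's scoring rule, recursion, and stability condition are not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Claim:
    """A statement under debate (hypothetical type for this sketch)."""
    text: str


# Hypothetical role signatures, assumed for illustration only.
Prover = Callable[[Claim], List[Claim]]   # decomposes a claim into parts
Estimator = Callable[[Claim], float]      # probability that a claim holds


def debate_round(
    claim: Claim, prover: Prover, estimator: Estimator
) -> List[Tuple[Claim, float]]:
    """Run one round: the prover decomposes, the estimator scores each part."""
    scored = []
    for sub in prover(claim):
        p = estimator(sub)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"estimate {p} for {sub.text!r} is not a probability")
        scored.append((sub, p))
    return scored


def toy_prover(claim: Claim) -> List[Claim]:
    # Assumed decomposition of "12 * 11 = 132" into simpler arithmetic steps.
    return [
        Claim("12 * 10 = 120"),
        Claim("12 * 1 = 12"),
        Claim("120 + 12 = 132"),
    ]


def toy_estimator(claim: Claim) -> float:
    # Placeholder: a real estimator would reason about each claim.
    return 0.9


round_result = debate_round(Claim("12 * 11 = 132"), toy_prover, toy_estimator)
for sub, p in round_result:
    print(f"{p:.2f}  {sub.text}")
```

The sketch keeps the prover and estimator as plain callables so that either role can be swapped for a different strategy without changing `debate_round`.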

Key Result

Proves that, under a stability assumption, the protocol incentivizes honest behavior at equilibrium without requiring the honest debater to spend exponentially more compute, in contrast to prior debate protocols.

Source