Prover-Estimator Debate: A New Scalable Oversight Protocol

Jonah Brown-Cohen, Geoffrey Irving — 2025-06-17 — UK AISI — LessWrong / AI Alignment Forum

Summary

Presents a new debate protocol in which a prover decomposes a claim into subclaims and an estimator assigns probabilities to those subclaims. Formal proofs show that honest behavior is incentivized at equilibrium even when the two debaters have similar compute, which addresses the obfuscated arguments problem in scalable oversight.
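The following is a heavily simplified, hypothetical sketch of the recursion described above. It is not the authors' protocol. It assumes a fixed claim tree (the real prover chooses the decomposition), an honest prover that knows the ground truth, and a Brier-style penalty at the judge-checked leaf that is our own choice. The protocol's actual reward rules, its consistency checks, and the stability assumption are not modeled. The names `Claim`, `debate`, and `judge_penalty` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Claim:
    """A node in a hypothetical, pre-built claim tree; leaves are judge-checkable."""
    text: str
    truth: bool  # toy ground truth; the judge only inspects it at a leaf
    subclaims: List["Claim"] = field(default_factory=list)


# An estimator maps a claim to a probability in [0, 1].
Estimator = Callable[[Claim], float]


def debate(claim: Claim, estimator: Estimator) -> Tuple[Claim, float]:
    """Toy recursion: the estimator assigns a probability to each subclaim,
    and the honest prover descends into the subclaim whose estimate is
    furthest from the truth. Stops at a leaf and returns
    (leaf, estimator's probability for that leaf)."""
    while claim.subclaims:
        estimates = [estimator(c) for c in claim.subclaims]
        # The prover disputes the estimate with the largest error
        # (ties go to the first subclaim, per max()).
        claim = max(
            zip(claim.subclaims, estimates),
            key=lambda pair: abs(float(pair[0].truth) - pair[1]),
        )[0]
    return claim, estimator(claim)


def judge_penalty(leaf: Claim, p: float) -> float:
    """Hypothetical Brier-style penalty on the estimator at the checked leaf."""
    return (p - float(leaf.truth)) ** 2


# Usage: a two-leaf tree and an estimator that believes every claim.
leaf_a = Claim("a", True)
leaf_b = Claim("b", False)
root = Claim("root", False, [leaf_a, leaf_b])

gullible = lambda c: 0.9
leaf, p = debate(root, gullible)
print(leaf.text, round(judge_penalty(leaf, p), 2))  # b 0.81
```

The design point the toy tries to convey is that the prover only has to find one badly estimated subclaim, so a miscalibrated estimator is pushed toward a leaf where the judge can catch it, while a calibrated estimator incurs no penalty wherever the recursion ends.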

Key Result

Proves that, under a stability assumption, the protocol incentivizes honest behavior at equilibrium without requiring the honest debater to have exponentially more compute than its opponent, unlike prior debate protocols.

Source

Link: https://lesswrong.com/posts/8XHBaugB5S3r27MG9/prover-estimator-debate-a-new-scalable-oversight-protocol
Listed in the Shallow Review of Technical AI Safety 2025 under two agendas:
- black-box-make-ai-solve-it — Black-box safety (understand and control current model behaviour) / Iterative alignment
- debate — Make AI solve it
