Neural Interactive Proofs

Lewis Hammond, Sam Adam-Day — 2024-12-08 — University of Oxford — ICLR 2025

Summary

Introduces neural interactive proofs, a game-theoretic framework in which a trusted but weak model interacts with untrusted but strong models to solve tasks beyond the weak model's own capabilities. The paper proposes several new protocols (NIP, MNIP, and zk-variants) and evaluates them empirically on graph isomorphism and code validation tasks.
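As a rough illustration of the weak-trusted versus strong-untrusted split on the graph isomorphism task, the sketch below uses hand-written stand-ins rather than neural networks and is not one of the paper's protocols. The names `strong_prover` and `weak_verifier` are hypothetical: the prover does the expensive search, and the verifier only runs a cheap check on whatever it is sent.

```python
from itertools import permutations


def edge_set(edges):
    """Normalize an undirected edge list into a set of frozensets."""
    return {frozenset(e) for e in edges}


def apply_mapping(mapping, edges):
    """Relabel every edge endpoint according to `mapping`."""
    return {frozenset(mapping[v] for v in e) for e in edge_set(edges)}


def strong_prover(n, edges_a, edges_b):
    """Untrusted and expensive: brute-force search over all n! relabelings."""
    target = edge_set(edges_b)
    for perm in permutations(range(n)):
        if apply_mapping(perm, edges_a) == target:
            return list(perm)
    return None  # no isomorphism found


def weak_verifier(n, edges_a, edges_b, claimed):
    """Trusted and cheap: accept only if the claimed mapping checks out."""
    if claimed is None or sorted(claimed) != list(range(n)):
        return False  # not a valid permutation of the vertices
    return apply_mapping(claimed, edges_a) == edge_set(edges_b)


graph_a = [(0, 1), (1, 2)]  # path with vertex 1 in the middle
graph_b = [(0, 1), (0, 2)]  # path with vertex 0 in the middle

proof = strong_prover(3, graph_a, graph_b)
print(weak_verifier(3, graph_a, graph_b, proof))      # honest message
print(weak_verifier(3, graph_a, graph_b, [0, 1, 2]))  # dishonest message
```

The verifier never has to trust the prover: a wrong or malformed mapping is rejected by the same cheap check that accepts a correct one.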

Key Result

The new NIP and MNIP protocols outperform prior scalable oversight methods, including debate and Merlin-Arthur classifiers, on untrained models, though the differences diminish after expert iteration training.

Source