Neural Interactive Proofs
Lewis Hammond, Sam Adam-Day — 2024-12-08 — University of Oxford — ICLR 2025
Summary
Introduces neural interactive proofs: a game-theoretic framework in which a trusted weak model (the verifier) interacts with an untrusted strong model (the prover) to solve tasks beyond the weak model's own capabilities. The paper proposes several new protocols (NIP, MNIP, and zero-knowledge variants) and evaluates them empirically on graph isomorphism and code validation tasks.
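For intuition, the graph isomorphism task echoes the classic interactive proof for graph non-isomorphism: the verifier sends a random relabelling of one of two graphs, and a capable prover can always identify the source graph exactly when the graphs are non-isomorphic. This is a minimal self-contained sketch of that classic protocol, not the paper's neural protocol; all function names here are illustrative.

```python
import itertools
import random

def permute(graph, perm):
    # Relabel the vertices of an edge-set graph under a permutation.
    return {frozenset((perm[u], perm[v])) for u, v in (tuple(e) for e in graph)}

def isomorphic(g0, g1, n):
    # Brute-force isomorphism test; fine for the tiny graphs used here.
    return any(permute(g0, p) == g1 for p in itertools.permutations(range(n)))

def play_round(g0, g1, n):
    # Verifier: secretly pick one graph and send a random relabelling of it.
    b = random.randrange(2)
    perm = list(range(n))
    random.shuffle(perm)
    challenge = permute([g0, g1][b], perm)
    # Prover (computationally unbounded): report which graph the challenge
    # is isomorphic to. If g0 and g1 are non-isomorphic this always succeeds;
    # if they are isomorphic, the prover can do no better than a coin flip.
    guess = 0 if isomorphic(challenge, g0, n) else 1
    return guess == b
```

With a path graph versus a star graph on four vertices (which are non-isomorphic), the prover wins every round, illustrating how interaction lets a weak verifier check a claim it could not evaluate alone.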
Key Result
The new NIP and MNIP protocols outperform prior scalable oversight protocols, including debate and Merlin-Arthur classifiers, when evaluated with untrained models, though the differences diminish after training via expert iteration.
Source
- Link: https://neural-interactive-proofs.com/
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- black-box-make-ai-solve-it — Black-box safety (understand and control current model behaviour) / Iterative alignment