“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes — 2025-04-24 — LessWrong

Summary

A technical critique of Silver & Sutton’s alignment proposal for future RL agents. It argues that their reward-function approach fails because of specification gaming and goal misgeneralization, two problems that cannot be easily patched.

Source