Safe Learning Under Irreversible Dynamics via Asking for Help

Benjamin Plaut, Juan Liévano-Karim, Hanlin Zhu, Stuart Russell — 2025-02-19 — UC Berkeley — arXiv

Summary

Presents an algorithm with formal regret guarantees for safe reinforcement learning in environments with irreversible dynamics. The agent can ask a mentor for help and transfers knowledge between similar states, and both its regret and its number of mentor queries are sublinear in the time horizon.
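
The ask-for-help idea can be illustrated with a toy sketch. This is not the paper's algorithm: the class name AskForHelpAgent, the Euclidean distance, the fixed radius, and the rule "copy the mentor's action from the nearest previously queried state" are all assumptions made for illustration. The sketch only shows why transferring knowledge between similar states lets the number of mentor queries stop growing once the visited region has been covered.

```python
import math
from typing import Callable, List, Tuple

State = Tuple[float, ...]


class AskForHelpAgent:
    """Toy agent (hypothetical): query the mentor only in unfamiliar states."""

    def __init__(self, mentor: Callable[[State], int], radius: float) -> None:
        self.mentor = mentor      # assumed oracle returning a safe action
        self.radius = radius      # assumed similarity threshold
        self.memory: List[Tuple[State, int]] = []  # (queried state, mentor action)
        self.queries = 0

    def act(self, state: State) -> int:
        # Find the closest state for which the mentor was already asked.
        nearest = min(
            self.memory,
            key=lambda item: math.dist(item[0], state),
            default=None,
        )
        if nearest is not None and math.dist(nearest[0], state) <= self.radius:
            # Familiar state: reuse the mentor's action from the similar state.
            return nearest[1]
        # Unfamiliar state: ask for help and remember the answer.
        action = self.mentor(state)
        self.queries += 1
        self.memory.append((state, action))
        return action


def toy_mentor(state: State) -> int:
    # Hypothetical mentor policy on a one-dimensional state space.
    return int(state[0] > 0.5)


if __name__ == "__main__":
    agent = AskForHelpAgent(toy_mentor, radius=0.1)
    states = [(i / 100,) for i in range(100)]
    for sweep in range(3):
        for s in states:
            agent.act(s)
        print(f"after sweep {sweep + 1}: {agent.queries} mentor queries")
```

In this toy setting the query count stays flat after the first sweep, because every revisited state lies within the radius of a stored one. Copying an action from a nearby state is only safe under a similarity assumption on the environment, which the sketch takes for granted.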

Key Result

First formal proof that an agent can obtain high reward while becoming self-sufficient in an unknown, unbounded, high-stakes environment without resets, with both regret and mentor queries sublinear in the time horizon.
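
For reference, "sublinear in the time horizon" can be written out as follows. The symbols are assumed notation for this note, not taken from the paper: T is the time horizon, R_T the agent's regret after T steps, and Q_T the number of mentor queries made in those T steps.

```latex
R_T = o(T) \quad\text{and}\quad Q_T = o(T),
\qquad\text{i.e.}\qquad
\lim_{T \to \infty} \frac{R_T}{T} = 0
\quad\text{and}\quad
\lim_{T \to \infty} \frac{Q_T}{T} = 0 .
```

Read this way, the average regret per step vanishes as T grows, and so does the fraction of steps on which the agent asks for help, which is the sense in which the agent becomes self-sufficient. For example, any bound of order T^c with c < 1 would be sublinear.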

Source