AI Safety Compendium

We need a field of Reward Function Design

27 Apr 2026 · 1 min read

Steven Byrnes — 2025-12-08

Source

Link: https://www.lesswrong.com/posts/oxvnREntu82tffkYW/we-need-a-field-of-reward-function-design
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- rl-safety — Black-box safety (understand and control current model behaviour) / Goal robustness

Related Pages

rl-safety
