AI Safety Compendium

Stress Testing Deliberative Alignment for Anti-Scheming Training

27 Apr 2026 · 1 min read

Bronson Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, … (+13 more) — 2025-09-19 — OpenAI, Anthropic

Source

Link: https://www.arxiv.org/pdf/2509.15541
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- ai-scheming-evals — Evals

Related Pages

ai-scheming-evals
