Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework

Rishane Dassanayake, Mario Demetroudi, James Walpole, Lindley Lentati, Jason R. Brown, Edward James Young — 2025-07-17 — arXiv

Summary

Presents a systematic safety case framework for assessing and mitigating manipulation risks from misaligned AI systems, structured around three core arguments (inability, control, and trustworthiness) and specifying evidence requirements, evaluation methodologies, and implementation considerations for AI companies.

Source