‘For Argument’s Sake, Show Me How to Harm Myself!’: Jailbreaking LLMs in Suicide and Self-Harm Contexts
Annika M Schoene, Cansu Canca — 2025-07-01 — arXiv
Summary
Presents novel multi-step jailbreaking test cases for suicide and self-harm contexts and empirically demonstrates that these prompts bypass the safety filters of six widely available LLMs, leading to the generation of detailed harmful content.
Key Result
Multi-step, prompt-level jailbreaking bypasses safety guardrails across six LLMs: by reframing the context and perceived intent of the request (e.g., "for argument's sake"), users can elicit detailed harmful content related to suicide and self-harm even after having disclosed harmful intent.
Source
- Link: https://arxiv.org/pdf/2507.02990
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- various-redteams — Evals
- Editorial blurb (verbatim):
['For Argument's Sake, Show Me How to Harm Myself!': Jailbreaking LLMs in Suicide and Self-Harm Contexts](https://arxiv.org/pdf/2507.02990)