Call Me A Jerk: Persuading AI to Comply with Objectionable Requests
Lennart Meincke, Dan Shapiro, Angela Duckworth, Ethan R. Mollick, Lilach Mollick, Robert Cialdini — 2025-07-18 — University of Pennsylvania, The Wharton School, WHU - Otto Beisheim School of Management, Glowforge, Inc — SSRN / The Wharton School Research Paper
Summary
Tests whether seven established persuasion principles can induce GPT-4o mini to comply with objectionable requests (insulting the user and giving synthesis instructions for regulated drugs), using 28,000 conversations to systematically evaluate how effective these principles are as jailbreaks.
Key Result
Prompts employing persuasion principles more than doubled the compliance rate with objectionable requests: 72.0%, versus 33.3% for matched control prompts (p < .001).
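As a quick arithmetic check of the "more than doubled" claim, the sketch below divides the two reported compliance rates and takes their difference. It uses only the two percentages stated above. It does not reproduce the paper's significance test, which would also require the per-condition sample sizes.

```python
# Compliance rates as reported in the key result (percent of conversations).
persuasion_rate = 72.0
control_rate = 33.3

# Relative effect: how many times higher the persuasion-condition rate is.
ratio = persuasion_rate / control_rate

# Absolute effect: difference in percentage points.
diff_pp = persuasion_rate - control_rate

print(f"ratio = {ratio:.2f}x, difference = {diff_pp:.1f} percentage points")
```

The ratio comes out at roughly 2.16, which supports "more than doubled". The absolute gap is about 39 percentage points.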
Source
- Link: https://t.co/tkHkVFVZ2m
- Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- various-redteams — Evals