Call Me A Jerk: Persuading AI to Comply with Objectionable Requests

Lennart Meincke, Dan Shapiro, Angela Duckworth, Ethan R. Mollick, Lilach Mollick, Robert Cialdini — 2025-07-18 — University of Pennsylvania, The Wharton School, WHU - Otto Beisheim School of Management, Glowforge, Inc — SSRN / The Wharton School Research Paper

Summary

Tests whether seven established persuasion principles can induce GPT-4o mini to comply with two objectionable requests (insulting the user and helping synthesize a regulated drug). The study runs 28,000 conversations to systematically evaluate how effective these principles are for jailbreaking.

Key Result

Prompts employing persuasion principles more than doubled compliance with objectionable requests: 72.0% compliance, versus 33.3% for matched control prompts (p < .001).
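
As a quick check on the "more than doubled" wording, the two reported rates can be compared directly. The sketch below is only illustrative arithmetic on the two percentages quoted above; the function name `compare_rates` is hypothetical and is not taken from the paper, and the snippet does not reproduce the paper's statistical test.

```python
def compare_rates(treatment_pct: float, control_pct: float) -> tuple[float, float]:
    """Return (ratio, percentage-point difference) for two compliance rates.

    Hypothetical helper for illustration; inputs are percentages (0-100).
    """
    ratio = treatment_pct / control_pct   # how many times higher the treatment rate is
    diff = treatment_pct - control_pct    # absolute gap in percentage points
    return ratio, diff


# Rates reported in the Key Result: 72.0% (persuasion) vs 33.3% (control).
ratio, diff = compare_rates(72.0, 33.3)
print(f"ratio: {ratio:.2f}x, difference: {diff:.1f} percentage points")
# prints: ratio: 2.16x, difference: 38.7 percentage points
```

A ratio above 2 matches the "more than doubled" description, and the gap of 38.7 percentage points gives the same comparison in absolute terms.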

Source

Link: https://t.co/tkHkVFVZ2m
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- various-redteams — Evals
