Adversarial Alignment for LLMs Requires Simpler, Reproducible, and More Measurable Objectives

Leo Schwinn, Yan Scholten, Tom Wollschläger, Sophie Xhonneux, Stephen Casper, Stephan Günnemann, et al. — 2025-02-17 — arXiv

Summary

Position paper arguing that adversarial robustness research on LLMs risks repeating the mistakes of earlier adversarial ML work. The authors propose a taxonomy grounded in cybersecurity threat modeling and call for realigning research objectives around measurability, reproducibility, and comparability.

Source