AI Safety Compendium

WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models

27 Apr 2026

Liwei Jiang, Kavel Rao, Seungju Han, Allyson Ettinger, Faeze Brahman, Sachin Kumar, … (+5 more) — 2024-06-26 — University of Washington, Allen Institute for AI

Source

Link: https://arxiv.org/pdf/2406.18510
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- various-redteams — Evals

Related Pages

various-redteams
