MALT: A Dataset of Natural and Prompted Behaviors That Threaten Eval Integrity
Neev Parikh, Hjalmar Wijk — 2025-10-14 — METR
Source
- Link: https://metr.org/blog/2025-10-14-malt-dataset-of-natural-and-prompted-behaviors/
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- capability-evals — Evals