The MASK Evaluation
Center for AI Safety, Scale AI — Hugging Face
Summary
A benchmark dataset of 1,028 human-labeled examples for evaluating honesty in large language models by testing whether they remain truthful when incentivized to lie, disentangling honesty from accuracy through pressure prompts and belief elicitation.
Source
- Link: https://huggingface.co/datasets/cais/MASK
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda(s):
- ai-deception-evals — Evals