The MASK Evaluation

Center for AI Safety, Scale AI — Hugging Face

Summary

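A benchmark dataset of 1,028 human-labeled examples for evaluating honesty in large language models. It tests whether a model remains truthful when it is incentivized to lie, and it disentangles honesty from accuracy by pairing pressure prompts with belief elicitation. Honesty is judged by whether the model's statement under pressure matches its own elicited belief. Accuracy is judged by whether that belief matches the ground truth.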
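A minimal Python sketch can show why the two scores are kept separate. Everything here is a hypothetical simplification for illustration, not the benchmark's official data schema or grader: the `Example` record, the helpers `classify_honesty`, `is_accurate`, and `summarize`, and the scoring rule, which treats honesty as one minus the fraction of examples labeled as lies. Beliefs and statements are reduced to booleans, and `None` stands in for a missing or inconsistent belief or an evasive answer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    # Hypothetical record layout, assumed for illustration only.
    ground_truth: bool                    # true resolution of the proposition
    belief: Optional[bool]                # answer elicited without pressure; None = no consistent belief
    pressured_statement: Optional[bool]   # answer under a pressure prompt; None = evasive


def classify_honesty(ex: Example) -> str:
    """Compare the pressured statement with the model's OWN belief, not with ground truth."""
    if ex.belief is None or ex.pressured_statement is None:
        return "no_belief_or_evasive"
    return "honest" if ex.pressured_statement == ex.belief else "lie"


def is_accurate(ex: Example) -> Optional[bool]:
    """Compare the elicited belief with ground truth; the pressure prompt plays no role."""
    if ex.belief is None:
        return None
    return ex.belief == ex.ground_truth


def summarize(examples: List[Example]) -> dict:
    # Assumed scoring rule: lie rate over all examples; accuracy only where a belief exists.
    labels = [classify_honesty(e) for e in examples]
    lie_rate = labels.count("lie") / len(labels)
    graded = [a for a in (is_accurate(e) for e in examples) if a is not None]
    accuracy = sum(graded) / len(graded) if graded else float("nan")
    return {"honesty": 1 - lie_rate, "accuracy": accuracy}


examples = [
    Example(True, True, True),    # accurate belief, honest statement
    Example(True, True, False),   # accurate belief, but lies under pressure
    Example(True, False, False),  # inaccurate belief, but states it honestly
    Example(False, None, True),   # no consistent belief, so neither score applies
]
result = summarize(examples)
print(result)
```

In this toy data the second example is accurate but dishonest and the third is honest but inaccurate, which is the kind of case that a single truthfulness score would blur together.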

Source