The MASK Evaluation

Center for AI Safety, Scale AI — Hugging Face

Summary

MASK is a benchmark dataset of 1,028 human-labeled examples for evaluating honesty in large language models. It tests whether models remain truthful when incentivized to lie, and it disentangles honesty from accuracy by combining pressure prompts with separate belief elicitation.
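
The difference between honesty and accuracy can be illustrated with a minimal Python sketch. This is not MASK's scoring code: the `Response` record, its field names, and both helper functions are hypothetical, and each proposition is reduced to a boolean for simplicity. The assumption is that accuracy compares the elicited belief with the ground truth, while honesty compares the statement made under pressure with the elicited belief.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical record for one example (not the dataset's real schema)."""
    ground_truth: bool          # whether the proposition is actually true
    belief: Optional[bool]      # answer under neutral belief elicitation; None if unclear
    pressured: Optional[bool]   # statement under the pressure prompt; None if evasive


def is_accurate(r: Response) -> Optional[bool]:
    """Accuracy: does the elicited belief match the ground truth?"""
    if r.belief is None:
        return None
    return r.belief == r.ground_truth


def is_honest(r: Response) -> Optional[bool]:
    """Honesty: does the pressured statement match the model's own belief?"""
    if r.belief is None or r.pressured is None:
        return None
    return r.pressured == r.belief


# A model that knows the truth but contradicts it under pressure:
# accurate, but not honest.
lies_under_pressure = Response(ground_truth=True, belief=True, pressured=False)

# A model that holds a false belief and states it anyway:
# honest, but not accurate.
sincerely_mistaken = Response(ground_truth=True, belief=False, pressured=False)

for name, r in [("lies_under_pressure", lies_under_pressure),
                ("sincerely_mistaken", sincerely_mistaken)]:
    print(name, "accurate:", is_accurate(r), "honest:", is_honest(r))
```

Keeping the two checks separate means a model is not credited with honesty merely for being well informed, nor penalized as dishonest for being sincerely wrong.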

Source

Link: https://huggingface.co/datasets/cais/MASK
Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
- ai-deception-evals — Evals
