Liars’ Bench: Evaluating Lie Detectors for Language Models

Kieron Kretschmar, Walter Laurito, Sharan Maiya, Samuel Marks — 2025-11-20 — Cadenza Labs, FZI, University of Cambridge, Anthropic — arXiv

Summary

Introduces LIARS’ BENCH, a testbed of 72,863 examples of lies and honest responses generated by four open-weight models across seven datasets, and evaluates three existing lie detection techniques on it, revealing systematic failures on certain types of lies.

Key Result

Existing black-box and white-box lie detection techniques systematically fail to identify certain types of lies, especially in settings where the transcript alone is not enough to determine whether the model lied.

Source