Liars’ Bench: Evaluating Lie Detectors for Language Models

Kieron Kretschmar, Walter Laurito, Sharan Maiya, Samuel Marks — 2025-11-20 — Cadenza Labs, Anthropic, FZI, University of Cambridge — arXiv

Summary

Introduces Liars’ Bench, a benchmark of 72,863 lie and honest-response examples drawn from seven datasets that capture qualitatively different types of lies by LLMs, and evaluates three lie-detection methods, finding systematic failures on certain lie types.

Key Result

LLM-as-a-Judge achieved the highest balanced accuracy (0.73), but no tested method achieved above-chance performance on every dataset, with near-zero recall on the knowledge-report and gender-secret datasets.
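As a quick illustration of the headline metric (a sketch, not code from the paper): balanced accuracy is the mean of per-class recall, so on an imbalanced lie/honest split a degenerate detector that never flags a lie still scores exactly 0.5 — which is why a 0.73 is only modestly above chance, and why near-zero recall on some lie types matters.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)

# Toy labels (hypothetical): 1 = lie, 0 = honest.
# A detector that misses every lie has recall 0 on lies, 1 on honest.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0]
print(balanced_accuracy(y_true, y_pred))  # 0.5
```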

Source