Here’s 18 Applications of Deception Probes
Cleo Nardo, Avi Parrack, jordine — 2025-08-28 — LessWrong
Summary
Systematically enumerates 18 applications of deception probes for AI safety, analyzing required properties and desiderata for each use case, from monitoring current models to eliciting latent knowledge to augmenting scalable oversight.
Source
- Link: https://lesswrong.com/posts/7zhAwcBri7yupStKy/here-s-18-applications-of-deception-probes
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
  - monitoring-concepts: White-box safety (i.e. Interpretability) / Concept-based interpretability