Here’s 18 Applications of Deception Probes

Cleo Nardo, Avi Parrack, jordine — 2025-08-28 — LessWrong

Summary

Systematically enumerates 18 applications of deception probes for AI safety, analyzing required properties and desiderata for each use case, from monitoring current models to eliciting latent knowledge to augmenting scalable oversight.

Source

Link: https://lesswrong.com/posts/7zhAwcBri7yupStKy/here-s-18-applications-of-deception-probes
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- monitoring-concepts — White-box safety (i.e. Interpretability) / Concept-based interpretability
