AI Safety Compendium


Detecting Strategic Deception Using Linear Probes

27 Apr 2026 · 1 min read

Nicholas Goldowsky-Dill, Bilal Chughtai, Stefan Heimersheim, Marius Hobbhahn — 2025-02-06 — Apollo Research

Source

  • Link: https://www.lesswrong.com/posts/9pGbTz6c78PGwJein/detecting-strategic-deception-using-linear-probes
  • Listed in the Shallow Review of Technical AI Safety 2025 under one agenda:
    • lie-and-deception-detectors, within White-box safety (i.e., interpretability)

Related Pages

  • lie-and-deception-detectors

