AI Safety Compendium

CoT May Be Highly Informative Despite "Unfaithfulness"

27 Apr 2026 · 1 min read

Amy Deng, Sydney Von Arx, Ben Snodin, Sudarsh Kunnavakkam, Tamera Lanham — 2025-08-08 — METR, Gray Swan

Source

Link: https://metr.org/blog/2025-08-08-cot-may-be-highly-informative-despite-unfaithfulness/
Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- chain-of-thought-monitoring — Black-box safety (understand and control current model behaviour) / Iterative alignment

Related Pages

chain-of-thought-monitoring
