Docent: A system for analyzing and intervening on agent behavior

Kevin Meng, Vincent Huang, Jacob Steinhardt, Sarah Schwettmann — 2025-03-24 — Transluce — Transluce Blog

Summary

Introduces Docent, a system for analyzing AI agent transcripts. It combines LLM-powered summarization, clustering, search, and counterfactual intervention to surface environment issues, unexpected behaviors, and evaluation quality problems.

Key Result

Applied to the InterCode benchmark, Docent identified missing packages and environment issues; fixing these increased GPT-4o’s solve rate from 68.6% to 78%.

Source