CCS-Lib: A Python package to elicit latent knowledge from LLMs
Walter Laurito, Nora Belrose, Alex Mallen, Kay Kozaronek, Fabien Roger, Christy Koh, … (+11 more) — 2025-10-21 — Cadenza Labs — Journal of Open Source Software
Summary
CCS-Lib is a Python library that implements methods for eliciting latent knowledge from large language models. It aims to recover what a model internally represents as true by reading its internal representations rather than relying on its textual outputs.
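As a rough illustration of the kind of method involved, the sketch below assumes that "CCS" refers to Contrast-Consistent Search (Burns et al., 2022), which trains a probe on hidden states of paired statements without labels. It is not CCS-Lib's API: the names `sigmoid`, `probe`, and `ccs_loss` are hypothetical, the hidden states are plain lists of floats, and preprocessing such as normalizing the hidden states is omitted.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def probe(theta: Sequence[float], bias: float, h: Sequence[float]) -> float:
    """Linear probe: probability that the statement behind hidden state h is true."""
    return sigmoid(sum(w * x for w, x in zip(theta, h)) + bias)


def ccs_loss(
    theta: Sequence[float],
    bias: float,
    h_pos: Sequence[Sequence[float]],
    h_neg: Sequence[Sequence[float]],
) -> float:
    """Unsupervised CCS-style objective over contrast pairs (hypothetical helper).

    h_pos[i] and h_neg[i] are hidden states for the "yes" and "no"
    completions of the same question.
    """
    total = 0.0
    for hp, hn in zip(h_pos, h_neg):
        p_pos = probe(theta, bias, hp)
        p_neg = probe(theta, bias, hn)
        # Consistency: p(x+) and p(x-) should sum to one.
        consistency = (p_pos - (1.0 - p_neg)) ** 2
        # Confidence: penalize the degenerate solution p(x+) = p(x-) = 0.5.
        confidence = min(p_pos, p_neg) ** 2
        total += consistency + confidence
    return total / len(h_pos)
```

An all-zero probe outputs 0.5 for every input, so it is perfectly consistent but is still penalized by the confidence term. Minimizing this loss over `theta` and `bias` with any gradient-based optimizer would yield a direction in activation space that separates the two completions, up to a sign ambiguity.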
Source
- Link: https://joss.theoj.org/papers/10.21105/joss.06511
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- extracting-latent-knowledge — White-box safety (i.e. Interpretability)