Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models

27 Apr 2026, 1 min read

Cameron Tice, Philipp Alexander Kreer, Nathan Helm-Burger, Prithviraj Singh Shahani, Fedor Ryzhenkov, Jacob Haimes, … (+2 more) — 2024-12-02

Source

  • Link: https://arxiv.org/pdf/2412.01784
  • Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
    • sandbagging-evals — Evals

Related Pages

  • sandbagging-evals
