The Elicitation Game: Evaluating Capability Elicitation Techniques

Felix Hofstätter, Teun van der Weij, Jayden Teoh, Rada Djoneva, Henning Bartsch, Francis Rhys Ward — 2025-02-04 — arXiv

Summary

Introduces a circuit-breaking method for creating model organisms with hidden capabilities, then systematically evaluates how effectively different elicitation techniques (prompting, activation steering, and fine-tuning) recover those latent capabilities.

Key Result

Fine-tuning is the most reliable way to elicit hidden capabilities across tasks. Prompting works for multiple-choice question answering (MCQA), but activation steering largely fails. Combining techniques makes elicitation more effective.
