Gradient Routing: Masking Gradients to Localize Computation in Neural Networks

Alex Cloud, Jacob Goldman-Wetzler, Evžen Wybitul, Joseph Miller, Alexander Matt Turner — 2024-10-06 — arXiv

Summary

Introduces gradient routing, a training method that applies data-dependent masks to gradients during backpropagation, confining which parts of the network each batch of data is allowed to update. This localizes capabilities to designated subregions, enabling interpretable representations, robust unlearning via ablation of the localized region, and scalable oversight of RL agents.
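The core mechanism can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the two-unit network, the loss, and the per-tag masks are illustrative assumptions. The idea shown is just that each data point carries a tag that selects a gradient mask, so some parameters never receive updates from that data.

```python
import numpy as np

# Tiny 1-hidden-layer linear network: h = W1 @ x, y = w2 @ h.
W1 = np.array([[0.1, 0.2, 0.3],
               [0.4, 0.5, 0.6]])   # 2 hidden units, 3 inputs
w2 = np.array([0.5, -0.5])

# Hypothetical routing masks (illustrative assumption): data tagged "a"
# may only update hidden unit 0; data tagged "b" only hidden unit 1.
masks = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}

def routed_step(x, target, tag, lr=0.1):
    """One SGD step with the hidden-layer gradient masked by the data's tag."""
    global W1, w2
    h = W1 @ x
    y = w2 @ h
    err = y - target                  # dL/dy for L = 0.5 * (y - target)^2
    grad_h = (err * w2) * masks[tag]  # gradient routing: mask hidden grads
    W1 -= lr * np.outer(grad_h, x)    # masked rows receive a zero update
    w2 -= lr * (err * h) * masks[tag]

x = np.ones(3)
W1_before = W1.copy()
routed_step(x, target=1.0, tag="a")
changed = np.any(W1 != W1_before, axis=1)
print(changed)  # [ True False] -- unit 1's weights were never touched
```

After training this way, ablating hidden unit 1 would remove whatever was learned from "b"-tagged data while leaving the "a" capability intact, which is the unlearning-via-ablation story in the summary above.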

Key Result

Gradient routing localizes capabilities even when the masks are applied to only a limited, ad hoc subset of the training data, enabling robust unlearning and partitioned representations across the paper's applications.

Source