Can a Neural Network that only Memorizes the Dataset be Undetectably Backdoored?

Matjaz Leonardis — 2025-07-10 — ODYSSEY 2025 Conference

Summary

Demonstrates that even highly interpretable neural networks that memorize datasets can be undetectably backdoored, challenging the assumption that interpretability enables backdoor detection. Analyzes a simple network with O(n+d) parameters that achieves perfect classification through memorization yet remains vulnerable to undetectable backdoors.

Key Result

Despite being interpretable and memorizing the training data with minimal parameters, the network can still be undetectably backdoored unless one has full knowledge of the dataset.

Source