MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models

Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, … (+19 more) — 2025-03-19 — Multiple academic institutions — ICLR 2025 (preprint on arXiv)

Summary

Presents MMDT (Multimodal DecodingTrust), a comprehensive evaluation platform and benchmark for assessing the safety and trustworthiness of multimodal foundation models. It covers multiple perspectives, including safety, hallucination, fairness, privacy, adversarial robustness, and out-of-distribution (OOD) generalization, and uses red teaming algorithms to generate challenging test cases.

Key Result

Evaluating multiple multimodal models reveals vulnerabilities across the evaluated perspectives, demonstrating the need for comprehensive trustworthiness assessment that goes beyond helpfulness metrics.

Source