MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, … (+19 more) — 2025-03-19 — Multiple academic institutions — ICLR 2025 (preprint on arXiv)
Summary
Presents MMDT (Multimodal DecodingTrust), a comprehensive evaluation platform and benchmark for assessing the safety and trustworthiness of multimodal foundation models across six dimensions: safety, hallucination, fairness, privacy, adversarial robustness, and out-of-distribution (OOD) generalization. The platform includes red-teaming algorithms that generate challenging test cases for each dimension.
Key Result
Evaluation of multiple multimodal models reveals vulnerabilities across all assessed safety perspectives, demonstrating the need for comprehensive trustworthiness assessment beyond helpfulness metrics.
Source
- Link: https://arxiv.org/abs/2503.14827
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- various-redteams — Evals