MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, … (+19 more) — 2025-03-19 — Multiple academic institutions — ICLR 2025 (preprint on arXiv)
Summary
Presents MMDT (Multimodal DecodingTrust), a comprehensive evaluation platform and benchmark for assessing the safety and trustworthiness of multimodal foundation models across six dimensions: safety, hallucination, fairness, privacy, adversarial robustness, and out-of-distribution (OOD) generalization. The platform includes red-teaming algorithms that generate challenging test cases for each dimension.
Key Result
Evaluation of multiple multimodal models reveals vulnerabilities across all assessed safety perspectives, demonstrating the need for comprehensive trustworthiness assessment beyond helpfulness metrics.
Source
- Link: https://arxiv.org/abs/2503.14827
- Listed in the Shallow Review of Technical AI Safety 2025 under 1 agenda:
- various-redteams — Evals