Related papers: OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models

OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models

URL: http://arxiv.org/abs/2511.10287v1
Date: Fri, 14 Nov 2025 01:43:41 GMT
Title: OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models
Authors: Yuping Yan, Yuhan Xie, Yuanshuai Li, Yingchao Yu, Lingjuan Lyu, Yaochu Jin,
Abstract summary: We introduce OutSafe-Bench, the first most comprehensive content safety evaluation test suite designed for the multimodal era.<n>OutSafe-Bench includes a large-scale dataset that spans four modalities, featuring over 18,000 bilingual (Chinese and English) text prompts, 4,500 images, 450 audio clips and 450 videos, all systematically annotated across nine critical content risk categories.<n>In addition to the dataset, we introduce a Multidimensional Cross Risk Score (MCRS), a novel metric designed to model and assess overlapping and correlated content risks across different categories.
Score: 54.80460603255789
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Since Multimodal Large Language Models (MLLMs) are increasingly being integrated into everyday tools and intelligent agents, growing concerns have arisen regarding their possible output of unsafe contents, ranging from toxic language and biased imagery to privacy violations and harmful misinformation. Current safety benchmarks remain highly limited in both modality coverage and performance evaluations, often neglecting the extensive landscape of content safety. In this work, we introduce OutSafe-Bench, the first most comprehensive content safety evaluation test suite designed for the multimodal era. OutSafe-Bench includes a large-scale dataset that spans four modalities, featuring over 18,000 bilingual (Chinese and English) text prompts, 4,500 images, 450 audio clips and 450 videos, all systematically annotated across nine critical content risk categories. In addition to the dataset, we introduce a Multidimensional Cross Risk Score (MCRS), a novel metric designed to model and assess overlapping and correlated content risks across different categories. To ensure fair and robust evaluation, we propose FairScore, an explainable automated multi-reviewer weighted aggregation framework. FairScore selects top-performing models as adaptive juries, thereby mitigating biases from single-model judgments and enhancing overall evaluation reliability. Our evaluation of nine state-of-the-art MLLMs reveals persistent and substantial safety vulnerabilities, underscoring the pressing need for robust safeguards in MLLMs.

Related papers

MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues [39.24594135913578]
We introduce the Multi-Turn Multimodal Contextual Safety Benchmark (MTMCS-Bench), a benchmark of realistic images and multi-turn conversations.<n>MTMCS-Bench offers paired safe and unsafe dialogues with structured evaluation.<n>We observe persistent trade-offs between contextual safety and utility, with models tending to either miss gradual risks or over-refuse benign dialogues.
arXiv Detail & Related papers (2026-01-11T03:10:56Z)
OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation [94.61617176929384]
OmniSafeBench-MM is a comprehensive toolbox for multi-modal jailbreak attack-defense evaluation.<n>It integrates 13 representative attack methods, 15 defense strategies, and a diverse dataset spanning 9 major risk domains and 50 fine-grained categories.<n>By unifying data, methodology, and evaluation into an open-source, reproducible platform, OmniSafeBench-MM provides a standardized foundation for future research.
arXiv Detail & Related papers (2025-12-06T22:56:29Z)
Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding [59.50808215134678]
This study introduces Trust-videoLLMs, a first comprehensive benchmark evaluating 23 state-of-the-art videoLLMs.<n>Results reveal significant limitations in dynamic scene comprehension, cross-modal resilience and real-world risk mitigation.
arXiv Detail & Related papers (2025-06-14T04:04:54Z)
USB: A Comprehensive and Unified Safety Evaluation Benchmark for Multimodal Large Language Models [31.412080488801507]
Unified Safety Benchmarks (USB) is one of the most comprehensive evaluation benchmarks in MLLM safety.<n>Our benchmark features high-quality queries, extensive risk categories, comprehensive modal combinations, and encompasses both vulnerability and oversensitivity evaluations.
arXiv Detail & Related papers (2025-05-26T08:39:14Z)
aiXamine: Simplified LLM Safety and Security [7.933485586826888]
We present aiXamine, a comprehensive black-box evaluation platform for safety and security.<n>AiXamine integrates over 40 tests (i.e., benchmarks) organized into eight key services targeting specific dimensions of safety and security.<n>The platform aggregates the evaluation results into a single detailed report per model, providing a breakdown of model performance, test examples, and rich visualizations.
arXiv Detail & Related papers (2025-04-21T09:26:05Z)
Can't See the Forest for the Trees: Benchmarking Multimodal Safety Awareness for Multimodal LLMs [56.440345471966666]
Multimodal Large Language Models (MLLMs) have expanded the capabilities of traditional language models by enabling interaction through both text and images.<n>This paper introduces MMSafeAware, the first comprehensive multimodal safety awareness benchmark designed to evaluate MLLMs across 29 safety scenarios.<n> MMSafeAware includes both unsafe and over-safety subsets to assess models abilities to correctly identify unsafe content and avoid over-sensitivity that can hinder helpfulness.
arXiv Detail & Related papers (2025-02-16T16:12:40Z)
SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose toolns, a comprehensive framework designed for conducting safety evaluations of MLLMs. Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol. Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models [39.97454990633856]
We present MLLMGuard, a multidimensional safety evaluation suite for MLLMs. It includes a bilingual image-text evaluation dataset, inference utilities, and a lightweight evaluator. Our evaluation results across 13 advanced models indicate that MLLMs still have a substantial journey ahead before they can be considered safe and responsible.
arXiv Detail & Related papers (2024-06-11T13:41:33Z)
MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs.<n>Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts.<n>Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.