AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety
- URL: http://arxiv.org/abs/2508.05527v1
- Date: Thu, 07 Aug 2025 15:55:46 GMT
- Title: AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety
- Authors: Adi Levi, Or Levi, Sardhendu Mishra, Jonathan Morra
- Abstract summary: We benchmark the capabilities of Multimodal Large Language Models (MLLMs) in brand safety classification. Through a detailed comparative analysis, we demonstrate the effectiveness of MLLMs such as Gemini, GPT, and Llama in multimodal brand safety. We present an in-depth discussion shedding light on limitations of MLLMs and failure cases.
- Score: 2.9165586612027234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As the volume of video content online grows exponentially, the demand for moderation of unsafe videos has surpassed human capabilities, posing both operational and mental health challenges. While recent studies demonstrated the merits of Multimodal Large Language Models (MLLMs) in various video understanding tasks, their application to multimodal content moderation, a domain that requires nuanced understanding of both visual and textual cues, remains relatively underexplored. In this work, we benchmark the capabilities of MLLMs in brand safety classification, a critical subset of content moderation for safeguarding advertising integrity. To this end, we introduce a novel, multimodal and multilingual dataset, meticulously labeled by professional reviewers across a multitude of risk categories. Through a detailed comparative analysis, we demonstrate the effectiveness of MLLMs such as Gemini, GPT, and Llama in multimodal brand safety, and evaluate their accuracy and cost efficiency compared to professional human reviewers. Furthermore, we present an in-depth discussion shedding light on limitations of MLLMs and failure cases. We are releasing our dataset alongside this paper to facilitate future research on effective and responsible brand safety and content moderation.
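The abstract describes prompting MLLMs such as Gemini, GPT, and Llama to classify video content (visual frames plus text) into brand-safety risk categories. As a rough illustration of what such a classification call might look like, here is a minimal sketch using the OpenAI chat API; the model name, risk taxonomy, and prompt wording are illustrative assumptions and are not taken from the paper or its released dataset.

```python
# Hypothetical sketch: brand-safety classification of a video with an MLLM.
# The risk categories, prompt, and model below are illustrative assumptions,
# not the taxonomy or prompts used in the paper.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RISK_CATEGORIES = ["adult content", "violence", "hate speech", "drugs", "safe"]

def classify_video(frame_paths: list[str], transcript: str) -> str:
    """Send sampled video frames plus the transcript to an MLLM and
    return a single brand-safety label."""
    content = [{
        "type": "text",
        "text": (
            "You are a brand-safety reviewer. Based on the video frames and "
            f"transcript below, answer with exactly one label from: {RISK_CATEGORIES}.\n\n"
            f"Transcript: {transcript}"
        ),
    }]
    # Attach each sampled frame as a base64-encoded image part.
    for path in frame_paths:
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })

    response = client.chat.completions.create(
        model="gpt-4o",   # any multimodal chat model could be swapped in
        messages=[{"role": "user", "content": content}],
        temperature=0,    # deterministic labels for benchmarking
    )
    return response.choices[0].message.content.strip()

# Example usage (paths and transcript are placeholders):
# label = classify_video(["frame_001.jpg", "frame_002.jpg"], "transcript text")
# print(label)
```

A setup along these lines makes the paper's accuracy and cost comparison concrete: each video incurs one model call whose price scales with the number of frames and transcript length, which can then be weighed against the per-item cost of a professional human reviewer.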
Related papers
- Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation [1.0012740151280692]
This paper introduces a framework for evaluating the tri-modal safety of Multimodal Large Language Models (MLLMs). We present the Short-Video Multimodal Adversarial dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR).
arXiv Detail & Related papers (2025-07-16T07:02:15Z)
- Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation [88.78166077081912]
We introduce a multimodal unlearning benchmark, UnLOK-VQA, and an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs. Our results show multimodal attacks outperform text- or image-only ones, and that the most effective defense removes answer information from internal model states.
arXiv Detail & Related papers (2025-05-01T01:54:00Z)
- Survey of Adversarial Robustness in Multimodal Large Language Models [17.926240920647892]
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence. Their deployment in real-world applications raises significant concerns about adversarial vulnerabilities. This paper reviews the adversarial robustness of MLLMs, covering different modalities.
arXiv Detail & Related papers (2025-03-18T06:54:59Z)
- Towards Safer Social Media Platforms: Scalable and Performant Few-Shot Harmful Content Moderation Using Large Language Models [9.42299478071576]
Harmful content on social media platforms poses significant risks to users and society. Current approaches rely on human moderators, supervised classifiers, and large volumes of training data. We utilize Large Language Models (LLMs) to undertake few-shot dynamic content moderation via in-context learning.
arXiv Detail & Related papers (2025-01-23T00:19:14Z)
- SafeBench: A Safety Evaluation Framework for Multimodal Large Language Models [75.67623347512368]
We propose SafeBench, a comprehensive framework designed for conducting safety evaluations of MLLMs.
Our framework consists of a comprehensive harmful query dataset and an automated evaluation protocol.
Based on our framework, we conducted large-scale experiments on 15 widely-used open-source MLLMs and 6 commercial MLLMs.
arXiv Detail & Related papers (2024-10-24T17:14:40Z)
- Multimodal Situational Safety [73.63981779844916]
We present the first evaluation and analysis of a novel safety challenge termed Multimodal Situational Safety. For an MLLM to respond safely, whether through language or action, it often needs to assess the safety implications of a language query within its corresponding visual context. We develop the Multimodal Situational Safety benchmark (MSSBench) to assess the situational safety performance of current MLLMs.
arXiv Detail & Related papers (2024-10-08T16:16:07Z)
- A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks [74.52259252807191]
Multimodal Large Language Models (MLLMs) address the complexities of real-world applications far beyond the capabilities of single-modality systems.
This paper systematically sorts out the applications of MLLM in multimodal tasks such as natural language, vision, and audio.
arXiv Detail & Related papers (2024-08-02T15:14:53Z)
- MultiTrust: A Comprehensive Benchmark Towards Trustworthy Multimodal Large Language Models [51.19622266249408]
MultiTrust is the first comprehensive and unified benchmark on the trustworthiness of MLLMs. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks.
arXiv Detail & Related papers (2024-06-11T08:38:13Z)
- Safety of Multimodal Large Language Models on Images and Texts [33.97489213223888]
In this paper, we systematically survey current efforts on the evaluation, attack, and defense of MLLMs' safety on images and text.
We review the evaluation datasets and metrics for measuring the safety of MLLMs.
Next, we comprehensively present attack and defense techniques related to MLLMs' safety.
arXiv Detail & Related papers (2024-02-01T05:57:10Z)
- A Survey on Detection of LLMs-Generated Content [97.87912800179531]
The ability to detect LLMs-generated content has become of paramount importance.
We aim to provide a detailed overview of existing detection strategies and benchmarks.
We also posit the necessity for a multi-faceted approach to defend against various attacks.
arXiv Detail & Related papers (2023-10-24T09:10:26Z)