Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation
- URL: http://arxiv.org/abs/2507.11968v1
- Date: Wed, 16 Jul 2025 07:02:15 GMT
- Title: Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation
- Authors: Sahid Hossain Mustakim, S M Jishanul Islam, Ummay Maria Muna, Montasir Chowdhury, Mohammed Jawwadul Islam, Sadia Ahmmed, Tashfia Sikder, Syed Tasdid Azam Dhrubo, Swakkhar Shatabda
- Abstract summary: This paper introduces a framework for evaluating the tri-modal safety of Multimodal Large Language Models (MLLMs). We present the Short-Video Multimodal Adversarial dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR).
- Score: 1.0012740151280692
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multimodal Large Language Models (MLLMs) are increasingly used for content moderation, yet their robustness in short-form video contexts remains underexplored. Current safety evaluations often rely on unimodal attacks, failing to address combined attack vulnerabilities. In this paper, we introduce a comprehensive framework for evaluating the tri-modal safety of MLLMs. First, we present the Short-Video Multimodal Adversarial (SVMA) dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Second, we propose ChimeraBreak, a novel tri-modal attack strategy that simultaneously challenges visual, auditory, and semantic reasoning pathways. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR). Our findings uncover distinct failure modes, showing model biases toward misclassifying benign or policy-violating content. We assess results using LLM-as-a-judge, demonstrating attack reasoning efficacy. Our dataset and findings provide crucial insights for developing more robust and safe MLLMs.
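The abstract's headline metric is the Attack Success Rate (ASR), scored with an LLM-as-a-judge. The paper's evaluation code is not reproduced here; the snippet below is only a minimal, hypothetical sketch of how such an ASR-with-judge loop could be wired up. The function names (`query_mllm`, `query_judge`), the `AdversarialSample` fields, and the success criterion are illustrative assumptions, not the SVMA/ChimeraBreak implementation.

```python
# Hypothetical sketch of Attack Success Rate (ASR) scoring with an
# LLM-as-a-judge. Names and the judge criterion are illustrative
# assumptions, not the authors' SVMA / ChimeraBreak code.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AdversarialSample:
    video_path: str          # short-form video carrying the tri-modal attack
    ground_truth_label: str  # e.g. "benign" or "policy_violating"


def attack_success_rate(
    samples: List[AdversarialSample],
    query_mllm: Callable[[str], str],         # target MLLM: video -> moderation verdict
    query_judge: Callable[[str, str], bool],  # judge LLM: (verdict, truth) -> was the model misled?
) -> float:
    """ASR = fraction of adversarial samples whose verdict the attack flips."""
    successes = 0
    for sample in samples:
        verdict = query_mllm(sample.video_path)
        # The judge LLM decides whether the verdict contradicts the ground
        # truth, i.e. whether the tri-modal attack misled the moderator.
        if query_judge(verdict, sample.ground_truth_label):
            successes += 1
    return successes / max(len(samples), 1)
```

Under this framing, the failure modes reported in the abstract correspond to the two directions in which the judge can flag a flip: benign content judged as policy-violating, and policy-violating content judged as benign.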
Related papers
- AI vs. Human Moderators: A Comparative Evaluation of Multimodal LLMs in Content Moderation for Brand Safety [2.9165586612027234]
We benchmark the capabilities of Multimodal Large Language Models (MLLMs) in brand safety classification. Through a detailed comparative analysis, we demonstrate the effectiveness of MLLMs such as Gemini, GPT, and Llama in multimodal brand safety. We present an in-depth discussion shedding light on the limitations of MLLMs and failure cases.
arXiv Detail & Related papers (2025-08-07T15:55:46Z) - Investigating Vulnerabilities and Defenses Against Audio-Visual Attacks: A Comprehensive Survey Emphasizing Multimodal Models [25.23931196918614]
Multimodal large language models (MLLMs) bridge the gap between audio-visual and natural language processing. Despite the superior performance of MLLMs, the scarcity of high-quality audio-visual training data and computational resources necessitates the use of third-party data and open-source MLLMs. Empirical studies demonstrate that the latest MLLMs can be manipulated to produce malicious or harmful content.
arXiv Detail & Related papers (2025-06-13T07:22:36Z) - Con Instruction: Universal Jailbreaking of Multimodal Large Language Models via Non-Textual Modalities [76.9327488986162]
Existing attacks against multimodal language models (MLLMs) primarily communicate instructions through text accompanied by adversarial images. We exploit the capabilities of MLLMs to interpret non-textual instructions, specifically adversarial images or audio generated by our novel method, Con Instruction. Our method achieves the highest attack success rates, reaching 81.3% and 86.6% on LLaVA-v1.5 (13B).
arXiv Detail & Related papers (2025-05-31T13:11:14Z) - Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs [51.90597846977058]
Video-SafetyBench is the first benchmark designed to evaluate the safety of LVLMs under video-text attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories. To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images and motion text.
arXiv Detail & Related papers (2025-05-17T05:06:38Z) - Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation [88.78166077081912]
We introduce a multimodal unlearning benchmark, UnLOK-VQA, and an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs. Our results show that multimodal attacks outperform text- or image-only ones, and that the most effective defense removes answer information from internal model states.
arXiv Detail & Related papers (2025-05-01T01:54:00Z) - MIRAGE: Multimodal Immersive Reasoning and Guided Exploration for Red-Team Jailbreak Attacks [85.3303135160762]
MIRAGE is a novel framework that exploits narrative-driven context and role immersion to circumvent safety mechanisms in Multimodal Large Language Models. It achieves state-of-the-art performance, improving attack success rates by up to 17.5% over the best baselines. We demonstrate that role immersion and structured semantic reconstruction can activate inherent model biases, facilitating the model's spontaneous violation of ethical safeguards.
arXiv Detail & Related papers (2025-03-24T20:38:42Z) - Survey of Adversarial Robustness in Multimodal Large Language Models [17.926240920647892]
Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence. Their deployment in real-world applications raises significant concerns about adversarial vulnerabilities. This paper reviews the adversarial robustness of MLLMs, covering different modalities.
arXiv Detail & Related papers (2025-03-18T06:54:59Z) - 'Do as I say not as I do': A Semi-Automated Approach for Jailbreak Prompt Attack against Multimodal LLMs [33.49407213040455]
We introduce the first voice-based jailbreak attack against multimodal large language models (LLMs). We propose a novel strategy in which the disallowed prompt is flanked by benign, narrative-driven prompts. We demonstrate that Flanking Attack is capable of manipulating state-of-the-art LLMs into generating misaligned and forbidden outputs.
arXiv Detail & Related papers (2025-02-02T10:05:08Z) - Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs [48.76864299749205]
Video-based large language models (V-MLLMs) have shown vulnerability to adversarial examples in video-text multimodal tasks. In this paper, we pioneer an investigation into the transferability of adversarial video samples across V-MLLMs.
arXiv Detail & Related papers (2025-01-02T03:52:22Z) - A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends [78.3201480023907]
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks.
The vulnerability of LVLMs is relatively underexplored, posing potential security risks in daily usage.
In this paper, we provide a comprehensive review of the various forms of existing LVLM attacks.
arXiv Detail & Related papers (2024-07-10T06:57:58Z) - FMM-Attack: A Flow-based Multi-modal Adversarial Attack on Video-based LLMs [57.59518049930211]
We propose the first adversarial attack tailored for video-based large language models (LLMs).
Our attack can effectively induce video-based LLMs to generate incorrect answers when imperceptible adversarial perturbations are added to the videos.
Our FMM-Attack can also induce garbling in the model output, prompting video-based LLMs to hallucinate. A generic sketch of this style of perturbation appears after this list.
arXiv Detail & Related papers (2024-03-20T11:05:07Z)
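Several of the attacks above, FMM-Attack in particular, rely on small, norm-bounded changes to the video frames that are invisible to humans yet shift the model's output. The following is a minimal, hypothetical PGD-style sketch under that assumption; the surrogate `loss_fn`, tensor shapes, and budget values are illustrative and do not reproduce FMM-Attack's flow-based objective or ChimeraBreak.

```python
# Hypothetical L-infinity PGD sketch for perturbing video frames
# (illustrative only; not the FMM-Attack or ChimeraBreak implementation).
import torch


def pgd_video_perturbation(frames, loss_fn, eps=4 / 255, alpha=1 / 255, steps=10):
    """frames: (T, C, H, W) tensor in [0, 1].
    loss_fn: maps perturbed frames to a scalar adversarial objective
    (e.g. distance between the model's outputs on perturbed and clean frames)."""
    delta = torch.zeros_like(frames, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(torch.clamp(frames + delta, 0.0, 1.0))
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the adversarial objective
            delta.clamp_(-eps, eps)             # keep the change imperceptible
            delta.grad.zero_()
    return torch.clamp(frames + delta.detach(), 0.0, 1.0)


# Toy usage with a stand-in loss; a real attack would query the target V-MLLM.
frames = torch.rand(8, 3, 224, 224)
adv_frames = pgd_video_perturbation(frames, loss_fn=lambda x: x.mean())
```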