Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
- URL: http://arxiv.org/abs/2505.11842v1
- Date: Sat, 17 May 2025 05:06:38 GMT
- Title: Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
- Authors: Xuannan Liu, Zekun Li, Zheqi He, Peipei Li, Shuhan Xia, Xing Cui, Huaibo Huang, Xi Yang, Ran He
- Abstract summary: Video-SafetyBench is the first benchmark designed to evaluate the safety of LVLMs under video-text attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories. To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images and motion text.
- Score: 51.90597846977058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing deployment of Large Vision-Language Models (LVLMs) raises safety concerns under potential malicious inputs. However, existing multimodal safety evaluations primarily focus on model vulnerabilities exposed by static image inputs, ignoring the temporal dynamics of video that may induce distinct safety risks. To bridge this gap, we introduce Video-SafetyBench, the first comprehensive benchmark designed to evaluate the safety of LVLMs under video-text attacks. It comprises 2,264 video-text pairs spanning 48 fine-grained unsafe categories, each pairing a synthesized video with either a harmful query, which contains explicit malice, or a benign query, which appears harmless but triggers harmful behavior when interpreted alongside the video. To generate semantically accurate videos for safety evaluation, we design a controllable pipeline that decomposes video semantics into subject images (what is shown) and motion text (how it moves), which jointly guide the synthesis of query-relevant videos. To effectively evaluate uncertain or borderline harmful outputs, we propose RJScore, a novel LLM-based metric that incorporates the confidence of judge models and human-aligned decision threshold calibration. Extensive experiments show that benign-query video composition achieves average attack success rates of 67.2%, revealing consistent vulnerabilities to video-induced attacks. We believe Video-SafetyBench will catalyze future research into video-based safety evaluation and defense strategies.
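The abstract specifies RJScore only at a high level (judge-model confidence plus a human-calibrated decision threshold), so the following is a minimal sketch of that idea rather than the paper's actual formula; the 1-10 rating scale, the log-probability input format, and the 5.5 threshold are all illustrative assumptions.

```python
import math

def rjscore(judge_logprobs: dict[int, float]) -> float:
    """Hypothetical RJScore-style metric: instead of taking the judge
    LLM's single greedy rating, compute the expected rating under the
    probabilities the judge assigned to each candidate rating token.

    judge_logprobs maps each candidate rating (e.g. 1..10) to the
    log-probability of that rating token.
    """
    probs = {rating: math.exp(lp) for rating, lp in judge_logprobs.items()}
    total = sum(probs.values())
    return sum(rating * p / total for rating, p in probs.items())

def attack_success_rate(scores: list[float], threshold: float = 5.5) -> float:
    """Fraction of responses judged harmful, i.e. scoring above a
    threshold calibrated against human decisions. The value 5.5 is a
    placeholder, not the paper's calibrated threshold."""
    return sum(s > threshold for s in scores) / len(scores)
```

Weighting by the judge's full rating distribution is what lets a metric like this handle uncertain or borderline outputs: a judge split 55/45 between harmful and harmless ratings yields an intermediate score instead of a hard, overconfident label.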
Related papers
- PRISM: Programmatic Reasoning with Image Sequence Manipulation for LVLM Jailbreaking [3.718606661938873]
We propose a novel and effective jailbreak framework inspired by Return-Oriented Programming (ROP) techniques from software security. Our approach decomposes a harmful instruction into a sequence of individually benign visual gadgets. Our findings reveal a critical and underexplored vulnerability that exploits the compositional reasoning abilities of LVLMs.
arXiv Detail & Related papers (2025-07-29T07:13:56Z)
- Watch, Listen, Understand, Mislead: Tri-modal Adversarial Attacks on Short Videos for Content Appropriateness Evaluation [1.0012740151280692]
This paper introduces a framework for evaluating the tri-modal safety of Multimodal Large Language Models (MLLMs). We present the Short-Video Multimodal Adversarial dataset, comprising diverse short-form videos with human-guided synthetic adversarial attacks. Extensive experiments on state-of-the-art MLLMs reveal significant vulnerabilities with high Attack Success Rates (ASR).
arXiv Detail & Related papers (2025-07-16T07:02:15Z)
- HoliSafe: Holistic Safety Benchmarking and Modeling with Safety Meta Token for Vision-Language Model [52.72318433518926]
Existing safety-tuning datasets and benchmarks only partially consider how image-text interactions can yield harmful content. We introduce a holistic safety dataset and benchmark, HoliSafe, that spans all five safe/unsafe image-text combinations. We propose SafeLLaVA, a novel VLM augmented with a learnable safety meta token and a dedicated safety head.
arXiv Detail & Related papers (2025-06-05T07:26:34Z)
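The summary names the components but not their wiring; a minimal PyTorch sketch of one plausible arrangement, a learnable meta token appended to the token sequence with a small classification head over its final hidden state, follows. The hidden size, label count, and placement at the end of the sequence are assumptions, not SafeLLaVA's published design.

```python
import torch
import torch.nn as nn

class SafetyMetaTokenHead(nn.Module):
    """Illustrative sketch: append a learnable safety meta token to the
    multimodal token sequence, then classify its post-backbone hidden
    state as safe/unsafe with a dedicated head. Sizes are placeholders."""

    def __init__(self, hidden_size: int = 4096, num_labels: int = 2):
        super().__init__()
        self.meta_token = nn.Parameter(torch.randn(1, 1, hidden_size) * 0.02)
        self.safety_head = nn.Linear(hidden_size, num_labels)

    def append_token(self, token_embeds: torch.Tensor) -> torch.Tensor:
        # token_embeds: (batch, seq_len, hidden) -> (batch, seq_len + 1, hidden)
        batch = token_embeds.size(0)
        return torch.cat([token_embeds, self.meta_token.expand(batch, -1, -1)], dim=1)

    def classify(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Read the backbone's hidden state at the meta token's slot (last position).
        return self.safety_head(hidden_states[:, -1, :])
```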
- From Evaluation to Defense: Advancing Safety in Video Large Language Models [33.10355085086974]
We introduce VideoSafetyBench (VSB-77k), the first large-scale, culturally diverse benchmark for Video LLM safety. Integrating the video modality degrades safety performance by an average of 42.3%, exposing systemic risks in multimodal attack exploitation. We propose VideoSafety-R1, a dual-stage framework achieving unprecedented safety gains through two innovations.
arXiv Detail & Related papers (2025-05-22T13:16:53Z)
- SafeVid: Toward Safety Aligned Video Large Multimodal Models [60.14535756294228]
We introduce SafeVid, a framework designed to instill video-specific safety principles in Video Large Multimodal Models (VLMMs). SafeVid employs detailed textual video descriptions as an interpretive bridge, facilitating rule-driven safety reasoning. Alignment with SafeVid-350K significantly enhances VLMM safety, with models like LLaVA-NeXT-Video demonstrating substantial improvements.
arXiv Detail & Related papers (2025-05-17T09:21:33Z)
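As a schematic of the description-as-bridge idea (caption the video, then reason over the caption against explicit rules), one might write something like the sketch below; `describe_video` and `llm_judge` are hypothetical stand-ins for whatever captioner and judge model an implementation actually uses, and the rules are invented examples.

```python
# Hypothetical stand-ins: describe_video(path) -> str, llm_judge(prompt) -> str.
SAFETY_RULES = [
    "Refuse instructions that facilitate physical harm.",
    "Refuse requests for illegal activity, even if only implied by the video.",
]

def rule_driven_safety_check(video_path: str, user_query: str,
                             describe_video, llm_judge) -> bool:
    """Return True if answering the query alongside the video is unsafe."""
    description = describe_video(video_path)  # video -> detailed text description
    prompt = (
        "Safety rules:\n" + "\n".join(f"- {rule}" for rule in SAFETY_RULES)
        + f"\n\nVideo description: {description}\n"
        + f"User query: {user_query}\n"
        + "Does answering the query violate any rule? Answer yes or no."
    )
    return llm_judge(prompt).strip().lower().startswith("yes")
```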
- Jailbreaking the Text-to-Video Generative Models [95.43898677860565]
We propose the first optimization-based jailbreak attack specifically designed for text-to-video models. Our approach formulates the prompt generation task as an optimization problem with three key objectives. We conduct extensive experiments across multiple text-to-video models, including Open-Sora, Pika, Luma, and Kling.
arXiv Detail & Related papers (2025-05-10T16:04:52Z)
- T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models [88.63040835652902]
Text-to-video models are vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content. We propose T2VShield, a comprehensive and model-agnostic defense framework designed to protect text-to-video models from jailbreak threats. Our method systematically analyzes the input, model, and output stages to identify the limitations of existing defenses.
arXiv Detail & Related papers (2025-04-22T01:18:42Z)
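The summary names the three stages without detail; a generic, model-agnostic wrapper with one pluggable check per stage is one way to picture such a design. None of the function names below come from the paper.

```python
from typing import Callable, Optional

def guarded_generate(prompt: str,
                     rewrite_input: Callable[[str], str],
                     generate_video: Callable[[str], bytes],
                     output_is_safe: Callable[[bytes], bool]) -> Optional[bytes]:
    """Hypothetical three-stage guard: sanitize the prompt (input stage),
    run the unmodified backbone (model stage), screen the result
    (output stage). Model-agnostic because every check is a callable."""
    safe_prompt = rewrite_input(prompt)    # input stage: detect/rewrite jailbreaks
    video = generate_video(safe_prompt)    # model stage: backbone left untouched
    if not output_is_safe(video):          # output stage: post-hoc screening
        return None                        # block the unsafe generation
    return video
```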
- MLLM-as-a-Judge for Image Safety without Human Labeling [81.24707039432292]
In the age of AI-generated content (AIGC), many image generation models are capable of producing harmful content. It is crucial to identify such unsafe images based on established safety rules. Existing approaches typically fine-tune MLLMs with human-labeled datasets.
arXiv Detail & Related papers (2024-12-31T00:06:04Z)
- SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations [10.451619858527897]
We propose SafeWatch, an efficient MLLM-based video guardrail model that follows customized safety policies. Unlike traditional MLLM-based guardrails that encode all safety policies autoregressively, SafeWatch uniquely encodes each policy chunk in parallel. In addition, SafeWatch incorporates a policy-aware visual token pruning algorithm that adaptively selects the most relevant video tokens for each policy.
arXiv Detail & Related papers (2024-12-09T18:59:04Z)
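"Policy-aware visual token pruning" is not specified further in the summary; a top-k selection of video tokens by similarity to a policy embedding is one plausible reading, made concrete below. This is a guess for illustration, not SafeWatch's published algorithm.

```python
import torch
import torch.nn.functional as F

def prune_tokens_for_policy(video_tokens: torch.Tensor,
                            policy_embed: torch.Tensor,
                            keep: int = 64) -> torch.Tensor:
    """Keep the `keep` video tokens most similar to a policy embedding.

    video_tokens: (num_tokens, dim); policy_embed: (dim,).
    An assumed reading of "policy-aware visual token pruning".
    """
    sims = F.cosine_similarity(video_tokens, policy_embed.unsqueeze(0), dim=-1)
    idx = sims.topk(min(keep, video_tokens.size(0))).indices
    return video_tokens[idx.sort().values]  # re-sort indices to keep temporal order
```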
- T2VSafetyBench: Evaluating the Safety of Text-to-Video Generative Models [39.15695612766001]
We introduce T2VSafetyBench, a new benchmark for safety-critical assessments of text-to-video models.
We define 12 critical aspects of video generation safety and construct a malicious prompt dataset.
No single model excels in all aspects, with different models showing various strengths.
There is a trade-off between the usability and safety of text-to-video generative models.
arXiv Detail & Related papers (2024-07-08T14:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.