RunawayEvil: Jailbreaking the Image-to-Video Generative Models
- URL: http://arxiv.org/abs/2512.06674v1
- Date: Sun, 07 Dec 2025 06:14:52 GMT
- Title: RunawayEvil: Jailbreaking the Image-to-Video Generative Models
- Authors: Songping Wang, Rufan Qian, Yueming Lyu, Qinglong Liu, Linzhuang Zou, Jie Qin, Songhua Liu, Caifeng Shan,
- Abstract summary: Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control.<n>We propose RunawayEvil, the first multimodal jailbreak framework for I2V models with dynamic evolutionary capability.<n>We show RunawayEvil achieves state-of-the-art attack success rates on commercial I2V models, such as Open-Sora 2.0 and CogVideoX.
- Score: 59.21761412103083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the security of such multimodal systems, particularly their vulnerability to jailbreak attacks, remains critically underexplored. To bridge this gap, we propose RunawayEvil, the first multimodal jailbreak framework for I2V models with dynamic evolutionary capability. Built on a "Strategy-Tactic-Action" paradigm, our framework exhibits self-amplifying attack through three core components: (1) Strategy-Aware Command Unit that enables the attack to self-evolve its strategies through reinforcement learning-driven strategy customization and LLM-based strategy exploration; (2) Multimodal Tactical Planning Unit that generates coordinated text jailbreak instructions and image tampering guidelines based on the selected strategies; (3) Tactical Action Unit that executes and evaluates the multimodal coordinated attacks. This self-evolving architecture allows the framework to continuously adapt and intensify its attack strategies without human intervention. Extensive experiments demonstrate RunawayEvil achieves state-of-the-art attack success rates on commercial I2V models, such as Open-Sora 2.0 and CogVideoX. Specifically, RunawayEvil outperforms existing methods by 58.5 to 79 percent on COCO2017. This work provides a critical tool for vulnerability analysis of I2V models, thereby laying a foundation for more robust video generation systems.
Related papers
- Metaphor-based Jailbreaking Attacks on Text-to-Image Models [41.420325236578755]
textbfMJA is a textbfmetaphor-based textbfjailbreaking textbfattack method inspired by the Taboo game.<n>It efficiently attacks diverse defense mechanisms without prior knowledge of their type.
arXiv Detail & Related papers (2025-12-06T12:38:00Z) - Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization [51.12422886183246]
Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks.<n>Existing works tend to focus on either isolated jailbreak attacks or static defenses, neglecting the dynamic interplay between evolving threats and safeguards in real-world web contexts.<n>We propose ACE-Safety, a novel framework that jointly optimize attack and defense models by seamlessly integrating two key innovative procedures.
arXiv Detail & Related papers (2025-11-24T15:23:41Z) - An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks [9.715575204912167]
We propose a jailbreak framework that autonomously discovers, retrieves, and evolves attack strategies.<n>ASTRA achieves an average Attack Success Rate (ASR) of 82.7%, significantly outperforming baselines.
arXiv Detail & Related papers (2025-11-04T08:24:22Z) - T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks [67.91652526657599]
We formalize the T2V jailbreak attack as a discrete optimization problem and propose a joint objective-based optimization framework, called T2V-OptJail.<n>We conduct large-scale experiments on several T2V models, covering both open-source models and real commercial closed-source models.<n>The proposed method improves 11.4% and 10.0% over the existing state-of-the-art method in terms of attack success rate.
arXiv Detail & Related papers (2025-05-10T16:04:52Z) - T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models [88.63040835652902]
Text to video models are vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content.<n>We propose T2VShield, a comprehensive and model agnostic defense framework designed to protect text to video models from jailbreak threats.<n>Our method systematically analyzes the input, model, and output stages to identify the limitations of existing defenses.
arXiv Detail & Related papers (2025-04-22T01:18:42Z) - AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens [83.08119913279488]
We present a systematic analysis of the dependency relationships in jailbreak attack and defense techniques.
We propose three comprehensive, automated, and logical frameworks.
We show that the proposed ensemble jailbreak attack and defense framework significantly outperforms existing research.
arXiv Detail & Related papers (2024-06-06T07:24:41Z) - LAS-AT: Adversarial Training with Learnable Attack Strategy [82.88724890186094]
"Learnable attack strategy", dubbed LAS-AT, learns to automatically produce attack strategies to improve the model robustness.
Our framework is composed of a target network that uses AEs for training to improve robustness and a strategy network that produces attack strategies to control the AE generation.
arXiv Detail & Related papers (2022-03-13T10:21:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.