VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
- URL: http://arxiv.org/abs/2508.15314v2
- Date: Wed, 27 Aug 2025 08:48:35 GMT
- Title: VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
- Authors: Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji,
- Abstract summary: VideoEraser is a training-free framework that prevents T2V diffusion models from generating videos with undesirable concepts. VideoEraser consistently outperforms prior methods in efficacy, integrity, fidelity, robustness, and generalizability.
- Score: 47.56791061013888
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid growth of text-to-video (T2V) diffusion models has raised concerns about privacy, copyright, and safety due to their potential misuse in generating harmful or misleading content. These models are often trained on large-scale datasets that may include unauthorized personal identities, artistic creations, and harmful materials, which can lead to uncontrolled production and distribution of such content. To address this, we propose VideoEraser, a training-free framework that prevents T2V diffusion models from generating videos with undesirable concepts, even when explicitly prompted with those concepts. Designed as a plug-and-play module, VideoEraser integrates seamlessly with representative T2V diffusion models via a two-stage process: Selective Prompt Embedding Adjustment (SPEA) and Adversarial-Resilient Noise Guidance (ARNG). We conduct extensive evaluations across four tasks: object erasure, artistic style erasure, celebrity erasure, and explicit content erasure. Experimental results show that VideoEraser consistently outperforms prior methods in efficacy, integrity, fidelity, robustness, and generalizability. Notably, VideoEraser achieves state-of-the-art performance in suppressing undesirable content during T2V generation, reducing it by 46% on average across the four tasks compared to baselines.
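The two stages named in the abstract can be caricatured in a few lines. The sketch below is illustrative only, not the paper's actual algorithm: it shows the general ideas behind SPEA-style embedding adjustment (orthogonal projection of a prompt embedding away from a target-concept direction) and ARNG-flavoured guidance (a classifier-free-guidance-style combination with an extra negative term steering the predicted noise away from the unwanted concept). The function names, guidance scales, and the assumption that a single concept direction is available are all hypothetical.

```python
import numpy as np

def erase_concept_direction(prompt_emb, concept_emb):
    """Project a prompt embedding onto the subspace orthogonal to a
    target-concept direction (SPEA-style sketch, not the paper's method)."""
    u = concept_emb / np.linalg.norm(concept_emb)  # unit concept direction
    return prompt_emb - (prompt_emb @ u) * u        # remove that component

def guided_noise(eps_uncond, eps_cond, eps_concept, scale=7.5, neg_scale=2.0):
    """Combine noise predictions in the style of classifier-free guidance,
    with a negative term pushing away from the unwanted concept
    (ARNG-flavoured sketch; scales are illustrative)."""
    return (eps_uncond
            + scale * (eps_cond - eps_uncond)
            - neg_scale * (eps_concept - eps_uncond))

# Tiny demo: after projection, the embedding carries no component
# along the concept direction.
emb = erase_concept_direction(np.array([3.0, 4.0]), np.array([2.0, 0.0]))
print(emb @ np.array([2.0, 0.0]))  # → 0.0
```

In a real pipeline these operations would act on the text encoder's token embeddings and the denoiser's per-step noise predictions respectively; here plain vectors stand in for both.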
Related papers
- T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models [67.13397169618624]
We introduce T2VAttack, a study of adversarial attacks on Text-to-Video (T2V) models from both semantic and temporal perspectives.
To achieve an effective and efficient attack process, we propose two adversarial attack methods: T2VAttack-S, which identifies semantically or temporally critical words in prompts and replaces them with synonyms via greedy search, and T2VAttack-I, which iteratively inserts optimized words with minimal perturbation to the prompt.
arXiv Detail & Related papers (2025-12-30T03:00:46Z)
- VCE: Safe Autoregressive Image Generation via Visual Contrast Exploitation [57.36681904639463]
Methods to safeguard autoregressive text-to-image models remain underexplored.
We propose Visual Contrast Exploitation (VCE), a novel framework that precisely decouples unsafe concepts from their associated content semantics.
Our experiments demonstrate that our method effectively secures the model, achieving state-of-the-art results while erasing unsafe concepts and maintaining the integrity of unrelated safe concepts.
arXiv Detail & Related papers (2025-09-21T09:00:27Z)
- T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models [5.876360170606312]
We propose a robust and precise unlearning method for text-to-video (T2V) models.
To achieve precise unlearning, we incorporate a localization regularization and a preservation regularization to preserve the model's ability to generate non-target concepts.
Our method effectively erases a specific concept while preserving the model's generation capability for all other concepts, outperforming existing methods.
arXiv Detail & Related papers (2025-05-23T06:56:32Z)
- BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation [37.055665794706336]
Text-to-video (T2V) generative models have rapidly advanced and found widespread applications across fields like entertainment, education, and marketing.
We observe that in T2V generation tasks, the generated videos often contain substantial redundant information not explicitly specified in the text prompts.
We introduce BadVideo, the first backdoor attack framework tailored for T2V generation.
arXiv Detail & Related papers (2025-04-23T17:34:48Z)
- T2VShield: Model-Agnostic Jailbreak Defense for Text-to-Video Models [88.63040835652902]
Text-to-video models are vulnerable to jailbreak attacks, where specially crafted prompts bypass safety mechanisms and lead to the generation of harmful or unsafe content.
We propose T2VShield, a comprehensive and model-agnostic defense framework designed to protect text-to-video models from jailbreak threats.
Our method systematically analyzes the input, model, and output stages to identify the limitations of existing defenses.
arXiv Detail & Related papers (2025-04-22T01:18:42Z)
- SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation [65.30207993362595]
Unlearning/editing-based methods for safe generation remove harmful concepts from models but face several challenges.
We propose SAFREE, a training-free approach for safe T2I and T2V.
We detect a subspace corresponding to a set of toxic concepts in the text embedding space and steer prompt embeddings away from this subspace.
arXiv Detail & Related papers (2024-10-16T17:32:23Z)
- ShieldDiff: Suppressing Sexual Content Generation from Diffusion Models through Reinforcement Learning [7.099258248662009]
There is a potential risk that text-to-image (T2I) models can generate unsafe images with objectionable content.
In our work, we focus on eliminating NSFW (not safe for work) content generation from T2I models.
We propose a customized reward function consisting of CLIP (Contrastive Language-Image Pre-training) and nudity rewards to suppress nudity content.
arXiv Detail & Related papers (2024-10-04T19:37:56Z)
- Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation [72.90144343056227]
We explore the visual representations produced from a pre-trained text-to-video (T2V) diffusion model for video understanding tasks.
We introduce a novel framework, termed "VD-IT", with dedicated components built upon a fixed T2V model.
Our VD-IT achieves highly competitive results, surpassing many existing state-of-the-art methods.
arXiv Detail & Related papers (2024-03-18T17:59:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.