I2VWM: Robust Watermarking for Image to Video Generation
- URL: http://arxiv.org/abs/2509.17773v1
- Date: Mon, 22 Sep 2025 13:37:37 GMT
- Title: I2VWM: Robust Watermarking for Image to Video Generation
- Authors: Guanjie Wang, Zehua Ma, Han Fang, Weiming Zhang
- Abstract summary: I2VWM is a cross-modal watermarking framework designed to enhance watermark robustness across time. Experiments on both open-source and commercial I2V models demonstrate that I2VWM significantly improves robustness while maintaining imperceptibility.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The rapid progress of image-guided video generation (I2V) has raised concerns about its potential misuse in misinformation and fraud, underscoring the urgent need for effective digital watermarking. While existing watermarking methods demonstrate robustness within a single modality, they fail to trace source images in I2V settings. To address this gap, we introduce the concept of Robust Diffusion Distance, which measures the temporal persistence of watermark signals in generated videos. Building on this, we propose I2VWM, a cross-modal watermarking framework designed to enhance watermark robustness across time. I2VWM leverages a video-simulation noise layer during training and employs an optical-flow-based alignment module during inference. Experiments on both open-source and commercial I2V models demonstrate that I2VWM significantly improves robustness while maintaining imperceptibility, establishing a new paradigm for cross-modal watermarking in the era of generative video. Code released: https://github.com/MrCrims/I2VWM-Robust-Watermarking-for-Image-to-Video-Generation
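The abstract's optical-flow-based alignment module can be pictured as warping each generated frame back toward the watermarked reference frame before decoding, so that motion introduced by the video model does not desynchronize the watermark pattern. The sketch below is a minimal, hypothetical illustration of that idea (the paper's actual module is not specified here): a nearest-neighbor backward warp given a dense flow field, followed by a toy correlation decoder. The function names `warp_by_flow` and `decode_bits` are assumptions, not the authors' API.

```python
import numpy as np

def warp_by_flow(frame, flow):
    """Warp a frame back toward the reference using a dense flow field.

    frame: (H, W) array; flow: (H, W, 2) per-pixel (dx, dy) offsets.
    Nearest-neighbor backward warping: output[y, x] = frame[y+dy, x+dx].
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def decode_bits(frame, keys, threshold=0.0):
    """Toy correlation decoder: one pseudo-random key pattern per bit."""
    return [int(np.sum(frame * k) > threshold) for k in keys]
```

In practice the flow field would come from an optical-flow estimator (e.g. a Farneback- or RAFT-style method) comparing the generated frame against the first frame; here it is supplied directly to keep the sketch self-contained.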
Related papers
- SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models [40.540302276054376]
We propose a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe) employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation. Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
arXiv Detail & Related papers (2026-02-27T06:18:03Z) - WaTeRFlow: Watermark Temporal Robustness via Flow Consistency [46.206343565195375]
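The SKeDA blurb above describes deriving frame-level encryption sequences by permuting one base pseudo-random binary sequence. A minimal sketch of that scheme, assuming a seeded permutation per frame and XOR encryption (both `frame_keys` and `encrypt` are hypothetical names, not from the paper):

```python
import numpy as np

def frame_keys(base_key, num_frames, seed=0):
    """Derive per-frame sequences by permuting one base binary key.

    Each frame gets its own shuffle of the shared key, so frame-level
    sequences differ while the 0/1 distribution is exactly preserved.
    """
    rng = np.random.default_rng(seed)
    return [base_key[rng.permutation(base_key.size)] for _ in range(num_frames)]

def encrypt(bits, key):
    """XOR payload bits with the (truncated) frame key; XOR is its own inverse."""
    return np.bitwise_xor(bits, key[: bits.size])
```

Because permutation only reorders the key, every derived sequence keeps the base key's bit balance, which is one plausible reading of "distribution-preserving" here.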
We present WaTeRFlow, a framework tailored for robustness under I2V. It exposes the encoder-decoder to realistic distortions via instruction-driven edits and a fast video diffusion proxy during training. Experiments across representative I2V models show accurate watermark recovery from frames, with higher first-frame and per-frame bit accuracy and resilience.
arXiv Detail & Related papers (2025-12-22T05:33:59Z) - T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models [89.29541056113442]
T2SMark is a two-stage watermarking scheme based on Tail-Truncated Sampling (TTS). We evaluate T2SMark on diffusion models with both U-Net and DiT backbones.
arXiv Detail & Related papers (2025-10-25T16:55:55Z) - D2RA: Dual Domain Regeneration Attack [14.483783077617483]
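The T2SMark blurb does not spell out how Tail-Truncated Sampling works, but a common noise-as-watermark construction, offered here only as a hedged guess at the idea, is to let the sign of each initial latent value carry one payload bit while rejecting magnitudes near zero, so the sign survives the generation and inversion noise. All names below (`tts_embed`, `tts_decode`, the threshold `t`) are illustrative assumptions:

```python
import numpy as np

def tts_embed(bits, t=0.5, seed=0):
    """Sample Gaussian latents whose signs encode the payload bits,
    rejecting magnitudes below t so the signs are robust to perturbation."""
    rng = np.random.default_rng(seed)
    mags = np.empty(len(bits))
    for i in range(len(bits)):
        m = abs(rng.standard_normal())
        while m < t:                      # truncate the near-zero "tail"
            m = abs(rng.standard_normal())
        mags[i] = m
    signs = np.where(np.asarray(bits) == 1, 1.0, -1.0)
    return signs * mags

def tts_decode(latent):
    """Recover bits from latent signs."""
    return (latent > 0).astype(int)
```

Larger `t` makes decoding more robust but narrows the sampled distribution, which is exactly the robustness-versus-diversity trade-off the paper's title refers to.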
We present D2RA, a training-free, single-image attack that removes or weakens watermarks without access to the underlying model. By projecting watermarked images onto natural priors across complementary representations, D2RA suppresses watermark signals while preserving visual fidelity.
arXiv Detail & Related papers (2025-10-08T20:54:22Z) - TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity [76.98973481600002]
This paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM. The proposed method comprises four key modules, including a dual-mark joint sampling (DMJS) algorithm for embedding copyright and localization watermarks into the latent space while preserving generative quality. The experimental results demonstrate that TAG-WM achieves state-of-the-art performance in both tampering robustness and localization capability even under distortion.
arXiv Detail & Related papers (2025-06-30T03:14:07Z) - Video Signature: In-generation Watermarking for Latent Video Diffusion Models [42.064769031646904]
Video Signature (VID SIG) is an in-generation watermarking method for latent video diffusion models. We achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers. Experimental results show that VID SIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency.
arXiv Detail & Related papers (2025-05-31T17:43:54Z) - Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking [88.89887962002207]
Invisible generative watermarking remains largely underexplored in video generation. We propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. We show that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness.
arXiv Detail & Related papers (2025-05-19T03:31:31Z) - Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models [52.877452505561706]
We propose the first copyright evasion attack specifically designed to undermine dataset ownership verification (DOV). Our CEAT2I comprises three stages: watermarked sample detection, trigger identification, and efficient watermark mitigation. Our experiments show that CEAT2I effectively evades DOV mechanisms while preserving model performance.
arXiv Detail & Related papers (2025-05-05T17:51:55Z) - VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking [27.345134138673945]
VideoShield is a novel watermarking framework for video generation models. Unlike post-processing methods, VideoShield embeds watermarks directly during video generation. To ensure video integrity, we introduce a tamper localization feature.
arXiv Detail & Related papers (2025-01-24T02:57:09Z) - ConsistI2V: Enhancing Visual Consistency for Image-to-Video Generation [37.05422543076405]
Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to create a video sequence.
Existing methods often struggle to preserve the integrity of the subject, background, and style from the first frame.
We propose ConsistI2V, a diffusion-based method to enhance visual consistency for I2V generation.
arXiv Detail & Related papers (2024-02-06T19:08:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.