Video Signature: In-generation Watermarking for Latent Video Diffusion Models
- URL: http://arxiv.org/abs/2506.00652v3
- Date: Mon, 15 Sep 2025 17:04:55 GMT
- Title: Video Signature: In-generation Watermarking for Latent Video Diffusion Models
- Authors: Yu Huang, Junhao Chen, Shuliang Liu, Hanqian Li, Qi Zheng, Yi R. Fung, Xuming Hu,
- Abstract summary: Video Signature (VIDSIG) is an in-generation watermarking method for latent video diffusion models. It embeds the watermark by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers. Experimental results show that VIDSIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency.
- Score: 42.064769031646904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of Artificial Intelligence Generated Content (AIGC) has led to significant progress in video generation but also raises serious concerns about intellectual property protection and reliable content tracing. Watermarking is a widely adopted solution to this issue, but existing methods for video generation mainly follow a post-generation paradigm, which introduces additional computational overhead and often fails to effectively balance the trade-off between video quality and watermark extraction. To address these issues, we propose Video Signature (VIDSIG), an in-generation watermarking method for latent video diffusion models, which enables implicit and adaptive watermark integration during generation. Specifically, we achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers to preserve visual quality. Beyond spatial fidelity, we further enhance temporal consistency by introducing a lightweight Temporal Alignment module that guides the decoder to generate coherent frame sequences during fine-tuning. Experimental results show that VIDSIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency. It also demonstrates strong robustness against both spatial and temporal tampering, highlighting its practicality in real-world scenarios. Our code is available at https://github.com/hardenyu21/Video-Signature
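The PAS idea described in the abstract (score each decoder layer by how strongly a small weight perturbation changes the output, then freeze the most sensitive layers before fine-tuning) can be sketched as follows. This is a minimal illustrative sketch: the layer names, toy decoder, and scoring rule are assumptions for demonstration, not VIDSIG's actual implementation.

```python
def perturbation_sensitivity(layers, forward, eps=1e-3):
    """Score each layer by how much a small uniform weight perturbation
    changes the decoder output (a toy stand-in for a perceptual
    distortion measure)."""
    base = forward(layers)
    return {
        name: abs(forward({**layers, name: [v + eps for v in w]}) - base)
        for name, w in layers.items()
    }

def freeze_sensitive(scores, k):
    """Return the k most sensitive layers; these would be frozen
    (excluded from watermark fine-tuning) to preserve visual quality."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

# Toy "decoder": output is the sum of squared weights, so layers with
# large weights react most to perturbation. All names are hypothetical.
layers = {"conv_in": [10.0, 10.0], "mid_block": [0.1, 0.1], "conv_out": [0.01]}
forward = lambda ls: sum(v * v for w in ls.values() for v in w)

frozen = freeze_sensitive(perturbation_sensitivity(layers, forward), k=1)
print(frozen)  # -> {'conv_in'}: the perturbation-sensitive layer to freeze
```

The remaining (unfrozen) layers are the ones fine-tuned to carry the watermark signal.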
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z) - SIGMark: Scalable In-Generation Watermark with Blind Extraction for Video Diffusion [11.934813439152528]
Invisible watermarking is a key technology for protecting AI-generated videos and tracing harmful content, and plays a crucial role in AI safety. Existing in-generation approaches are non-blind, requiring all message-key pairs to be maintained and template-based matching to be performed during extraction. We propose SIGMark, a Scalable In-Generation watermarking framework with blind extraction for video diffusion.
arXiv Detail & Related papers (2026-03-03T11:33:44Z) - SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models [40.540302276054376]
We propose a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe) employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation. Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
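The SKe step described above (a single base pseudo-random binary sequence, permuted per frame to yield frame-level encryption sequences) might look roughly like this. The seeding scheme, sequence length, and function names are illustrative assumptions, not the paper's exact construction.

```python
import random

def frame_key_sequences(base_key, num_frames, shuffle_key="skeda-demo"):
    """Derive one encryption sequence per frame by permuting a single
    base pseudo-random binary sequence with a frame-indexed shuffle."""
    seqs = []
    for t in range(num_frames):
        rng = random.Random(f"{shuffle_key}:{t}")  # per-frame permutation seed
        idx = list(range(len(base_key)))
        rng.shuffle(idx)
        seqs.append([base_key[i] for i in idx])
    return seqs

# One base pseudo-random binary sequence shared across all frames.
rng = random.Random("base-key-seed")
base = [rng.randrange(2) for _ in range(16)]

seqs = frame_key_sequences(base, num_frames=4)
# Each frame sequence is a permutation of the base key, so the overall
# bit distribution is preserved frame-to-frame.
print(all(sorted(s) == sorted(base) for s in seqs))  # -> True
```

Because every frame's sequence is a permutation of the same base key, extraction only needs the shuffle key to invert the per-frame ordering.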
arXiv Detail & Related papers (2026-02-27T06:18:03Z) - AlcheMinT: Fine-grained Temporal Control for Multi-Reference Consistent Video Generation [58.844504598618094]
We propose AlcheMinT, a unified framework that introduces explicit timestamp conditioning for subject-driven video generation. Our approach introduces a novel positional encoding mechanism that unlocks the encoding of temporal intervals, associated in our case with subject identities. We incorporate subject-descriptive text tokens to strengthen the binding between visual identity and video captions, mitigating ambiguity during generation.
arXiv Detail & Related papers (2025-12-11T18:59:34Z) - DINVMark: A Deep Invertible Network for Video Watermarking [17.63051709541545]
This paper introduces a Deep Invertible Network for Video watermarking (DINVMark) and designs a noise layer to simulate HEVC compression. Results demonstrate that the proposed scheme significantly enhances watermark robustness, preserves video quality, and substantially increases watermark embedding capacity.
arXiv Detail & Related papers (2025-09-22T07:08:20Z) - TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity [68.95168727940973]
This paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM.
arXiv Detail & Related papers (2025-06-30T03:14:07Z) - Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking [53.434260110195446]
Safe-Sora is the first framework to embed graphical watermarks directly into the video generation process. We develop a 3D wavelet transform-enhanced Mamba architecture with an adaptive local spatiotemporal scanning strategy. Experiments demonstrate that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness.
arXiv Detail & Related papers (2025-05-19T03:31:31Z) - VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models [32.0365189539138]
VIDSTAMP is a watermarking framework that embeds messages directly into the latent space of temporally-aware video diffusion models. Our method imposes no additional inference cost and offers better perceptual quality than prior methods.
arXiv Detail & Related papers (2025-05-02T17:35:03Z) - GenPTW: In-Generation Image Watermarking for Provenance Tracing and Tamper Localization [32.843425702098116]
GenPTW is an In-Generation image watermarking framework for latent diffusion models (LDMs). It embeds structured watermark signals during the image generation phase, enabling unified provenance tracing and tamper localization. Experiments demonstrate that GenPTW outperforms existing methods in image fidelity, watermark extraction accuracy, and tamper localization performance.
arXiv Detail & Related papers (2025-04-28T08:21:39Z) - Gaussian Shading++: Rethinking the Realistic Deployment Challenge of Performance-Lossless Image Watermark for Diffusion Models [66.54457339638004]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. We propose a diffusion model watermarking method tailored for real-world deployment. Gaussian Shading++ not only maintains performance losslessness but also outperforms existing methods in terms of robustness.
arXiv Detail & Related papers (2025-04-21T11:18:16Z) - RepVideo: Rethinking Cross-Layer Representation for Video Generation [53.701548524818534]
We propose RepVideo, an enhanced representation framework for text-to-video diffusion models. By accumulating features from neighboring layers to form enriched representations, this approach captures more stable semantic information. Our experiments demonstrate that RepVideo not only significantly enhances the ability to generate accurate spatial appearances, but also improves temporal consistency in video generation.
arXiv Detail & Related papers (2025-01-15T18:20:37Z) - Video Seal: Open and Efficient Video Watermarking [47.40833588157406]
Video watermarking addresses these challenges by embedding imperceptible signals into videos, allowing for later identification. Video Seal is a comprehensive framework for neural video watermarking and a competitive open-sourced model. We present experimental results demonstrating the effectiveness of the approach in terms of speed, imperceptibility, and robustness.
arXiv Detail & Related papers (2024-12-12T17:41:49Z) - LVMark: Robust Watermark for Latent Video Diffusion Models [13.85241328100336]
We introduce LVMark, a novel watermarking method for video diffusion models. We propose a new watermark decoder tailored for generated videos by learning the consistency between adjacent frames. We optimize both the watermark decoder and the latent decoder of the diffusion model, effectively balancing the trade-off between visual quality and bit accuracy.
arXiv Detail & Related papers (2024-12-12T09:57:20Z) - Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z) - VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation [73.54366331493007]
VideoGen is a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency.
We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt.
arXiv Detail & Related papers (2023-09-01T11:14:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.