Video Signature: In-generation Watermarking for Latent Video Diffusion Models
- URL: http://arxiv.org/abs/2506.00652v1
- Date: Sat, 31 May 2025 17:43:54 GMT
- Title: Video Signature: In-generation Watermarking for Latent Video Diffusion Models
- Authors: Yu Huang, Junhao Chen, Qi Zheng, Hanqian Li, Shuliang Liu, Xuming Hu,
- Abstract summary: Video Signature (VIDSIG) is an in-generation watermarking method for latent video diffusion models. We achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers to preserve visual quality. Experimental results show that VIDSIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency.
- Score: 19.648332041264474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid development of Artificial Intelligence Generated Content (AIGC) has led to significant progress in video generation but also raises serious concerns about intellectual property protection and reliable content tracing. Watermarking is a widely adopted solution to this issue, but existing methods for video generation mainly follow a post-generation paradigm, which introduces additional computational overhead and often fails to effectively balance the trade-off between video quality and watermark extraction. To address these issues, we propose Video Signature (VIDSIG), an in-generation watermarking method for latent video diffusion models, which enables implicit and adaptive watermark integration during generation. Specifically, we achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers to preserve visual quality. Beyond spatial fidelity, we further enhance temporal consistency by introducing a lightweight Temporal Alignment module that guides the decoder to generate coherent frame sequences during fine-tuning. Experimental results show that VIDSIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency. It also demonstrates strong robustness against both spatial and temporal tampering, highlighting its practicality in real-world scenarios.
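The abstract's Perturbation-Aware Suppression (PAS) step can be illustrated with a minimal sketch: rank decoder layers by how much perturbing them degrades perceptual quality, freeze the most sensitive ones, and leave the rest trainable for the watermark fine-tune. The layer names, scores, and freeze ratio below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the PAS idea from the abstract: perceptually sensitive
# layers (high quality drop when perturbed) are frozen; the remaining layers
# are fine-tuned to embed the watermark. All names/values are hypothetical.

def select_trainable_layers(sensitivity, freeze_ratio=0.5):
    """Return (frozen, trainable) layer names.

    sensitivity: dict mapping layer name -> quality drop observed when that
    layer's weights are perturbed (higher = more perceptually sensitive).
    freeze_ratio: fraction of the most sensitive layers to freeze.
    """
    ranked = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n_frozen = int(len(ranked) * freeze_ratio)
    frozen = set(ranked[:n_frozen])
    trainable = [name for name in ranked if name not in frozen]
    return frozen, trainable

# Toy example with made-up per-layer quality drops (e.g., an LPIPS increase).
scores = {"up_block.0": 0.08, "up_block.1": 0.31,
          "mid_block": 0.02, "conv_out": 0.45}
frozen, trainable = select_trainable_layers(scores, freeze_ratio=0.5)
print(sorted(frozen))     # most perceptually sensitive layers stay frozen
print(sorted(trainable))  # remaining layers receive the watermark fine-tune
```

In the paper's setting the sensitivity scores would come from perturbing decoder weights and measuring the visual-quality impact; here they are hard-coded to keep the sketch self-contained.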
Related papers
- TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity [68.95168727940973]
This paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM.
arXiv Detail & Related papers (2025-06-30T03:14:07Z) - Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking [53.434260110195446]
Safe-Sora is the first framework to embed graphical watermarks directly into the video generation process. We develop a 3D wavelet transform-enhanced Mamba architecture with an adaptive spatiotemporal local scanning strategy. Experiments demonstrate that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness.
arXiv Detail & Related papers (2025-05-19T03:31:31Z) - VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models [32.0365189539138]
VIDSTAMP is a watermarking framework that embeds messages directly into the latent space of temporally-aware video diffusion models. Our method imposes no additional inference cost and offers better perceptual quality than prior methods.
arXiv Detail & Related papers (2025-05-02T17:35:03Z) - GenPTW: In-Generation Image Watermarking for Provenance Tracing and Tamper Localization [32.843425702098116]
GenPTW is an in-generation image watermarking framework for latent diffusion models (LDMs). It embeds structured watermark signals during the image generation phase, enabling unified provenance tracing and tamper localization. Experiments demonstrate that GenPTW outperforms existing methods in image fidelity, watermark extraction accuracy, and tamper localization performance.
arXiv Detail & Related papers (2025-04-28T08:21:39Z) - Gaussian Shading++: Rethinking the Realistic Deployment Challenge of Performance-Lossless Image Watermark for Diffusion Models [66.54457339638004]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. We propose a diffusion model watermarking method tailored for real-world deployment. Gaussian Shading++ not only maintains performance losslessness but also outperforms existing methods in terms of robustness.
arXiv Detail & Related papers (2025-04-21T11:18:16Z) - RepVideo: Rethinking Cross-Layer Representation for Video Generation [53.701548524818534]
We propose RepVideo, an enhanced representation framework for text-to-video diffusion models. By accumulating features from neighboring layers to form enriched representations, this approach captures more stable semantic information. Our experiments demonstrate that RepVideo not only significantly enhances the ability to generate accurate spatial appearances, but also improves temporal consistency in video generation.
arXiv Detail & Related papers (2025-01-15T18:20:37Z) - Video Seal: Open and Efficient Video Watermarking [47.40833588157406]
Video watermarking addresses these challenges by embedding imperceptible signals into videos, allowing for identification. Video Seal is a comprehensive framework for neural video watermarking and a competitive open-source model. We present experimental results demonstrating the effectiveness of the approach in terms of speed, imperceptibility, and robustness.
arXiv Detail & Related papers (2024-12-12T17:41:49Z) - LVMark: Robust Watermark for Latent Video Diffusion Models [13.85241328100336]
We introduce LVMark, a novel watermarking method for video diffusion models. We propose a new watermark decoder tailored for generated videos by learning the consistency between adjacent frames. We optimize both the watermark decoder and the latent decoder of the diffusion model, effectively balancing the trade-off between visual quality and bit accuracy.
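Several entries in this list (LVMark, VIDSIG) report bit accuracy as their extraction metric: the fraction of embedded watermark bits the decoder recovers correctly. A minimal sketch, with illustrative bitstrings:

```python
# Hedged sketch of the bit-accuracy metric commonly reported by watermarking
# papers: the fraction of positions where extracted bits match embedded bits.
# The bitstrings below are toy examples, not real watermark payloads.

def bit_accuracy(embedded, extracted):
    """Fraction of positions where extracted bits equal the embedded ones."""
    if len(embedded) != len(extracted):
        raise ValueError("bitstrings must have equal length")
    matches = sum(e == x for e, x in zip(embedded, extracted))
    return matches / len(embedded)

embedded  = [1, 0, 1, 1, 0, 0, 1, 0]
extracted = [1, 0, 1, 0, 0, 0, 1, 1]  # two bit flips, e.g. after compression
print(bit_accuracy(embedded, extracted))  # 0.75
```

The trade-off these papers optimize is between pushing this value toward 1.0 and keeping the watermarked video perceptually indistinguishable from the unwatermarked one.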
arXiv Detail & Related papers (2024-12-12T09:57:20Z) - Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z) - VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation [73.54366331493007]
VideoGen is a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency.
We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt.
arXiv Detail & Related papers (2023-09-01T11:14:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.