SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models
- URL: http://arxiv.org/abs/2603.00194v1
- Date: Fri, 27 Feb 2026 06:18:03 GMT
- Title: SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models
- Authors: Yang Yang, Xinze Zou, Zehua Ma, Han Fang, Weiming Zhang
- Abstract summary: We propose a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe) employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation. Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
- Score: 40.540302276054376
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rise of text-to-video generation models has raised growing concerns over content authenticity, copyright protection, and malicious misuse. Watermarking serves as an effective mechanism for regulating such AI-generated content, where high fidelity and strong robustness are particularly critical. Recent generative image watermarking methods provide a promising foundation by leveraging watermark information and pseudo-random keys to control the initial sampling noise, enabling lossless embedding. However, directly extending these techniques to videos introduces two key limitations: (1) existing designs implicitly rely on strict alignment between video frames and the frame-dependent pseudo-random binary sequences used for watermark encryption, and once this alignment is disrupted, subsequent watermark extraction becomes unreliable; and (2) video-specific distortions, such as inter-frame compression, significantly degrade watermark reliability. To address these issues, we propose SKeDA, a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe), which employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation. This design transforms watermark extraction from synchronization-sensitive sequence decoding into permutation-tolerant set-level aggregation, substantially improving robustness against frame reordering and loss; and (2) Differential Attention (DA), which computes inter-frame differences and dynamically adjusts attention weights during extraction, enhancing robustness against temporal distortions. Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
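The core idea behind SKe, deriving per-frame encryption sequences by permuting one base key so that extraction becomes permutation-tolerant set-level aggregation, can be illustrated with a minimal toy sketch. This is not the paper's algorithm: the key lengths, the pilot-bit trick used to identify which permutation a frame carries, and the bit-flip channel model are all illustrative assumptions, and the real method operates on diffusion sampling noise rather than raw bit strings.

```python
import numpy as np

rng = np.random.default_rng(0)

KEY_LEN = 64      # bits carried per frame (illustrative)
PILOT_LEN = 16    # known pilot prefix, assumed here to identify the permutation
NUM_FRAMES = 8

base_key = rng.integers(0, 2, KEY_LEN)            # single base pseudo-random sequence
pilot = rng.integers(0, 2, PILOT_LEN)             # known to the extractor
payload = rng.integers(0, 2, KEY_LEN - PILOT_LEN)
watermark = np.concatenate([pilot, payload])

# Derive one frame-level sequence per frame by permuting positions.
perms = [rng.permutation(KEY_LEN) for _ in range(NUM_FRAMES)]

# Embed: encrypt with the base key, then apply the frame's permutation.
frames = [(watermark ^ base_key)[p] for p in perms]

# Channel: frames reordered, one dropped, a couple of bits flipped per frame.
received = []
for i in [5, 0, 3, 7, 1, 2, 6]:
    f = frames[i].copy()
    f[rng.choice(KEY_LEN, size=2, replace=False)] ^= 1
    received.append(f)

# Set-level extraction: no frame index is assumed. Each received frame is
# tested against every known permutation; the pilot bits pick the best fit,
# and the payload is recovered by per-bit majority vote over all frames.
votes = np.zeros(KEY_LEN - PILOT_LEN)
for r in received:
    best, best_err = None, PILOT_LEN + 1
    for p in perms:
        dec = np.empty(KEY_LEN, dtype=int)
        dec[p] = r                 # undo the permutation
        dec ^= base_key            # undo the encryption
        err = int(np.sum(dec[:PILOT_LEN] != pilot))
        if err < best_err:
            best, best_err = dec, err
    votes += best[PILOT_LEN:]

recovered = (votes / len(received) > 0.5).astype(int)
```

Because every frame carries the same payload under a different permutation of one shared key, extraction degrades gracefully: dropping or reordering frames only removes votes from the aggregate instead of desynchronizing the decoder.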
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression. Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space. Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z) - WaTeRFlow: Watermark Temporal Robustness via Flow Consistency [46.206343565195375]
We present WaTeRFlow, a framework tailored for robustness under I2V. It exposes the encoder-decoder to realistic distortions via instruction-driven edits and a fast video diffusion proxy during training. Experiments across representative I2V models show accurate watermark recovery from frames, with higher first-frame and per-frame bit accuracy and resilience.
arXiv Detail & Related papers (2025-12-22T05:33:59Z) - SPDMark: Selective Parameter Displacement for Robust Video Watermarking [30.398519705830264]
This work introduces a novel framework for in-generation video watermarking called SPDMark. Watermarks are embedded into the generated videos by modifying a subset of parameters in the generative model. Evaluations on both text-to-video and image-to-video generation models demonstrate the ability of SPDMark to generate imperceptible watermarks.
arXiv Detail & Related papers (2025-12-12T23:35:13Z) - T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models [89.29541056113442]
T2SMark is a two-stage watermarking scheme based on Tail-Truncated Sampling (TTS). We evaluate T2SMark on diffusion models with both U-Net and DiT backbones.
arXiv Detail & Related papers (2025-10-25T16:55:55Z) - TAG-WM: Tamper-Aware Generative Image Watermarking via Diffusion Inversion Sensitivity [76.98973481600002]
This paper proposes a Tamper-Aware Generative image WaterMarking method named TAG-WM. The proposed method comprises four key modules, including a dual-mark joint sampling (DMJS) algorithm for embedding copyright and localization watermarks into the latent space while preserving generative quality. The experimental results demonstrate that TAG-WM achieves state-of-the-art performance in both tampering robustness and localization capability even under distortion.
arXiv Detail & Related papers (2025-06-30T03:14:07Z) - Video Signature: In-generation Watermarking for Latent Video Diffusion Models [42.064769031646904]
Video Signature (VID SIG) is an in-generation watermarking method for latent video diffusion models. We achieve this by partially fine-tuning the latent decoder, where Perturbation-Aware Suppression (PAS) pre-identifies and freezes perceptually sensitive layers. Experimental results show that VID SIG achieves the best overall performance in watermark extraction, visual quality, and generation efficiency.
arXiv Detail & Related papers (2025-05-31T17:43:54Z) - Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking [88.89887962002207]
Invisible generative watermarking remains largely underexplored in video generation. We propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process. We show that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness.
arXiv Detail & Related papers (2025-05-19T03:31:31Z) - VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models [18.427936201177122]
VideoMark is a distortion-free robust watermarking framework for video diffusion models. We employ a frame-wise watermarking strategy with pseudorandom error correction (PRC) codes, using a fixed watermark sequence. For watermark extraction, we propose a Temporal Matching Module (TMM) that leverages edit distance to align decoded messages with the original watermark sequence.
arXiv Detail & Related papers (2025-04-23T02:21:12Z) - SuperMark: Robust and Training-free Image Watermarking via Diffusion-based Super-Resolution [27.345134138673945]
We propose SuperMark, a robust, training-free watermarking framework. SuperMark embeds the watermark into initial Gaussian noise using existing techniques. It then applies pre-trained Super-Resolution models to denoise the watermarked noise, producing the final watermarked image. For extraction, the process is reversed: the watermarked image is inverted back to the initial watermarked noise via DDIM Inversion, from which the embedded watermark is extracted. Experiments demonstrate that SuperMark achieves fidelity comparable to existing methods while significantly improving robustness.
arXiv Detail & Related papers (2024-12-13T11:20:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.