A DTCWT-SVD Based Video Watermarking resistant to frame rate conversion
- URL: http://arxiv.org/abs/2206.01094v1
- Date: Thu, 2 Jun 2022 15:20:52 GMT
- Title: A DTCWT-SVD Based Video Watermarking resistant to frame rate conversion
- Authors: Yifei Wang, Qichao Ying, Zhenxing Qian, Sheng Li and Xinpeng Zhang
- Abstract summary: We present a new video watermarking scheme based on the joint Dual-Tree Complex Wavelet Transform (DTCWT) and Singular Value Decomposition (SVD).
We perform group-level watermarking that includes moderate temporal redundancy to resist temporal desynchronization attacks.
- Score: 27.591506014201546
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Videos can easily be tampered with, copied, and redistributed by attackers for illegal use and monetary gain. Such behavior severely jeopardizes the interests of content owners. Despite the great effort devoted to digital video watermarking for copyright protection, typical distortions in video transmission, including signal attacks, geometric attacks, and temporal desynchronization attacks, can still easily erase the embedded signal. Among them, temporal desynchronization attacks, which include frame dropping, frame insertion, and frame rate conversion, are among the most prevalent. To address this issue, we present a new video watermarking scheme based on the joint Dual-Tree Complex Wavelet Transform (DTCWT) and Singular Value Decomposition (SVD), which is resistant to frame rate conversion. We first extract a set of candidate coefficients by applying SVD after the DTCWT. Then, we embed the watermark by adjusting the shape of the candidate coefficients. Finally, we perform group-level watermarking that includes moderate temporal redundancy to resist temporal desynchronization attacks. Extensive experimental results show that the proposed scheme is more resilient to temporal desynchronization attacks and performs better than existing blind video watermarking schemes.
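The pipeline in the abstract (DTCWT, then SVD on the resulting coefficients, then group-level redundancy) can be made concrete with a short sketch. The code below is a minimal, hypothetical rendering rather than the authors' implementation: it uses the open-source `dtcwt` package, embeds one bit per frame by quantization index modulation (QIM) of the leading singular value of the lowpass subband, and stands in for group-level temporal redundancy with a per-group majority vote; the step size `DELTA`, the choice of subband, and the QIM rule are all assumptions.

```python
import numpy as np
import dtcwt  # pip install dtcwt

DELTA = 8.0  # assumed quantization step: larger = more robust, more visible

def embed_bit(luma: np.ndarray, bit: int, nlevels: int = 3) -> np.ndarray:
    """Embed one bit into a frame's luminance channel (H x W array)."""
    t = dtcwt.Transform2d()
    pyr = t.forward(luma.astype(np.float64), nlevels=nlevels)
    U, S, Vt = np.linalg.svd(pyr.lowpass, full_matrices=False)
    # QIM: snap the leading singular value onto an even (bit=0) or
    # odd (bit=1) multiple of DELTA.
    S[0] = DELTA * (2.0 * np.round((S[0] - bit * DELTA) / (2.0 * DELTA)) + bit)
    pyr.lowpass = U @ np.diag(S) @ Vt
    return t.inverse(pyr)

def extract_bit(luma: np.ndarray, nlevels: int = 3) -> int:
    t = dtcwt.Transform2d()
    pyr = t.forward(luma.astype(np.float64), nlevels=nlevels)
    s0 = np.linalg.svd(pyr.lowpass, compute_uv=False)[0]
    return int(np.round(s0 / DELTA)) % 2

def extract_group_bit(frames) -> int:
    """Majority vote over a frame group: a crude stand-in for the paper's
    group-level temporal redundancy."""
    votes = [extract_bit(f) for f in frames]
    return int(2 * sum(votes) > len(votes))
```

Because every frame in a group carries the same bit, moderate frame dropping, insertion, or frame rate conversion within a group only removes or adds votes and rarely flips the majority, which is the intuition behind the temporal redundancy.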
Related papers
- Latent-Mark: An Audio Watermark Robust to Neural Resynthesis [62.09761127079914]
Latent-Mark is the first zero-bit audio watermarking framework designed to survive semantic compression.
Our key insight is that robustness to the encode-decode process requires embedding the watermark within the invariant latent space.
Our work inspires future research into universal watermarking frameworks capable of maintaining integrity across increasingly complex and diverse generative distortions.
arXiv Detail & Related papers (2026-03-05T15:51:09Z)
- SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models [40.540302276054376]
We propose a generative watermarking framework tailored for text-to-video diffusion models.
SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe) employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation.
Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
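As a rough illustration of the shuffle-key idea, the hypothetical sketch below derives frame-level encryption sequences by permuting a single key-seeded base binary sequence; the function names and the XOR step are assumptions, not SKeDA's actual procedure.

```python
import numpy as np

def frame_sequences(key: int, length: int, num_frames: int) -> np.ndarray:
    """One base pseudo-random binary sequence, shuffled per frame."""
    rng = np.random.default_rng(key)
    base = rng.integers(0, 2, size=length)       # single base sequence
    seqs = np.empty((num_frames, length), dtype=base.dtype)
    for f in range(num_frames):
        seqs[f] = base[rng.permutation(length)]  # frame-specific shuffle
    return seqs

def encrypt_bits(bits: np.ndarray, seq: np.ndarray) -> np.ndarray:
    return np.bitwise_xor(bits, seq)             # stream-cipher-style XOR
```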
arXiv Detail & Related papers (2026-02-27T06:18:03Z)
- Consistency-Preserving Diverse Video Generation [5.784739104479214]
We propose a joint-sampling framework for flow-matching video generators.
Our approach applies diversity-driven updates and then removes only the components that would decrease a temporal-consistency objective.
Experiments on a state-of-the-art text-to-video flow-matching model show diversity comparable to strong joint-sampling baselines.
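One plausible reading of the update-filtering step, assuming the temporal-consistency objective is differentiable, is a gradient projection: keep the diversity-driven update except for the component that points against the consistency gradient. A sketch, not the paper's algorithm:

```python
import numpy as np

def filtered_update(u: np.ndarray, grad_consistency: np.ndarray) -> np.ndarray:
    """Remove from update u only the component that would decrease the
    consistency objective (whose ascent direction is grad_consistency)."""
    g = grad_consistency.ravel()
    u_flat = u.ravel().copy()
    dot = u_flat @ g
    if dot < 0:  # update conflicts with consistency: strip that component
        u_flat -= (dot / (g @ g + 1e-12)) * g
    return u_flat.reshape(u.shape)
```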
arXiv Detail & Related papers (2026-02-17T01:12:20Z)
- WaTeRFlow: Watermark Temporal Robustness via Flow Consistency [46.206343565195375]
We present WaTeRFlow, a framework tailored for robustness under I2V.
It exposes the encoder-decoder to realistic distortions via instruction-driven edits and a fast video diffusion proxy during training.
Experiments across representative I2V models show accurate watermark recovery from frames, with higher first-frame and per-frame bit accuracy and resilience.
arXiv Detail & Related papers (2025-12-22T05:33:59Z)
- SPDMark: Selective Parameter Displacement for Robust Video Watermarking [30.398519705830264]
This work introduces a novel framework for in-generation video watermarking called SPDMark.
Watermarks are embedded into the generated videos by modifying a subset of parameters in the generative model.
Evaluations on both text-to-video and image-to-video generation models demonstrate the ability of SPDMark to generate imperceptible watermarks.
arXiv Detail & Related papers (2025-12-12T23:35:13Z)
- Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking [88.89887962002207]
Invisible generative watermarking remains largely underexplored in video generation.
We propose Safe-Sora, the first framework to embed graphical watermarks directly into the video generation process.
We show that Safe-Sora achieves state-of-the-art performance in terms of video quality, watermark fidelity, and robustness.
arXiv Detail & Related papers (2025-05-19T03:31:31Z)
- VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models [32.0365189539138]
VIDSTAMP is a watermarking framework that embeds messages directly into the latent space of temporally-aware video diffusion models.
Our method imposes no additional inference cost and offers better perceptual quality than prior methods.
arXiv Detail & Related papers (2025-05-02T17:35:03Z)
- VideoMark: A Distortion-Free Robust Watermarking Framework for Video Diffusion Models [18.043141353517317]
VideoMark is a training-free robust watermarking framework for video diffusion models.
Our method generates an extended watermark message sequence and randomly selects starting positions for each video.
Our watermark remains undetectable to attackers without the secret key, ensuring strong imperceptibility compared to other watermarking frameworks.
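The extended-sequence mechanism might look like the sketch below: a long key-derived bit sequence, a per-video random starting offset, and decoding by scanning for the offset that best matches the extracted bits. This is an illustrative guess at the mechanics, not VideoMark's actual procedure.

```python
import numpy as np

def extended_sequence(key: int, length: int = 4096) -> np.ndarray:
    return np.random.default_rng(key).integers(0, 2, size=length)

def pick_window(seq: np.ndarray, msg_len: int, rng: np.random.Generator):
    """Choose a random start and return the wrapped message window."""
    start = int(rng.integers(0, len(seq)))
    idx = (start + np.arange(msg_len)) % len(seq)
    return start, seq[idx]

def locate(extracted: np.ndarray, seq: np.ndarray) -> int:
    """Return the offset whose window agrees best with the extracted bits."""
    m = len(extracted)
    scores = [np.sum(extracted == seq[(s + np.arange(m)) % len(seq)])
              for s in range(len(seq))]
    return int(np.argmax(scores))
```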
arXiv Detail & Related papers (2025-04-23T02:21:12Z)
- Rethinking Video Tokenization: A Conditioned Diffusion-based Approach [58.164354605550194]
The new tokenizer, the Conditioned Diffusion-based video Tokenizer (CDT), replaces the GAN-based decoder with a conditional diffusion model.
It is trained from scratch using only a basic MSE diffusion loss for reconstruction, along with a KL term and an LPIPS perceptual loss.
Even a scaled-down version of CDT (3× inference speedup) still performs comparably with top baselines.
arXiv Detail & Related papers (2025-03-05T17:59:19Z)
- Counteracting temporal attacks in Video Copy Detection [1.0742675209112622]
The META AI Challenge on video copy detection provided a benchmark for evaluating state-of-the-art methods.
Our analysis reveals significant limitations in the VED component, particularly in its ability to handle exact copies.
We propose an improved frame selection strategy based on local maxima of interframe differences.
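A minimal version of that frame selection rule might look as follows; the mean-absolute-difference metric and the strict/non-strict neighbor comparisons are assumptions.

```python
import numpy as np

def select_frames(frames: np.ndarray) -> list[int]:
    """frames: (T, H, W) grayscale video; return indices at local maxima
    of the interframe difference signal."""
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0)).mean(axis=(1, 2))
    return [t + 1 for t in range(1, len(diffs) - 1)
            if diffs[t] > diffs[t - 1] and diffs[t] >= diffs[t + 1]]
```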
arXiv Detail & Related papers (2025-01-19T21:16:39Z)
- LVMark: Robust Watermark for Latent Video Diffusion Models [13.85241328100336]
We introduce LVMark, a novel watermarking method for video diffusion models.
We propose a new watermark decoder tailored for generated videos by learning the consistency between adjacent frames.
We optimize both the watermark decoder and the latent decoder of the diffusion model, effectively balancing the trade-off between visual quality and bit accuracy.
arXiv Detail & Related papers (2024-12-12T09:57:20Z)
- ReToMe-VA: Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack [71.2286719703198]
We propose the Recursive Token Merging for Video Diffusion-based Unrestricted Adversarial Attack (ReToMe-VA).
To achieve spatial imperceptibility, ReToMe-VA adopts a Timestep-wise Adversarial Latent Optimization (TALO) strategy.
To achieve temporal imperceptibility, ReToMe-VA introduces a Recursive Token Merging (ReToMe) mechanism by matching and merging tokens across video frames.
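As a toy illustration of matching and merging tokens across adjacent frames, the sketch below pairs each token with its most cosine-similar counterpart in the previous frame and averages pairs above a threshold; this is a simplified reading, not ReToMe-VA's recursive algorithm.

```python
import numpy as np

def merge_tokens(tok_prev: np.ndarray, tok_cur: np.ndarray,
                 thresh: float = 0.9) -> np.ndarray:
    """tok_prev, tok_cur: (N, D) token matrices from adjacent frames."""
    a = tok_prev / np.linalg.norm(tok_prev, axis=1, keepdims=True)
    b = tok_cur / np.linalg.norm(tok_cur, axis=1, keepdims=True)
    sim = b @ a.T                    # (N_cur, N_prev) cosine similarities
    match = sim.argmax(axis=1)       # best partner in the previous frame
    merged = tok_cur.copy()
    hit = sim.max(axis=1) > thresh
    merged[hit] = 0.5 * (tok_cur[hit] + tok_prev[match[hit]])
    return merged
```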
arXiv Detail & Related papers (2024-08-10T08:10:30Z)
- SSyncOA: Self-synchronizing Object-aligned Watermarking to Resist Cropping-paste Attacks [14.886729577388822]
The cropping-paste attack breaks the synchronization of the image watermark.
The key to resisting it lies in the robust features of the object to be protected.
We propose a self-synchronizing object-aligned watermarking method, called SSyncOA.
arXiv Detail & Related papers (2024-05-06T13:29:34Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
- Predictive Coding For Animation-Based Video Compression [13.161311799049978]
We propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame.
Our experiments indicate a significant gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC.
arXiv Detail & Related papers (2023-07-09T14:40:54Z)
- Inter-frame Accelerate Attack against Video Interpolation Models [73.28751441626754]
We apply adversarial attacks to VIF models and find that they are highly vulnerable to adversarial examples.
We propose a novel attack method named Inter-frame Accelerate Attack (IAA) that accelerates the iterations by initializing the perturbation of each frame with that of the previous adjacent frame.
It is shown that our method can improve attack efficiency greatly while achieving comparable attack performance with traditional methods.
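A hedged sketch of that warm-start idea: per-frame projected gradient descent (PGD) seeded with the previous frame's perturbation. The `loss_grad` callable is a hypothetical stand-in for the gradient of the attack loss through the interpolation model.

```python
import numpy as np

def pgd(frame, loss_grad, init, eps=8 / 255, alpha=2 / 255, steps=5):
    """L-infinity PGD starting from a given initial perturbation."""
    delta = np.clip(init, -eps, eps)
    for _ in range(steps):
        delta = np.clip(delta + alpha * np.sign(loss_grad(frame + delta)),
                        -eps, eps)
    return delta

def attack_video(frames, loss_grad):
    deltas, prev = [], np.zeros_like(frames[0])
    for f in frames:
        prev = pgd(f, loss_grad, init=prev)  # warm start: fewer steps needed
        deltas.append(prev)
    return deltas
```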
arXiv Detail & Related papers (2023-05-11T03:08:48Z)
- Transform-Equivariant Consistency Learning for Temporal Sentence Grounding [66.10949751429781]
We introduce a novel Equivariant Consistency Regulation Learning framework to learn more discriminative representations for each video.
Our motivation is that the temporal boundary of the query-guided activity should be predicted consistently.
In particular, we devise a self-supervised consistency loss module to enhance the completeness and smoothness of the augmented video.
arXiv Detail & Related papers (2023-05-06T19:29:28Z)
- Towards Smooth Video Composition [59.134911550142455]
Video generation requires consistent and persistent frames with dynamic content over time.
This work investigates modeling the temporal relations for composing videos of arbitrary length, from a few frames to even infinitely many, using generative adversarial networks (GANs).
We show that the alias-free operation for single image generation, together with adequately pre-learned knowledge, brings a smooth frame transition without compromising the per-frame quality.
arXiv Detail & Related papers (2022-12-14T18:54:13Z)
- Robust Watermarking for Video Forgery Detection with Improved Imperceptibility and Robustness [30.611167333725408]
This paper proposes a video watermarking network for tampering localization.
We jointly train a 3D-UNet-based watermark embedding network and a decoder that predicts the tampering mask.
Experimental results demonstrate that our method generates watermarked videos with good imperceptibility and robustly and accurately locates tampered areas.
arXiv Detail & Related papers (2022-07-07T16:27:10Z)
- Adversarial Attacks on Deep Learning-based Video Compression and Classification Systems [23.305818640220554]
We conduct the first systematic study for adversarial attacks on deep learning based video compression and downstream classification systems.
We propose an adaptive adversarial attack that can manipulate the Rate-Distortion relationship of a video compression model to achieve two adversarial goals.
We also devise novel objectives for targeted and untargeted attacks to a downstream video classification service.
arXiv Detail & Related papers (2022-03-18T22:42:20Z)
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating 512×512-resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.