WaTeRFlow: Watermark Temporal Robustness via Flow Consistency
- URL: http://arxiv.org/abs/2512.19048v1
- Date: Mon, 22 Dec 2025 05:33:59 GMT
- Title: WaTeRFlow: Watermark Temporal Robustness via Flow Consistency
- Authors: Utae Jeong, Sumin In, Hyunju Ryu, Jaewan Choi, Feng Yang, Jongheon Jeong, Seungryong Kim, Sangpil Kim
- Abstract summary: We present WaTeRFlow, a framework tailored for robustness under I2V. It exposes the encoder-decoder to realistic distortions via instruction-driven edits and a fast video diffusion proxy during training. Experiments across representative I2V models show accurate watermark recovery from frames, with higher first-frame and per-frame bit accuracy and resilience to distortions applied before or after video generation.
- Score: 46.206343565195375
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image watermarking supports authenticity and provenance, yet many schemes are still easy to bypass with various distortions and powerful generative edits. Deep learning-based watermarking has improved robustness to diffusion-based image editing, but a gap remains when a watermarked image is converted to video by image-to-video (I2V), in which per-frame watermark detection weakens. I2V has quickly advanced from short, jittery clips to multi-second, temporally coherent scenes, and it now serves not only content creation but also world-modeling and simulation workflows, making cross-modal watermark recovery crucial. We present WaTeRFlow, a framework tailored for robustness under I2V. It consists of (i) FUSE (Flow-guided Unified Synthesis Engine), which exposes the encoder-decoder to realistic distortions via instruction-driven edits and a fast video diffusion proxy during training, (ii) optical-flow warping with a Temporal Consistency Loss (TCL) that stabilizes per-frame predictions, and (iii) a semantic preservation loss that maintains the conditioning signal. Experiments across representative I2V models show accurate watermark recovery from frames, with higher first-frame and per-frame bit accuracy and resilience when various distortions are applied before or after video generation.
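The abstract's component (ii) is its most mechanistic claim: the decoder's per-frame watermark predictions are stabilized by warping the previous frame's prediction along the optical flow and penalizing disagreement with the current one. Below is a minimal PyTorch sketch of such a flow-warped consistency penalty; the function names, the L1 form of the penalty, and the backward-flow convention are illustrative assumptions, not the authors' released implementation.

```python
# Illustrative sketch (assumption, not the WaTeRFlow release): a flow-warped
# temporal consistency penalty in the spirit of the paper's TCL.
import torch
import torch.nn.functional as F


def warp_with_flow(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp a map x of shape (B, C, H, W) using flow (B, 2, H, W),
    where flow points from the current frame back to the previous one."""
    _, _, h, w = x.shape
    # Pixel-coordinate sampling grid, displaced by the flow field.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=x.device, dtype=x.dtype),
        torch.arange(w, device=x.device, dtype=x.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects coordinates normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(x, grid, align_corners=True)


def temporal_consistency_loss(pred_t, pred_prev, flow_t_to_prev):
    """L1 gap between frame-t watermark logits and the flow-warped logits
    from frame t-1; a small value means the decoder is temporally stable."""
    return F.l1_loss(pred_t, warp_with_flow(pred_prev, flow_t_to_prev))
```

In training, a term like this would be averaged over adjacent frame pairs and combined with the bit-recovery and fidelity objectives the abstract describes.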
Related papers
- SKeDA: A Generative Watermarking Framework for Text-to-video Diffusion Models [40.540302276054376]
We propose a generative watermarking framework tailored for text-to-video diffusion models. SKeDA consists of two components: (1) Shuffle-Key-based Distribution-preserving Sampling (SKe), which employs a single base pseudo-random binary sequence for watermark encryption and derives frame-level encryption sequences through permutation (a toy sketch of this permutation step appears after this list). Extensive experiments demonstrate that SKeDA preserves high video generation quality and watermark robustness.
arXiv Detail & Related papers (2026-02-27T06:18:03Z)
- SPDMark: Selective Parameter Displacement for Robust Video Watermarking [30.398519705830264]
This work introduces a novel framework for in-generation video watermarking called SPDMark. Watermarks are embedded into the generated videos by modifying a subset of parameters in the generative model. Evaluations on both text-to-video and image-to-video generation models demonstrate the ability of SPDMark to generate imperceptible watermarks.
arXiv Detail & Related papers (2025-12-12T23:35:13Z)
- VTinker: Guided Flow Upsampling and Texture Mapping for High-Resolution Video Frame Interpolation [55.93266219195357]
We propose a novel Video Frame Interpolation (VFI) pipeline, VTinker, which consists of two core components: guided flow upsampling (GFU) and Texture Mapping.
arXiv Detail & Related papers (2025-11-20T07:30:16Z)
- I2VWM: Robust Watermarking for Image to Video Generation [41.34965301146522]
I2VWM is a cross-modal watermarking framework designed to enhance watermark robustness across time. Experiments on both open-source and commercial I2V models demonstrate that I2VWM significantly improves robustness while maintaining imperceptibility.
arXiv Detail & Related papers (2025-09-22T13:37:37Z)
- VIDSTAMP: A Temporally-Aware Watermark for Ownership and Integrity in Video Diffusion Models [32.0365189539138]
VIDSTAMP is a watermarking framework that embeds messages directly into the latent space of temporally-aware video diffusion models. Our method imposes no additional inference cost and offers better perceptual quality than prior methods.
arXiv Detail & Related papers (2025-05-02T17:35:03Z)
- WaterFlow: Learning Fast & Robust Watermarks using Stable Diffusion [46.10882190865747]
WaterFlow is a fast and extremely robust approach for high fidelity visual watermarking based on a learned latent-dependent watermark. WaterFlow demonstrates state-of-the-art performance on general robustness and is the first method capable of effectively defending against difficult combination attacks.
arXiv Detail & Related papers (2025-04-15T23:27:52Z)
- Taming Rectified Flow for Inversion and Editing [57.3742655030493]
Rectified-flow-based diffusion transformers like FLUX and OpenSora have demonstrated outstanding performance in the field of image and video generation. Despite their robust generative capabilities, these models often struggle with inversion inaccuracies. We propose RF-Solver, a training-free sampler that effectively enhances inversion precision by mitigating the errors in the inversion process of rectified flow.
arXiv Detail & Related papers (2024-11-07T14:29:02Z)
- COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing [57.76170824395532]
Video editing is an emerging task, in which most current methods adopt the pre-trained text-to-image (T2I) diffusion model to edit the source video. We propose COrrespondence-guided Video Editing (COVE) to achieve high-quality and consistent video editing. COVE can be seamlessly integrated into the pre-trained T2I diffusion model without the need for extra training or optimization.
arXiv Detail & Related papers (2024-06-13T06:27:13Z)
- RIGID: Recurrent GAN Inversion and Editing of Real Face Videos [73.97520691413006]
GAN inversion is indispensable for applying the powerful editability of GAN to real images.
Existing methods invert video frames individually, often leading to undesired inconsistent results over time.
We propose a unified recurrent framework, named Recurrent vIdeo GAN Inversion and eDiting (RIGID).
Our framework learns the inherent coherence between input frames in an end-to-end manner.
arXiv Detail & Related papers (2023-08-11T12:17:24Z)
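As flagged in the SKeDA entry above, its shuffle-key mechanism (a single base pseudo-random binary sequence, permuted per frame into frame-level encryption sequences) is concrete enough to sketch. The per-frame seeding and the XOR step below are hypothetical stand-ins for illustration, not the SKeDA implementation.

```python
# Illustrative sketch (assumption, not the SKeDA release): derive per-frame
# encryption sequences by permuting one base pseudo-random bit sequence.
import numpy as np


def frame_keys(seed: int, seq_len: int, num_frames: int) -> np.ndarray:
    """One base binary sequence; every frame gets a permuted copy of it."""
    base = np.random.default_rng(seed).integers(0, 2, size=seq_len)
    keys = np.empty((num_frames, seq_len), dtype=base.dtype)
    for t in range(num_frames):
        # Hypothetical per-frame seeding; the paper's key schedule may differ.
        perm = np.random.default_rng((seed, t)).permutation(seq_len)
        keys[t] = base[perm]
    return keys


def encrypt(watermark_bits: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """XOR the same payload with each frame's derived key (toy encryption)."""
    return watermark_bits[None, :] ^ keys


# Example: one 64-bit payload encrypted independently for 16 frames.
bits = np.random.default_rng(0).integers(0, 2, size=64)
cipher = encrypt(bits, frame_keys(seed=42, seq_len=64, num_frames=16))
```

The point of the permutation step is that every frame carries a differently scrambled key derived from one shared secret, so a decoder holding the seed can reconstruct each frame's sequence.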
This list is automatically generated from the titles and abstracts of the papers on this site.