Zero-Shot Video Deraining with Video Diffusion Models
- URL: http://arxiv.org/abs/2511.18537v1
- Date: Sun, 23 Nov 2025 17:06:22 GMT
- Title: Zero-Shot Video Deraining with Video Diffusion Models
- Authors: Tuomas Varanka, Juan Luis Gonzalez, Hyeongwoo Kim, Pablo Garrido, Xu Yao,
- Abstract summary: We introduce the first zero-shot video deraining method for complex dynamic scenes that does not require synthetic data or model fine-tuning. Our approach is validated through extensive experiments on real-world rain datasets.
- Score: 11.578999728002065
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Existing video deraining methods are often trained on paired datasets that are either synthetic, which limits their ability to generalize to real-world rain, or captured by static cameras, which restricts their effectiveness in dynamic scenes with background and camera motion. Furthermore, recent works that fine-tune diffusion models have shown promising results, but the fine-tuning tends to weaken the generative prior, limiting generalization to unseen cases. In this paper, we introduce the first zero-shot video deraining method for complex dynamic scenes that requires neither synthetic data nor model fine-tuning, by leveraging a pretrained text-to-video diffusion model that demonstrates strong generalization capabilities. By inverting an input video into the latent space of the diffusion model, its reconstruction process can be intervened upon and pushed away from the model's concept of rain using negative prompting. At the core of our approach is an attention-switching mechanism that we found to be crucial for maintaining dynamic backgrounds as well as structural consistency between the input and the derained video, mitigating artifacts introduced by naive negative prompting. Our approach is validated through extensive experiments on real-world rain datasets, demonstrating substantial improvements over prior methods and showcasing robust generalization without the need for supervised training.
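The inversion-plus-negative-prompting recipe described in the abstract can be sketched in a few lines of PyTorch. The denoiser below is a stand-in stub, the DDIM schedule, prompt embeddings, and guidance scale are illustrative assumptions, and the paper's attention-switching mechanism is deliberately omitted, so this is a minimal illustration of the general idea rather than the authors' implementation.

```python
import torch

class StubVideoDenoiser(torch.nn.Module):
    """Stand-in epsilon predictor. In practice this would be the pretrained
    text-to-video diffusion backbone; it is a toy module here so the sketch runs."""
    def __init__(self, channels=4):
        super().__init__()
        self.net = torch.nn.Conv3d(channels, channels, 3, padding=1)

    def forward(self, x, t, prompt_embedding):
        # Conditioning is faked by mixing in the prompt embedding's mean.
        return self.net(x) + 0.0 * prompt_embedding.mean()

def ddim_step(x, eps, alpha_from, alpha_to):
    """Deterministic DDIM update between two noise levels (alpha_bar values)."""
    x0 = (x - (1 - alpha_from).sqrt() * eps) / alpha_from.sqrt()
    return alpha_to.sqrt() * x0 + (1 - alpha_to).sqrt() * eps

def derain(latents, model, embed_null, embed_rain, alphas, guidance=3.0):
    # 1) DDIM inversion: walk the rainy video's latents toward noise under the
    #    unconditional (empty) prompt so the video can later be reconstructed.
    x = latents
    for i in range(len(alphas) - 1):
        eps = model(x, i, embed_null)
        x = ddim_step(x, eps, alphas[i], alphas[i + 1])
    # 2) Reverse process with negative prompting: classifier-free guidance with
    #    "rain" as the negative prompt pushes the reconstruction away from the
    #    model's concept of rain. The paper's attention-switching mechanism,
    #    reported as crucial for structural consistency, is omitted in this sketch.
    for i in reversed(range(len(alphas) - 1)):
        eps_rain = model(x, i + 1, embed_rain)
        eps_null = model(x, i + 1, embed_null)
        eps = eps_rain + guidance * (eps_null - eps_rain)
        x = ddim_step(x, eps, alphas[i + 1], alphas[i])
    return x

if __name__ == "__main__":
    model = StubVideoDenoiser()
    latents = torch.randn(1, 4, 8, 32, 32)     # (B, C, T, H, W) video latents
    alphas = torch.linspace(0.999, 0.05, 21)   # toy alpha_bar schedule, clean -> noisy
    embed_null = torch.zeros(1, 16)            # empty-prompt embedding (stand-in)
    embed_rain = torch.ones(1, 16)             # "rain" prompt embedding (stand-in)
    print(derain(latents, model, embed_null, embed_rain, alphas).shape)
```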
Related papers
- Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion [67.94300151774085]
We introduce Self Forcing, a novel training paradigm for autoregressive video diffusion models. It addresses the longstanding issue of exposure bias, where models trained on ground-truth context must, at inference time, generate sequences conditioned on their own imperfect outputs.
arXiv Detail & Related papers (2025-06-09T17:59:55Z)
- FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation [51.110607281391154]
FlowMo is a training-free guidance method for enhancing motion coherence in text-to-video models. It estimates motion coherence by measuring the patch-wise variance across the temporal dimension and guides the model to reduce this variance dynamically during sampling.
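A minimal sketch of the variance-based coherence signal described above: measure per-patch variance across the temporal axis of intermediate video latents and take a small gradient step to reduce it during sampling. The patch size, pooling, and step size are assumptions for illustration, not FlowMo's exact formulation.

```python
import torch
import torch.nn.functional as F

def temporal_patch_variance(latents, patch=4):
    """latents: (B, C, T, H, W). Returns the mean variance over time of
    spatially pooled patches; lower values indicate steadier motion."""
    b, c, t, h, w = latents.shape
    frames = latents.reshape(b * t, c, h, w)
    patches = F.avg_pool2d(frames, patch)        # pool each frame into patches
    patches = patches.reshape(b, t, -1)          # (B, T, num_patch_features)
    return patches.var(dim=1).mean()             # variance along the time axis

def coherence_guided_update(latents, step_size=0.1):
    """One gradient step nudging latents toward lower temporal variance,
    mimicking training-free guidance applied during sampling."""
    latents = latents.detach().requires_grad_(True)
    loss = temporal_patch_variance(latents)
    loss.backward()
    return (latents - step_size * latents.grad).detach()

if __name__ == "__main__":
    x = torch.randn(1, 4, 8, 32, 32)
    print(temporal_patch_variance(x).item())
    x = coherence_guided_update(x)
    print(temporal_patch_variance(x).item())
```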
arXiv Detail & Related papers (2025-06-01T19:55:33Z)
- Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining [73.5575992346396]
We propose a dual-branch spatio-temporal state-space model to enhance rain streak removal in video sequences. To improve multi-frame feature fusion, we derive a dynamic stacking filter, which adaptively approximates statistical filters for pixel-wise feature refinement. To further explore the capacity of deraining models in supporting other vision-based tasks in rainy environments, we introduce a novel real-world benchmark.
arXiv Detail & Related papers (2025-05-22T15:50:00Z)
- RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining [14.025870185802463]
We present an improved SSM-based video deraining network (RainMamba) with a novel Hilbert scanning mechanism to better capture sequence-level local information (a serialization sketch follows this summary).
We also introduce a difference-guided dynamic contrastive locality learning strategy to enhance the patch-level self-similarity learning ability of the proposed network.
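The Hilbert scanning idea referenced above can be illustrated by serializing a 2-D feature map along a Hilbert curve, which keeps spatially neighbouring positions close together in the 1-D sequence fed to a state-space model. The function names, feature shapes, and per-frame usage below are illustrative assumptions rather than RainMamba's implementation.

```python
import torch

def hilbert_xy(order, d):
    """Map index d along a Hilbert curve covering a 2**order x 2**order grid
    to (x, y) coordinates (standard iterative d2xy construction)."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                               # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hilbert_serialize(feature_map, order=3):
    """feature_map: (C, H, W) with H = W = 2**order. Returns an (H*W, C)
    sequence whose order keeps spatial neighbours close together, the
    property a 1-D state-space model benefits from."""
    c, h, w = feature_map.shape
    coords = [hilbert_xy(order, d) for d in range(h * w)]
    return torch.stack([feature_map[:, y, x] for x, y in coords])

if __name__ == "__main__":
    fmap = torch.randn(4, 8, 8)      # one frame's feature map, 8 = 2**3
    print(hilbert_serialize(fmap, order=3).shape)   # torch.Size([64, 4])
```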
arXiv Detail & Related papers (2024-07-31T17:48:22Z)
- Diffusion Priors for Dynamic View Synthesis from Monocular Videos [59.42406064983643]
Dynamic novel view synthesis aims to capture the temporal evolution of visual content within videos.
We first finetune a pretrained RGB-D diffusion model on the video frames using a customization technique.
We distill the knowledge from the finetuned model to a 4D representation encompassing both dynamic and static Neural Radiance Fields.
arXiv Detail & Related papers (2024-01-10T23:26:41Z)
- Rethinking Real-world Image Deraining via An Unpaired Degradation-Conditioned Diffusion Model [51.49854435403139]
We propose RainDiff, the first real-world image deraining paradigm based on diffusion models.
We introduce a stable and non-adversarial unpaired cycle-consistent architecture that can be trained, end-to-end, with only unpaired data for supervision.
We also propose a degradation-conditioned diffusion model that refines the desired output via a diffusive generative process conditioned by learned priors of multiple rain degradations.
arXiv Detail & Related papers (2023-01-23T13:34:01Z)
- Semi-Supervised Video Deraining with Dynamic Rain Generator [59.71640025072209]
This paper proposes a new semi-supervised video deraining method, in which a dynamic rain generator is employed to fit the rain layer.
Specifically, such a dynamic generator consists of an emission model and a transition model that simultaneously encode the spatial physical structure and temporally continuous changes of rain streaks (see the sketch after this summary).
Various prior formats are designed for the labeled synthetic and unlabeled real data, so as to fully exploit the common knowledge underlying them.
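A minimal sketch of the emission/transition idea mentioned above, under the assumption of an additive grayscale rain layer: a GRU cell plays the role of the transition model evolving a latent rain state over time, and a small decoder plays the role of the emission model rendering each state into a rain-streak layer. Shapes and modules are illustrative only, not the paper's generator.

```python
import torch
import torch.nn as nn

class DynamicRainGenerator(nn.Module):
    """Transition model evolves a latent rain state over time; an emission
    model renders each state into a rain layer added to the clean frame."""
    def __init__(self, latent_dim=16, frame_hw=(32, 32)):
        super().__init__()
        self.h, self.w = frame_hw
        self.transition = nn.GRUCell(latent_dim, latent_dim)   # temporal dynamics
        self.emission = nn.Sequential(                          # spatial structure
            nn.Linear(latent_dim, self.h * self.w), nn.Sigmoid()
        )

    def forward(self, clean_video):
        # clean_video: (B, T, 1, H, W) grayscale background frames.
        b, t, _, h, w = clean_video.shape
        z = torch.zeros(b, self.transition.hidden_size)
        rainy = []
        for i in range(t):
            noise = torch.randn(b, self.transition.input_size)  # stochastic drive
            z = self.transition(noise, z)                        # evolve rain state
            rain = self.emission(z).view(b, 1, h, w)             # render streak layer
            rainy.append(clean_video[:, i] + rain)               # additive composition
        return torch.stack(rainy, dim=1)

if __name__ == "__main__":
    gen = DynamicRainGenerator()
    clean = torch.rand(2, 5, 1, 32, 32)
    print(gen(clean).shape)   # torch.Size([2, 5, 1, 32, 32])
```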
arXiv Detail & Related papers (2021-03-14T14:28:57Z)