Multi-Stage Raw Video Denoising with Adversarial Loss and Gradient Mask
- URL: http://arxiv.org/abs/2103.02861v2
- Date: Fri, 5 Mar 2021 17:38:40 GMT
- Title: Multi-Stage Raw Video Denoising with Adversarial Loss and Gradient Mask
- Authors: Avinash Paliwal, Libing Zeng and Nima Khademi Kalantari
- Abstract summary: We propose a learning-based approach for denoising raw videos captured under low lighting conditions.
We first explicitly align the neighboring frames to the current frame using a convolutional neural network (CNN).
We then fuse the registered frames using another CNN to obtain the final denoised frame.
- Score: 14.265454188161819
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a learning-based approach for denoising raw videos
captured under low lighting conditions. We propose to do this by first
explicitly aligning the neighboring frames to the current frame using a
convolutional neural network (CNN). We then fuse the registered frames using
another CNN to obtain the final denoised frame. To avoid directly aligning the
temporally distant frames, we perform the two processes of alignment and fusion
in multiple stages. Specifically, at each stage, we perform the denoising
process on three consecutive input frames to generate the intermediate denoised
frames which are then passed as the input to the next stage. By performing the
process in multiple stages, we can effectively utilize the information of
neighboring frames without directly aligning the temporally distant frames. We
train our multi-stage system using an adversarial loss with a conditional
discriminator. Specifically, we condition the discriminator on a soft gradient
mask to prevent introducing high-frequency artifacts in smooth regions. We show
that our system is able to produce temporally coherent videos with realistic
details. Furthermore, we demonstrate through extensive experiments that our
approach outperforms state-of-the-art image and video denoising methods both
numerically and visually.
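To make the multi-stage scheme concrete, below is a minimal PyTorch sketch of the stage structure described in the abstract. `align_net` and `fuse_net` are hypothetical stand-ins for the paper's alignment and fusion CNNs (their internals are not specified by the abstract); the sliding 3-frame window per stage is the part the abstract does describe.

```python
import torch
import torch.nn as nn

class DenoiseStage(nn.Module):
    """One alignment + fusion stage. `align_net` and `fuse_net` are
    hypothetical stand-ins for the paper's two CNNs; their exact
    architectures are not given in the abstract."""

    def __init__(self, align_net: nn.Module, fuse_net: nn.Module):
        super().__init__()
        self.align_net = align_net
        self.fuse_net = fuse_net

    def forward(self, prev, cur, nxt):
        # Explicitly register both neighbors to the center frame,
        # then fuse the three registered frames into one output.
        prev_w = self.align_net(torch.cat([prev, cur], dim=1))
        nxt_w = self.align_net(torch.cat([nxt, cur], dim=1))
        return self.fuse_net(torch.cat([prev_w, cur, nxt_w], dim=1))

def multi_stage_denoise(frames, stages):
    """Slide a 3-frame window over the sequence at every stage, so each
    stage maps N frames to N - 2 intermediate denoised frames and no
    temporally distant pair is ever aligned directly."""
    for stage in stages:
        frames = [stage(frames[i - 1], frames[i], frames[i + 1])
                  for i in range(1, len(frames) - 1)]
    return frames
```

With this structure, information from a frame k steps away reaches the current frame after roughly k/2 stages of pairwise-local alignment, which is the abstract's stated motivation for staging.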
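The soft gradient mask that conditions the discriminator can be illustrated as follows. This is an assumption-laden sketch: the abstract only says the mask is soft and gradient-based, so the Sobel construction and the channel-concatenation conditioning below are illustrative choices, not the paper's confirmed method.

```python
import torch
import torch.nn.functional as F

def soft_gradient_mask(img, eps=1e-6):
    """Illustrative soft gradient mask: per-pixel Sobel gradient
    magnitude normalized to [0, 1]. High values mark textured regions,
    low values mark smooth ones; the paper's exact construction may
    differ."""
    c = img.shape[1]
    kx = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]], device=img.device, dtype=img.dtype)
    ky = kx.t()
    kx = kx.expand(c, 1, 3, 3).contiguous()  # depthwise Sobel kernels
    ky = ky.expand(c, 1, 3, 3).contiguous()
    gx = F.conv2d(img, kx, padding=1, groups=c)
    gy = F.conv2d(img, ky, padding=1, groups=c)
    mag = torch.sqrt(gx * gx + gy * gy + eps)
    return mag / (mag.amax(dim=(2, 3), keepdim=True) + eps)

# Conditioning (assumed): concatenate the mask as extra input channels,
# so the discriminator can tolerate high-frequency detail in textured
# regions while flagging it as an artifact in smooth ones.
# score = discriminator(torch.cat([denoised, soft_gradient_mask(clean)], dim=1))
```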
Related papers
- Optical-Flow Guided Prompt Optimization for Coherent Video Generation [51.430833518070145]
We propose a framework called MotionPrompt that guides the video generation process via optical flow.
We optimize learnable token embeddings during reverse sampling steps by using gradients from a trained discriminator applied to random frame pairs.
This approach allows our method to generate visually coherent video sequences that closely reflect natural motion dynamics, without compromising the fidelity of the generated content.
arXiv Detail & Related papers (2024-11-23T12:26:52Z) - ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel bidirectional sampling strategy to address the off-manifold issues that arise in this setting, without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - Temporal As a Plugin: Unsupervised Video Denoising with Pre-Trained Image Denoisers [30.965705043127144]
In this paper, we propose a novel unsupervised video denoising framework, named 'Temporal As a Plugin' (TAP).
By incorporating temporal modules, our method can harness temporal information across noisy frames, complementing its power of spatial denoising.
Compared to other unsupervised video denoising methods, our framework demonstrates superior performance on both sRGB and raw video denoising datasets.
arXiv Detail & Related papers (2024-09-17T15:05:33Z) - TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models [94.24861019513462]
TRIP is a new recipe for the image-to-video diffusion paradigm.
It pivots on an image noise prior derived from the static input image to jointly trigger inter-frame relational reasoning.
Extensive experiments on WebVid-10M, DTDB and MSR-VTT datasets demonstrate TRIP's effectiveness.
arXiv Detail & Related papers (2024-03-25T17:59:40Z) - Efficient Flow-Guided Multi-frame De-fencing [7.504789972841539]
De-fencing is the algorithmic process of automatically removing fence-like obstructions from images.
We develop a framework for multi-frame de-fencing that computes high quality flow maps directly from obstructed frames.
arXiv Detail & Related papers (2023-01-25T18:42:59Z) - Efficient Video Segmentation Models with Per-frame Inference [117.97423110566963]
We focus on improving the temporal consistency without introducing overhead in inference.
We propose several techniques to learn from the video sequence, including a temporal consistency loss and online/offline knowledge distillation methods.
arXiv Detail & Related papers (2022-02-24T23:51:36Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z) - Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z) - ALANET: Adaptive Latent Attention Network for Joint Video Deblurring and Interpolation [38.52446103418748]
We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos.
We employ a combination of self-attention and cross-attention modules between consecutive frames in the latent space to generate an optimized representation for each frame.
Our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
arXiv Detail & Related papers (2020-08-31T21:11:53Z) - Self-Supervised training for blind multi-frame video denoising [15.078027648304115]
We propose a self-supervised approach for training multi-frame video denoising networks.
Our approach benefits from temporal consistency in the video by penalizing a loss between the predicted frame at time t and a neighboring target frame.
arXiv Detail & Related papers (2020-04-15T09:08:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.