A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising
- URL: http://arxiv.org/abs/2409.06603v1
- Date: Tue, 10 Sep 2024 15:55:53 GMT
- Title: A Practical Gated Recurrent Transformer Network Incorporating Multiple Fusions for Video Denoising
- Authors: Kai Guo, Seungwon Choi, Jongseong Choi, Lae-Hoon Kim
- Abstract summary: State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms.
We propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay.
- Score: 1.5044085747326295
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art (SOTA) video denoising methods employ multi-frame simultaneous denoising mechanisms, resulting in significant delays (e.g., 16 frames), making them impractical for real-time cameras. To overcome this limitation, we propose a multi-fusion gated recurrent Transformer network (GRTN) that achieves SOTA denoising performance with only a single-frame delay. Specifically, the spatial denoising module extracts features from the current frame, while the reset gate selects relevant information from the previous frame and fuses it with current frame features via the temporal denoising module. The update gate then further blends this result with the previous frame features, and the reconstruction module integrates it with the current frame. To robustly compute attention for noisy features, we propose a residual simplified Swin Transformer with Euclidean distance (RSSTE) in the spatial and temporal denoising modules. Comparative objective and subjective results show that our GRTN achieves denoising performance comparable to SOTA multi-frame delay networks, with only a single-frame delay.
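The abstract walks through one concrete recurrent step (spatial denoising, reset-gated temporal fusion, update-gated blending) plus a distance-based attention. The PyTorch sketch below is a minimal, hedged reading of that description, not the authors' implementation: the convolutional gates, the stand-in spatial/temporal modules, and the softmax temperature `tau` are all assumptions made for illustration.

```python
# Minimal sketch of the gated recurrent fusion and the Euclidean-distance
# attention described in the abstract. All module shapes, the convolutional
# gates, and `tau` are illustrative assumptions, not the released design.
import torch
import torch.nn as nn
import torch.nn.functional as F


def euclidean_attention(q, k, v, tau=1.0):
    """Attention weighted by negative squared Euclidean distance.

    One plausible reading of RSSTE's distance-based similarity:
    q, k, v have shape (batch, tokens, dim); closer keys get more weight.
    """
    q2 = (q ** 2).sum(-1, keepdim=True)            # (B, Nq, 1)
    k2 = (k ** 2).sum(-1).unsqueeze(1)             # (B, 1, Nk)
    dist2 = q2 + k2 - 2.0 * q @ k.transpose(1, 2)  # ||q - k||^2, (B, Nq, Nk)
    attn = F.softmax(-dist2 / tau, dim=-1)
    return attn @ v


class GatedRecurrentFusion(nn.Module):
    """One recurrent step: fuse previous-frame features into the current frame."""

    def __init__(self, channels):
        super().__init__()
        # Reset gate: selects relevant information from the previous frame.
        self.reset_gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        # Update gate: blends the fused result with the previous-frame features.
        self.update_gate = nn.Conv2d(2 * channels, channels, 3, padding=1)
        # Stand-ins for the paper's spatial/temporal denoising modules (RSSTE blocks).
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.temporal = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, cur_feat, prev_feat):
        cur = self.spatial(cur_feat)  # spatial denoising of the current frame
        r = torch.sigmoid(self.reset_gate(torch.cat([cur, prev_feat], dim=1)))
        fused = self.temporal(torch.cat([cur, r * prev_feat], dim=1))
        z = torch.sigmoid(self.update_gate(torch.cat([fused, prev_feat], dim=1)))
        # Convex blend of the fused result with the carried-over state.
        return z * fused + (1.0 - z) * prev_feat
```

Using a negative squared distance instead of a dot product makes the softmax weights depend on how far noisy features actually drift apart, which is one way to read the robustness claim.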
Related papers
- Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models [64.2445487645478]
Large Language Models have shown remarkable efficacy in generating streaming data such as text and audio.
We present Live2Diff, the first attempt at designing a video diffusion model with uni-directional temporal attention, specifically targeting live streaming video translation.
arXiv Detail & Related papers (2024-07-11T17:34:51Z)
- Low Latency Video Denoising for Online Conferencing Using CNN Architectures [4.7805617044617446]
We propose a pipeline for real-time video denoising with low runtime cost and high perceptual quality.
A custom noise detector analyzer provides real-time feedback to adapt the weights and improve the models' output.
arXiv Detail & Related papers (2023-02-17T00:55:54Z)
- Gated Recurrent Unit for Video Denoising [5.515903319513226]
We propose a new video denoising model based on gated recurrent unit (GRU) mechanisms.
Experimental results show that the GRU-VD network achieves better quality than state-of-the-art methods, both objectively and subjectively. A generic illustration of the underlying GRU recurrence follows this entry.
arXiv Detail & Related papers (2022-10-17T14:34:54Z)
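Since the entry above only names the mechanism, here is the standard GRU recurrence run through PyTorch's built-in cell; the shapes are illustrative and this is not the GRU-VD architecture itself.

```python
# Generic GRU step: reset/update gates decide how much past-frame state
# to keep. Shapes are illustrative; GRU-VD's denoising network differs.
import torch
import torch.nn as nn

cell = nn.GRUCell(input_size=64, hidden_size=64)
x_t = torch.randn(8, 64)     # features from the current noisy frame (batch of 8)
h_prev = torch.zeros(8, 64)  # recurrent state carrying past-frame information
h_t = cell(x_t, h_prev)      # new state after gated blending of input and history
```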
- Real-time Streaming Video Denoising with Bidirectional Buffers [48.57108807146537]
Real-time denoising algorithms are typically adopted on the user device to remove noise introduced during the shooting and transmission of video streams.
Recent multi-output inference works propagate bidirectional temporal features with a parallel or recurrent framework.
We propose a Bidirectional Streaming Video Denoising framework, to achieve high-fidelity real-time denoising for streaming videos with both past and future temporal receptive fields.
arXiv Detail & Related papers (2022-07-14T14:01:03Z)
- Distortion-Aware Network Pruning and Feature Reuse for Real-time Video Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network on the regions of the current frame that capture temporal differences between the current and previous frames.
arXiv Detail & Related papers (2022-06-20T07:20:02Z)
- Coarse-to-Fine Video Denoising with Dual-Stage Spatial-Channel Transformer [29.03463312813923]
Video denoising aims to recover high-quality frames from the noisy video.
Most existing approaches adopt convolutional neural networks (CNNs) to separate the noise from the original visual content.
We propose a Dual-stage Spatial-Channel Transformer (DSCT) for coarse-to-fine video denoising.
arXiv Detail & Related papers (2022-04-30T09:01:21Z)
- Unidirectional Video Denoising by Mimicking Backward Recurrent Modules with Look-ahead Forward Ones [72.68740880786312]
Bidirectional recurrent networks (BiRNN) have exhibited appealing performance in several video restoration tasks.
BiRNN is intrinsically offline because it uses backward recurrent modules to propagate information from the last frame back to the current one.
We present a novel recurrent network consisting of forward and look-ahead recurrent modules for unidirectional video denoising.
arXiv Detail & Related papers (2022-04-12T05:33:15Z)
- Multi-Stage Raw Video Denoising with Adversarial Loss and Gradient Mask [14.265454188161819]
We propose a learning-based approach for denoising raw videos captured under low lighting conditions.
We first explicitly align the neighboring frames to the current frame using a convolutional neural network (CNN).
We then fuse the registered frames using another CNN to obtain the final denoised frame; a sketch of this align-then-fuse pattern follows this entry.
arXiv Detail & Related papers (2021-03-04T06:57:48Z)
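The align-then-fuse pipeline in the preceding entry can be sketched with two small CNNs. The flow-based warp via `grid_sample`, and every layer size here, are assumptions for illustration rather than the paper's actual design.

```python
# Hedged sketch of align-then-fuse: one CNN predicts a flow field to warp
# the neighbor frame onto the current frame, a second CNN fuses the pair.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AlignThenFuse(nn.Module):
    def __init__(self):
        super().__init__()
        # Alignment CNN: predicts a 2-channel offset (flow) from the frame pair.
        self.flow = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1),
        )
        # Fusion CNN: merges the current frame with the warped neighbor.
        self.fuse = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, cur, neighbor):
        b, _, h, w = cur.shape
        flow = self.flow(torch.cat([cur, neighbor], dim=1))  # (B, 2, H, W)
        # Build a normalized sampling grid shifted by the predicted flow.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h, device=cur.device),
            torch.linspace(-1, 1, w, device=cur.device),
            indexing="ij",
        )
        base = torch.stack([xs, ys], dim=-1).expand(b, h, w, 2)
        grid = base + flow.permute(0, 2, 3, 1)  # offsets in normalized coords
        registered = F.grid_sample(neighbor, grid, align_corners=True)
        return self.fuse(torch.cat([cur, registered], dim=1))
```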
- Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z) - All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced
Motion Modeling [52.425236515695914]
State-of-the-art methods are iterative solutions that interpolate one frame at a time.
This work introduces a true multi-frame interpolator.
It utilizes a pyramid-style network in the temporal domain to complete the multi-frame task in one shot.
arXiv Detail & Related papers (2020-07-23T02:34:39Z)