Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising
- URL: http://arxiv.org/abs/2412.11820v1
- Date: Mon, 16 Dec 2024 14:37:16 GMT
- Title: Spatiotemporal Blind-Spot Network with Calibrated Flow Alignment for Self-Supervised Video Denoising
- Authors: Zikang Chen, Tao Jiang, Xiaowan Hu, Wang Zhang, Huaqiu Li, Haoqian Wang
- Abstract summary: Self-supervised video denoising aims to remove noise from videos without relying on ground truth data.
We introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization.
Our method demonstrates superior performance across both synthetic and real-world video denoising datasets.
- Score: 23.07977702905715
- License:
- Abstract: Self-supervised video denoising aims to remove noise from videos without relying on ground truth data, leveraging the video itself to recover clean frames. Existing methods often rely on simplistic feature stacking or apply optical flow without thorough analysis. This results in suboptimal utilization of both inter-frame and intra-frame information, and it also neglects the potential of optical flow alignment under self-supervised conditions, leading to biased and insufficient denoising outcomes. To this end, we first explore the practicality of optical flow in the self-supervised setting and introduce a SpatioTemporal Blind-spot Network (STBN) for global frame feature utilization. In the temporal domain, we utilize bidirectional blind-spot feature propagation through the proposed blind-spot alignment block to ensure accurate temporal alignment and effectively capture long-range dependencies. In the spatial domain, we introduce the spatial receptive field expansion module, which enhances the receptive field and improves global perception capabilities. Additionally, to reduce the sensitivity of optical flow estimation to noise, we propose an unsupervised optical flow distillation mechanism that refines fine-grained inter-frame interactions during optical flow alignment. Our method demonstrates superior performance across both synthetic and real-world video denoising datasets. The source code is publicly available at https://github.com/ZKCCZ/STBN.
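The core idea behind blind-spot networks, which STBN builds on, is that each output pixel is predicted only from its neighbors, never from its own noisy value, so the network cannot learn the identity mapping. This is not the paper's actual architecture; it is a minimal NumPy sketch of the blind-spot principle using a single masked convolution kernel (kernel size and weights are illustrative assumptions).

```python
import numpy as np

def blind_spot_conv(img, kernel):
    """Apply a 2D convolution whose kernel center is masked out, so each
    output pixel never sees its own noisy input value (the blind-spot
    principle behind self-supervised denoising)."""
    k = kernel.astype(float).copy()
    c = k.shape[0] // 2
    k[c, c] = 0.0                # blind spot: exclude the center tap
    k /= k.sum()                 # renormalize the remaining weights
    pad = c
    padded = np.pad(img, pad, mode="reflect")
    out = np.zeros_like(img, dtype=float)
    H, W = img.shape
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 2 * pad + 1,
                                      j:j + 2 * pad + 1] * k)
    return out

rng = np.random.default_rng(0)
noisy = rng.normal(size=(16, 16))
den = blind_spot_conv(noisy, np.ones((3, 3)))

# Perturbing only one pixel leaves that pixel's own output unchanged,
# but shifts the outputs of its neighbors:
noisy2 = noisy.copy()
noisy2[8, 8] += 100.0
den2 = blind_spot_conv(noisy2, np.ones((3, 3)))
```

Because the denoised value at each location is independent of its own noisy input, the network can be trained directly on the noisy frame as its own target without collapsing to the identity.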
Related papers
- Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement [48.76608212565327]
This paper explores learning-based low-light video enhancement without paired ground truth.
Compared to low-light image enhancement, enhancing low-light videos is more difficult due to the intertwined effects of noise, exposure, and contrast in the spatial domain, jointly with the need for temporal coherence.
We propose the Unrolled Decomposed Unpaired Network (UDU-Net) for enhancing low-light videos by unrolling the optimization functions into a deep network to decompose the signal into spatial and temporal-related factors, which are updated iteratively.
arXiv Detail & Related papers (2024-08-22T11:45:11Z) - OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation [55.676358801492114]
We propose OCAI, a method that robustly handles frame ambiguities by jointly generating intermediate video frames and the optical flows between them.
Our evaluations demonstrate superior quality and enhanced optical flow accuracy on established benchmarks such as Sintel and KITTI.
arXiv Detail & Related papers (2024-03-26T20:23:48Z) - STint: Self-supervised Temporal Interpolation for Geospatial Data [0.0]
Supervised and unsupervised techniques have demonstrated the potential for temporal interpolation of video data.
Most prevailing temporal interpolation techniques hinge on optical flow, which encodes the motion of pixels between video frames.
In this work, we propose an unsupervised temporal interpolation technique that does not rely on ground truth data or require any motion information such as optical flow.
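To make concrete what "optical flow encodes the motion of pixels between video frames" means, here is a toy nearest-neighbor backward warp in NumPy. This is not any of the cited methods' alignment modules; real pipelines use sub-pixel (e.g. bilinear) sampling, but the indexing below shows the basic mechanism.

```python
import numpy as np

def warp_backward(frame, flow):
    """Backward-warp `frame` using per-pixel flow (dy, dx):
    out[y, x] = frame[y + flow_y, x + flow_x], with nearest-neighbor
    sampling and border clamping."""
    H, W = frame.shape
    ys, xs = np.mgrid[0:H, 0:W]
    src_y = np.clip(np.round(ys + flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[..., 1]).astype(int), 0, W - 1)
    return frame[src_y, src_x]

# A bright pixel at (1, 1) in frame t moves one pixel right by frame t+1:
frame_t = np.zeros((4, 4)); frame_t[1, 1] = 1.0
frame_t1 = np.zeros((4, 4)); frame_t1[1, 2] = 1.0

# Flow of (dy, dx) = (0, +1) everywhere; warping frame t+1 backward
# with it re-aligns the content with frame t.
flow = np.zeros((4, 4, 2)); flow[..., 1] = 1.0
warped = warp_backward(frame_t1, flow)
```

After warping, `warped` matches `frame_t`, which is what flow-based alignment exploits: temporally adjacent frames become spatially comparable, so their information can be fused.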
arXiv Detail & Related papers (2023-08-31T18:04:50Z) - Recurrent Self-Supervised Video Denoising with Denser Receptive Field [33.3711070590966]
Self-supervised video denoising has seen decent progress through the use of blind spot networks.
Previous self-supervised video denoising methods suffer from significant information loss and texture destruction in either the whole reference frame or neighbor frames.
We propose RDRF for self-supervised video denoising, which fully exploits both the reference and neighbor frames with a denser receptive field.
arXiv Detail & Related papers (2023-08-07T14:09:08Z) - Learning Task-Oriented Flows to Mutually Guide Feature Alignment in Synthesized and Real Video Denoising [137.5080784570804]
Video denoising aims at removing noise from videos to recover clean ones.
Some existing works show that optical flow can help the denoising by exploiting the additional spatial-temporal clues from nearby frames.
We propose a new multi-scale refined optical flow-guided video denoising method, which is more robust to different noise levels.
arXiv Detail & Related papers (2022-08-25T00:09:18Z) - Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically struggle to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
arXiv Detail & Related papers (2021-10-13T15:21:30Z) - TimeLens: Event-based Video Frame Interpolation [54.28139783383213]
We introduce Time Lens, a novel method that leverages the advantages of both synthesis-based and flow-based approaches.
We show an up to 5.21 dB improvement in terms of PSNR over state-of-the-art frame-based and event-based methods.
arXiv Detail & Related papers (2021-06-14T10:33:47Z) - Unsupervised Motion Representation Enhanced Network for Action Recognition [4.42249337449125]
Motion representation between consecutive frames has proven highly beneficial for video understanding.
The TV-L1 method, an effective optical flow solver, is time-consuming and incurs high storage costs for caching the extracted optical flow.
We propose UF-TSN, a novel end-to-end action recognition approach enhanced with an embedded lightweight unsupervised optical flow estimator.
arXiv Detail & Related papers (2021-03-05T04:14:32Z) - Intrinsic Temporal Regularization for High-resolution Human Video Synthesis [59.54483950973432]
Temporal consistency is crucial for extending image processing pipelines to the video domain.
We propose an effective intrinsic temporal regularization scheme, where an intrinsic confidence map is estimated via the frame generator to regulate motion estimation.
We apply our intrinsic temporal regularization to a single-image generator, leading to a powerful "INTERnet" capable of generating $512\times512$ resolution human action videos.
arXiv Detail & Related papers (2020-12-11T05:29:45Z) - Learning Model-Blind Temporal Denoisers without Ground Truths [46.778450578529814]
Denoisers trained with synthetic data often fail to cope with the diversity of unknown noises.
Previous image-based methods lead to noise overfitting if directly applied to video denoisers.
We propose a general framework for video denoising networks that successfully addresses these challenges.
arXiv Detail & Related papers (2020-07-07T07:19:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.