Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring
- URL: http://arxiv.org/abs/2309.07054v2
- Date: Fri, 29 Nov 2024 15:59:20 GMT
- Title: Aggregating Nearest Sharp Features via Hybrid Transformers for Video Deblurring
- Authors: Wei Shang, Dongwei Ren, Yi Yang, Wangmeng Zuo,
- Abstract summary: We propose a video deblurring method that leverages both neighboring frames and existing sharp frames using hybrid Transformers for feature aggregation.
To aggregate nearest sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability.
Our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality.
- Score: 70.06559269075352
- License:
- Abstract: Video deblurring methods, aiming at recovering consecutive sharp frames from a given blurry video, usually assume that the input video suffers from consecutively blurry frames. However, in real-world scenarios captured by modern imaging devices, sharp frames often interspersed within the video, providing temporally nearest sharp features that can aid in the restoration of blurry frames. In this work, we propose a video deblurring method that leverages both neighboring frames and existing sharp frames using hybrid Transformers for feature aggregation. Specifically, we first train a blur-aware detector to distinguish between sharp and blurry frames. Then, a window-based local Transformer is employed for exploiting features from neighboring frames, where cross attention is beneficial for aggregating features from neighboring frames without explicit spatial alignment. To aggregate nearest sharp features from detected sharp frames, we utilize a global Transformer with multi-scale matching capability. Moreover, our method can easily be extended to event-driven video deblurring by incorporating an event fusion module into the global Transformer. Extensive experiments on benchmark datasets demonstrate that our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality. The source code and trained models are available at https://github.com/shangwei5/STGTN.
Related papers
- ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler [53.98558445900626]
Current image-to-video diffusion models, while powerful in generating videos from a single frame, need adaptation for two-frame conditioned generation.
We introduce a novel, bidirectional sampling strategy to address these off-manifold issues without requiring extensive re-noising or fine-tuning.
Our method employs sequential sampling along both forward and backward paths, conditioned on the start and end frames, respectively, ensuring more coherent and on-manifold generation of intermediate frames.
arXiv Detail & Related papers (2024-10-08T03:01:54Z) - CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring [44.30048301161034]
Video deblurring aims to enhance the quality of restored results in motion-red videos by gathering information from adjacent video frames.
We propose two modules: 1) Intra-frame feature enhancement operates within the exposure time of a single blurred frame, and 2) Inter-frame temporal feature alignment gathers valuable long-range temporal information to target frames.
We demonstrate that our proposed methods outperform state-of-the-art frame-based and event-based motion deblurring methods through extensive experiments conducted on both synthetic and real-world deblurring datasets.
arXiv Detail & Related papers (2024-08-27T10:09:17Z) - Burstormer: Burst Image Restoration and Enhancement Transformer [117.56199661345993]
On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image.
The challenge is to properly align the successive image shots and merge their complimentary information to achieve high-quality outputs.
We propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement.
arXiv Detail & Related papers (2023-04-03T17:58:44Z) - SATVSR: Scenario Adaptive Transformer for Cross Scenarios Video
Super-Resolution [0.0]
Video Super-Resolution aims to recover sequences of high-resolution (HR) frames from low-resolution (LR) frames.
Previous methods mainly utilize temporally adjacent frames to assist the reconstruction of target frames.
We devise a novel adaptive scenario video super-resolution method. Specifically, we use optical flow to label the patches in each video frame, only calculate the attention of patches with the same label. Then select the most relevant label among them to supplement the spatial-temporal information of the target frame.
arXiv Detail & Related papers (2022-11-16T06:30:13Z) - E-VFIA : Event-Based Video Frame Interpolation with Attention [8.93294761619288]
We propose an event-based video frame with attention (E-VFIA) as a lightweight kernel-based method.
E-VFIA fuses event information with standard video frames by deformable convolutions to generate high quality interpolated frames.
The proposed method represents events with high temporal resolution and uses a multi-head self-attention mechanism to better encode event-based information.
arXiv Detail & Related papers (2022-09-19T21:40:32Z) - TTVFI: Learning Trajectory-Aware Transformer for Video Frame
Interpolation [50.49396123016185]
Video frame (VFI) aims to synthesize an intermediate frame between two consecutive frames.
We propose a novel Trajectory-aware Transformer for Video Frame Interpolation (TTVFI)
Our method outperforms other state-of-the-art methods in four widely-used VFI benchmarks.
arXiv Detail & Related papers (2022-07-19T03:37:49Z) - ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
arXiv Detail & Related papers (2021-03-07T04:33:13Z) - Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z) - ALANET: Adaptive Latent Attention Network forJoint Video Deblurring and
Interpolation [38.52446103418748]
We introduce a novel architecture, Adaptive Latent Attention Network (ALANET), which synthesizes sharp high frame-rate videos.
We employ combination of self-attention and cross-attention module between consecutive frames in the latent space to generate optimized representation for each frame.
Our method performs favorably against various state-of-the-art approaches, even though we tackle a much more difficult problem.
arXiv Detail & Related papers (2020-08-31T21:11:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.