Blur Interpolation Transformer for Real-World Motion from Blur
- URL: http://arxiv.org/abs/2211.11423v1
- Date: Mon, 21 Nov 2022 13:10:10 GMT
- Title: Blur Interpolation Transformer for Real-World Motion from Blur
- Authors: Zhihang Zhong, Mingdeng Cao, Xiang Ji, Yinqiang Zheng, Imari Sato
- Abstract summary: We propose a blur interpolation transformer (BiT) to unravel the underlying temporal correlation encoded in blur.
Based on multi-scale residual Swin transformer blocks, we introduce dual-end temporal supervision and temporally symmetric ensembling strategies.
In addition, we design a hybrid camera system to collect the first real-world dataset of one-to-many blur-sharp video pairs.
- Score: 52.10523711510876
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper studies the challenging problem of recovering motion from blur,
also known as joint deblurring and interpolation or blur temporal
super-resolution. The remaining challenges are twofold: 1) current methods still
leave considerable room for improvement in visual quality, even on synthetic
datasets, and 2) they generalize poorly to real-world data. To
this end, we propose a blur interpolation transformer (BiT) to effectively
unravel the underlying temporal correlation encoded in blur. Based on
multi-scale residual Swin transformer blocks, we introduce dual-end temporal
supervision and temporally symmetric ensembling strategies to generate
effective features for time-varying motion rendering. In addition, we design a
hybrid camera system to collect the first real-world dataset of one-to-many
blur-sharp video pairs. Experimental results show that BiT achieves significant
gains over state-of-the-art methods on the public Adobe240 dataset. Moreover, the
proposed real-world dataset effectively helps the model generalize to real blurry
scenarios.
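The two strategies named in the abstract, dual-end temporal supervision and temporally symmetric ensembling, can be made concrete with a short sketch. The PyTorch snippet below is a minimal toy reading of those ideas, not the authors' implementation: the small convolutional stack merely stands in for BiT's multi-scale residual Swin transformer blocks, and the names (`TinySharpDecoder`, `symmetric_ensemble`, `dual_end_loss`) and the sequence-plus-time-code interface are hypothetical assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinySharpDecoder(nn.Module):
    """Hypothetical toy stand-in for BiT's multi-scale residual Swin blocks.
    Maps a short stack of blurry frames plus a time code t in [0, 1] to an
    estimate of the sharp frame at time t."""

    def __init__(self, n_frames=3, channels=3, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_frames * channels + 1, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, channels, 3, padding=1),
        )

    def forward(self, blur_seq, t):
        # blur_seq: (B, N, C, H, W); t: (B,) scalar time codes.
        b, n, c, h, w = blur_seq.shape
        t_plane = t.view(b, 1, 1, 1).expand(b, 1, h, w)  # time code as an input plane
        return self.net(torch.cat([blur_seq.reshape(b, n * c, h, w), t_plane], dim=1))

def symmetric_ensemble(model, blur_seq, t):
    # Temporally symmetric ensembling (one plausible reading): blur carries no
    # arrow of time, so the instant at time t in the forward sequence is the
    # instant at time 1 - t in the reversed sequence; average both estimates.
    fwd = model(blur_seq, t)
    bwd = model(torch.flip(blur_seq, dims=[1]), 1.0 - t)
    return 0.5 * (fwd + bwd)

def dual_end_loss(model, blur_seq, sharp_start, sharp_end):
    # Dual-end temporal supervision (sketch): explicitly supervise the sharp
    # frames at both ends (t = 0 and t = 1) of the exposure interval.
    l1 = nn.L1Loss()
    b = blur_seq.shape[0]
    return (l1(model(blur_seq, torch.zeros(b)), sharp_start)
            + l1(model(blur_seq, torch.ones(b)), sharp_end))

# Toy usage with random tensors.
model = TinySharpDecoder()
blur_seq = torch.rand(2, 3, 3, 64, 64)  # batch of 3-frame blurry stacks
mid = symmetric_ensemble(model, blur_seq, torch.full((2,), 0.5))
loss = dual_end_loss(model, blur_seq, torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64))
```

The symmetry exploited here is that a blurry frame gives no cue about the direction of time: flipping the input sequence and querying time 1 - t targets the same physical instant as the forward pass at time t, so the two estimates can be averaged, while dual-end supervision anchors the predictions at both ends of the exposure.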
Related papers
- WTCL-Dehaze: Rethinking Real-world Image Dehazing via Wavelet Transform and Contrastive Learning [17.129068060454255]
Single image dehazing is essential for applications such as autonomous driving and surveillance.
We propose an enhanced semi-supervised dehazing network that integrates Contrastive Loss and Discrete Wavelet Transform.
Our proposed algorithm achieves superior performance and improved robustness compared to state-of-the-art single image dehazing methods.
arXiv Detail & Related papers (2024-10-07T05:36:11Z) - DeblurDiNAT: A Generalizable Transformer for Perceptual Image Deblurring [1.5124439914522694]
DeblurDiNAT is a generalizable and efficient encoder-decoder Transformer which restores clean images visually close to the ground truth.
We present a linear feed-forward network and a non-linear dual-stage feature fusion module for faster feature propagation across the network.
arXiv Detail & Related papers (2024-03-19T21:31:31Z) - Neuromorphic Synergy for Video Binarization [54.195375576583864]
Bimodal objects serve as a visual form to embed information that can be easily recognized by vision systems.
Neuromorphic cameras offer new capabilities for alleviating motion blur, but it is non-trivial to first deblur and then binarize the images in real time.
We propose an event-based binary reconstruction method that leverages the prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space.
We also develop an efficient integration method to propagate this binary image to high frame rate binary video.
arXiv Detail & Related papers (2024-02-20T01:43:51Z) - STint: Self-supervised Temporal Interpolation for Geospatial Data [0.0]
Supervised and unsupervised techniques have demonstrated the potential for temporal interpolation of video data.
Most prevailing temporal interpolation techniques hinge on optical flow, which encodes the motion of pixels between video frames.
In this work, we propose an unsupervised temporal interpolation technique, which does not rely on ground-truth data or require any motion information such as optical flow.
arXiv Detail & Related papers (2023-08-31T18:04:50Z) - Generalizing Event-Based Motion Deblurring in Real-World Scenarios [62.995994797897424]
Event-based motion deblurring has shown promising results by exploiting low-latency events.
We propose a scale-aware network that allows flexible input spatial scales and enables learning from different temporal scales of motion blur.
A two-stage self-supervised learning scheme is then developed to fit real-world data distribution.
arXiv Detail & Related papers (2023-08-11T04:27:29Z) - Joint Video Multi-Frame Interpolation and Deblurring under Unknown
Exposure Time [101.91824315554682]
In this work, we aim ambitiously for a more realistic and challenging task - joint video multi-frame interpolation and deblurring under unknown exposure time.
We first adopt a variant of supervised contrastive learning to construct an exposure-aware representation from input blurred frames.
We then build our video reconstruction network upon the exposure and motion representation by progressive exposure-adaptive convolution and motion refinement.
arXiv Detail & Related papers (2023-03-27T09:43:42Z) - Rethinking Blur Synthesis for Deep Real-World Image Deblurring [4.00114307523959]
We propose a novel realistic blur synthesis pipeline to simulate the camera imaging process.
We develop an effective deblurring model that captures non-local dependencies and local context in the feature domain simultaneously.
Comprehensive experiments on three real-world datasets show that the proposed deblurring model outperforms state-of-the-art methods.
arXiv Detail & Related papers (2022-09-28T06:50:16Z) - Time Lens++: Event-based Frame Interpolation with Parametric Non-linear
Flow and Multi-scale Fusion [47.57998625129672]
We introduce multi-scale feature-level fusion and compute one-shot non-linear inter-frame motion from events and images.
We show that our method improves the reconstruction quality by up to 0.2 dB in terms of PSNR and up to 15% in LPIPS score (for reference, PSNR is sketched in code after this list).
arXiv Detail & Related papers (2022-03-31T17:14:58Z) - Space-time Mixing Attention for Video Transformer [55.50839896863275]
We propose a Video Transformer model whose complexity scales linearly with the number of frames in the video sequence.
We demonstrate that our model achieves very high recognition accuracy on the most popular video recognition datasets.
arXiv Detail & Related papers (2021-06-10T17:59:14Z) - Decoupled Spatial-Temporal Transformer for Video Inpainting [77.8621673355983]
Video inpainting aims to fill the given holes with realistic appearance but remains a challenging task even with powerful deep learning approaches.
Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance.
We propose a Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency.
arXiv Detail & Related papers (2021-04-14T05:47:46Z)