Spatio-Temporal Deformable Attention Network for Video Deblurring
- URL: http://arxiv.org/abs/2207.10852v1
- Date: Fri, 22 Jul 2022 03:03:08 GMT
- Title: Spatio-Temporal Deformable Attention Network for Video Deblurring
- Authors: Huicong Zhang, Haozhe Xie and Hongxun Yao
- Abstract summary: The key success factor of video deblurring methods is to compensate for the blurry pixels of the mid-frame with the sharp pixels of the adjacent video frames.
We propose STDANet, which extracts information from sharp pixels by considering the pixel-wise blur levels of the video frames.
- Score: 21.514099863308676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The key success factor of video deblurring methods is to compensate for
the blurry pixels of the mid-frame with the sharp pixels of the adjacent video
frames. Therefore, mainstream methods align the adjacent frames based on the
estimated optical flows and fuse the aligned frames for restoration. However,
these methods sometimes produce unsatisfactory results because they rarely
consider the blur levels of pixels, and may therefore introduce blurry pixels
from the adjacent video frames. In fact, not all pixels in the video frames are
sharp and beneficial for deblurring. To address this problem, we propose the
spatio-temporal deformable attention network (STDANet) for video deblurring,
which extracts the information of sharp pixels by considering the pixel-wise
blur levels of the video frames. Specifically, STDANet is an encoder-decoder
network combined with the motion estimator and spatio-temporal deformable
attention (STDA) module, where the motion estimator predicts coarse optical
flows that are used as base offsets to find the corresponding sharp pixels in the STDA
module. Experimental results indicate that the proposed STDANet performs
favorably against state-of-the-art methods on the GoPro, DVD, and BSD datasets.
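To make the mechanism concrete, here is a minimal PyTorch-style sketch of flow-guided deformable attention in the spirit of the STDA module. All module names, shapes, and hyper-parameters below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_by_flow(feat, flow):
    """Bilinearly sample feat (B, C, H, W) at positions displaced by flow (B, 2, H, W)."""
    _, _, H, W = feat.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=feat.device, dtype=feat.dtype),
        torch.arange(W, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    gx = xs.unsqueeze(0) + flow[:, 0]
    gy = ys.unsqueeze(0) + flow[:, 1]
    # grid_sample expects (x, y) coordinates normalized to [-1, 1].
    grid = torch.stack((2 * gx / (W - 1) - 1, 2 * gy / (H - 1) - 1), dim=-1)
    return F.grid_sample(feat, grid, align_corners=True)

class FlowGuidedDeformableAttention(nn.Module):
    """Coarse optical flow provides the base offset; learned residual offsets
    and attention weights then pick out sharp pixels from an adjacent frame."""
    def __init__(self, channels, num_points=4):
        super().__init__()
        self.num_points = num_points
        # Per sampling point: 2 residual offset coords and 1 attention logit.
        self.offset_head = nn.Conv2d(channels * 2, 2 * num_points, 3, padding=1)
        self.weight_head = nn.Conv2d(channels * 2, num_points, 3, padding=1)

    def forward(self, mid_feat, adj_feat, base_flow):
        # Condition offsets/weights on the mid-frame and flow-aligned adjacent features.
        joint = torch.cat([mid_feat, warp_by_flow(adj_feat, base_flow)], dim=1)
        offsets = self.offset_head(joint)                        # (B, 2K, H, W)
        weights = torch.softmax(self.weight_head(joint), dim=1)  # (B, K, H, W)
        out = 0.0
        for k in range(self.num_points):
            flow_k = base_flow + offsets[:, 2 * k : 2 * k + 2]
            out = out + weights[:, k : k + 1] * warp_by_flow(adj_feat, flow_k)
        return out
```

Here the coarse flow only has to be roughly right: the learned residual offsets and softmax weights let the module favor sharp pixels near the flow target rather than trusting the warped location blindly.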
Related papers
- Aggregating Long-term Sharp Features via Hybrid Transformers for Video Deblurring [76.54162653678871]
We propose a video deblurring method that leverages both neighboring frames and sharp frames already present in the video, using hybrid Transformers for feature aggregation.
Our proposed method outperforms state-of-the-art video deblurring methods as well as event-driven video deblurring methods in terms of quantitative metrics and visual quality.
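As a rough illustration of attention-based sharp-feature aggregation (a sketch of the general technique, not the paper's hybrid Transformer), blurry mid-frame tokens can cross-attend to tokens gathered from sharp frames:

```python
import torch
import torch.nn as nn

class SharpFeatureAggregator(nn.Module):
    """Blurry mid-frame tokens cross-attend to tokens pooled from sharp frames."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, mid_tokens, sharp_tokens):
        # mid_tokens: (B, N, C) from the blurry mid-frame;
        # sharp_tokens: (B, M, C) gathered from sharp frames elsewhere in the video.
        fused, _ = self.attn(query=mid_tokens, key=sharp_tokens, value=sharp_tokens)
        return self.norm(mid_tokens + fused)
```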
arXiv Detail & Related papers (2023-09-13T16:12:11Z)
- E-VFIA: Event-Based Video Frame Interpolation with Attention [8.93294761619288]
We propose event-based video frame interpolation with attention (E-VFIA) as a lightweight kernel-based method.
E-VFIA fuses event information with standard video frames by deformable convolutions to generate high quality interpolated frames.
The proposed method represents events with high temporal resolution and uses a multi-head self-attention mechanism to better encode event-based information.
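A hedged sketch of event-guided deformable sampling using torchvision's DeformConv2d; the channel layout and head design are assumptions, not the E-VFIA architecture:

```python
import torch.nn as nn
from torchvision.ops import DeformConv2d

class EventFrameFusion(nn.Module):
    """Event features predict per-tap offsets that steer a deformable
    convolution over the frame features."""
    def __init__(self, frame_ch, event_ch, out_ch, k=3):
        super().__init__()
        # 2 offset coordinates per kernel tap (k*k taps, one offset group).
        self.offset_head = nn.Conv2d(event_ch, 2 * k * k, 3, padding=1)
        self.deform = DeformConv2d(frame_ch, out_ch, k, padding=k // 2)

    def forward(self, frame_feat, event_feat):
        offsets = self.offset_head(event_feat)
        return self.deform(frame_feat, offsets)
```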
arXiv Detail & Related papers (2022-09-19T21:40:32Z)
- Efficient Video Deblurring Guided by Motion Magnitude [37.25713728458234]
We propose a novel framework that utilizes the motion magnitude prior (MMP) as guidance for efficient deep video deblurring.
The MMP consists of both spatial and temporal blur level information, which can be further integrated into an efficient recurrent neural network (RNN) for video deblurring.
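One plausible way to inject such a per-pixel blur-level prior into a recurrent deblurring network, shown as a sketch under assumed shapes (not the paper's exact cell):

```python
import torch
import torch.nn as nn

class MMPGuidedCell(nn.Module):
    """One recurrent step: the motion-magnitude map is concatenated with the
    frame features and the previous hidden state before the update convs."""
    def __init__(self, feat_ch, hidden_ch):
        super().__init__()
        self.update = nn.Sequential(
            nn.Conv2d(feat_ch + 1 + hidden_ch, hidden_ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden_ch, hidden_ch, 3, padding=1),
        )

    def forward(self, feat, mmp, hidden):
        # feat: (B, C, H, W); mmp: (B, 1, H, W) blur-level prior; hidden: (B, Ch, H, W)
        return self.update(torch.cat([feat, mmp, hidden], dim=1))
```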
arXiv Detail & Related papers (2022-07-27T08:57:48Z)
- Video Frame Interpolation without Temporal Priors [91.04877640089053]
Video frame interpolation aims to synthesize non-existent intermediate frames in a video sequence.
The temporal priors of videos, i.e., frames per second (FPS) and frame exposure time, may vary across different camera sensors.
We devise a novel optical flow refinement strategy for better synthesizing results.
arXiv Detail & Related papers (2021-12-02T12:13:56Z)
- Recurrent Video Deblurring with Blur-Invariant Motion Estimation and Pixel Volumes [14.384467317051831]
We propose two novel approaches to deblurring videos by effectively aggregating information from multiple video frames.
First, we present blur-invariant motion estimation learning to improve motion estimation accuracy between blurry frames.
Second, for motion compensation, instead of aligning frames by warping with estimated motions, we use a pixel volume that contains candidate sharp pixels to resolve motion estimation errors.
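A minimal sketch of the pixel-volume idea, assuming bilinear warping and a small search radius; the function name and arguments are made up for illustration:

```python
import torch
import torch.nn.functional as F

def pixel_volume(adj_frame, flow, radius=1):
    """Stack warped copies of adj_frame sampled at the flow target and its
    (2r+1)^2 neighbors, so later layers can pick the sharpest candidate pixel
    instead of trusting a single (possibly wrong) motion vector."""
    _, _, H, W = adj_frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, device=flow.device, dtype=flow.dtype),
        torch.arange(W, device=flow.device, dtype=flow.dtype),
        indexing="ij",
    )
    candidates = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            gx = xs + flow[:, 0] + dx            # candidate x-coordinates
            gy = ys + flow[:, 1] + dy            # candidate y-coordinates
            grid = torch.stack(                  # normalize to [-1, 1]
                (2 * gx / (W - 1) - 1, 2 * gy / (H - 1) - 1), dim=-1
            )
            candidates.append(F.grid_sample(adj_frame, grid, align_corners=True))
    return torch.stack(candidates, dim=1)        # (B, (2r+1)^2, C, H, W)
```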
arXiv Detail & Related papers (2021-08-23T07:36:49Z)
- ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring [92.40655035360729]
Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions.
We propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space.
Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) dataset for Video Deblurring.
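The following sketch shows an all-pairs feature correlation volume, a standard ingredient for learning spatial correspondence in feature space; it illustrates the general technique rather than ARVo's exact formulation:

```python
import torch

def correlation_volume(feat1, feat2):
    """Dot-product correlation between every pixel pair of two feature maps."""
    B, C, H, W = feat1.shape
    f1 = feat1.flatten(2)                        # (B, C, H*W)
    f2 = feat2.flatten(2)                        # (B, C, H*W)
    corr = torch.einsum("bcn,bcm->bnm", f1, f2) / C ** 0.5
    return corr.view(B, H, W, H, W)              # correlation of every pixel pair
```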
arXiv Detail & Related papers (2021-03-07T04:33:13Z)
- Motion-blurred Video Interpolation and Extrapolation [72.3254384191509]
We present a novel framework for deblurring, interpolating and extrapolating sharp frames from a motion-blurred video in an end-to-end manner.
To ensure temporal coherence across predicted frames and address potential temporal ambiguity, we propose a simple, yet effective flow-based rule.
arXiv Detail & Related papers (2021-03-04T12:18:25Z)
- Personal Privacy Protection via Irrelevant Faces Tracking and Pixelation in Video Live Streaming [61.145467627057194]
We develop a new method called Face Pixelation in Video Live Streaming (FPVLS) to provide automatic personal privacy filtering.
For fast and accurate pixelation of irrelevant people's faces, FPVLS is organized in a frame-to-video structure of two core stages.
On the video live streaming dataset we collected, FPVLS achieves satisfying accuracy and real-time efficiency, and mitigates over-pixelation problems.
arXiv Detail & Related papers (2021-01-04T16:18:26Z)
- FLAVR: Flow-Agnostic Video Representations for Fast Frame Interpolation [97.99012124785177]
FLAVR is a flexible and efficient architecture that uses 3D space-time convolutions to enable end-to-end learning and inference for video frame interpolation.
We demonstrate that FLAVR can serve as a useful self-supervised pretext task for action recognition, optical flow estimation, and motion magnification.
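A toy sketch of the flow-free space-time convolution idea, with illustrative names and sizes:

```python
import torch.nn as nn

class SpaceTimeBlock(nn.Module):
    """A 3D space-time convolution: the kernel mixes information across
    frames (T) and pixels (H, W) jointly, with no explicit optical flow."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, clip):
        # clip: (B, C, T, H, W) -- a short window of input frames.
        return self.act(self.conv(clip))
```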
arXiv Detail & Related papers (2020-12-15T18:59:30Z)