Pay Attention to Hidden States for Video Deblurring: Ping-Pong Recurrent
Neural Networks and Selective Non-Local Attention
- URL: http://arxiv.org/abs/2203.16063v1
- Date: Wed, 30 Mar 2022 05:21:05 GMT
- Title: Pay Attention to Hidden States for Video Deblurring: Ping-Pong Recurrent
Neural Networks and Selective Non-Local Attention
- Authors: JoonKyu Park, Seungjun Nah, Kyoung Mu Lee
- Abstract summary: We propose two modules to supplement the RNN architecture for video deblurring.
First, we design a Ping-Pong RNN (PPRNN) that updates the hidden states by alternately referring to the features from the current and the previous time steps.
Second, we use a Selective Non-Local Attention (SNLA) module to additionally refine the hidden state by aligning it with the positional information from the input frame feature.
- Score: 58.49075799159015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video deblurring models exploit information in the neighboring frames to
remove blur caused by the motion of the camera and the objects. Recurrent
Neural Networks (RNNs) are often adopted to model the temporal dependency
between frames via hidden states. When motion blur is strong, however, the
hidden states struggle to deliver proper information due to the displacement
between different frames. While there have been attempts to update the hidden
states, it is difficult to handle misaligned features beyond the receptive
field of simple modules. Thus, we propose two modules to supplement the RNN
architecture for video deblurring. First, we design a Ping-Pong RNN (PPRNN)
that updates the hidden states by alternately referring to the features from
the current and the previous time steps. PPRNN gathers relevant information
from both features in an iterative and balanced manner by utilizing its
recurrent architecture. Second, we use a Selective Non-Local Attention (SNLA)
module to additionally refine the hidden state by aligning it with the
positional information from the input frame feature. The attention score is
scaled by its relevance to the input feature to focus on the necessary
information. By paying attention to hidden states with both modules, which
have strong synergy, our PAHS framework improves the representation power of
RNN structures and achieves state-of-the-art deblurring performance on
standard benchmarks and real-world videos.
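As a rough illustration of the two ideas above, the following PyTorch sketch shows a ping-pong style hidden-state update and a non-local attention step whose output is gated by relevance to the input feature. All module names, layer choices, and tensor shapes are assumptions made for illustration; this is not the authors' implementation.

```python
# Minimal sketch, assuming feature maps of shape (B, C, H, W). The fusion
# block, attention projections, and relevance gate are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PingPongUpdate(nn.Module):
    """Alternately refine the hidden state with previous/current features."""

    def __init__(self, channels: int, iterations: int = 2):
        super().__init__()
        self.iterations = iterations
        # A small fusion block; the real PPRNN cell is not specified here.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, hidden, feat_prev, feat_curr):
        # "Ping-pong" between the two time steps, updating the hidden state
        # in an iterative and balanced manner.
        for i in range(self.iterations):
            ref = feat_prev if i % 2 == 0 else feat_curr
            hidden = hidden + self.fuse(torch.cat([hidden, ref], dim=1))
        return hidden


class SelectiveNonLocalAttention(nn.Module):
    """Non-local attention from the input feature to the hidden state,
    with the attended result scaled by a relevance gate (the 'selective' part)."""

    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels, 1)   # from input feature
        self.key = nn.Conv2d(channels, channels, 1)     # from hidden state
        self.value = nn.Conv2d(channels, channels, 1)   # from hidden state
        self.relevance = nn.Conv2d(channels, 1, 1)      # per-position gate

    def forward(self, hidden, feat_curr):
        b, c, h, w = hidden.shape
        q = self.query(feat_curr).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.key(hidden).flatten(2)                        # (B, C, HW)
        v = self.value(hidden).flatten(2).transpose(1, 2)      # (B, HW, C)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)             # (B, HW, HW)
        aligned = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Scale the attended hidden state by its relevance to the input
        # feature so that only useful positions contribute.
        gate = torch.sigmoid(self.relevance(feat_curr))
        return hidden + gate * aligned
```

In a recurrent deblurring pipeline, refinements of this kind would typically be applied at every time step before the hidden state is passed to the reconstruction branch and carried to the next frame.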
Related papers
- Spatio-temporal Prompting Network for Robust Video Feature Extraction [74.54597668310707]
Frame quality deterioration is one of the main challenges in the field of video understanding.
Recent approaches exploit transformer-based integration modules to obtain spatio-temporal information.
We present a neat and unified framework called the Spatio-Temporal Prompting Network (STPN).
It can efficiently extract video features by adjusting the input features in the network backbone.
arXiv Detail & Related papers (2024-02-04T17:52:04Z) - STDAN: Deformable Attention Network for Space-Time Video
Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR.
First, we devise a long-short term feature interpolation (LSTFI) module, which is capable of extracting abundant content from more neighboring input frames.
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated.
arXiv Detail & Related papers (2022-03-14T03:40:35Z) - Recurrence-in-Recurrence Networks for Video Deblurring [58.49075799159015]
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between the frames.
In this paper, we propose recurrence-in-recurrence network architecture to cope with the limitations of short-ranged memory.
arXiv Detail & Related papers (2022-03-12T11:58:13Z) - Exploring Motion and Appearance Information for Temporal Sentence
Grounding [52.01687915910648]
We propose a Motion-Appearance Reasoning Network (MARN) to solve temporal sentence grounding.
We develop separate motion and appearance branches to learn motion-guided and appearance-guided object relations.
Our proposed MARN outperforms previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2022-01-03T02:44:18Z) - Video Salient Object Detection via Contrastive Features and Attention
Modules [106.33219760012048]
We propose a network with attention modules to learn contrastive features for video salient object detection.
A co-attention formulation is utilized to combine the low-level and high-level features.
We show that the proposed method requires less computation, and performs favorably against the state-of-the-art approaches.
arXiv Detail & Related papers (2021-11-03T17:40:32Z) - Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video
Super-Resolution [4.9136996406481135]
Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame from low-resolution (LR) frames.
The key challenge for VSR lies in the effective exploitation of intra-frame spatial correlation and temporal dependency between consecutive frames.
arXiv Detail & Related papers (2021-06-14T06:36:13Z) - Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections, while still suffering from a limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps; a rough sketch of both gating ideas follows below.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
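The channel-wise and spatial-wise gating ideas summarized in the CME/SME entry above can be pictured with a small sketch: a channel gate computed from the inter-frame feature difference and a spatial gate derived from point-to-point (cosine) similarity between adjacent feature maps. The pooling, layers, and similarity measure here are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch, assuming adjacent frame features of shape (B, C, H, W).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MotionGates(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel-wise gate produced from the inter-frame feature difference.
        self.channel_gate = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat_t, feat_t1):
        b, c, h, w = feat_t.shape
        diff = feat_t1 - feat_t                                   # rough motion cue
        # CME-like: emphasize channels carrying dynamic information.
        gate_c = self.channel_gate(diff.mean(dim=(2, 3)))         # (B, C)
        out = feat_t * gate_c.view(b, c, 1, 1)
        # SME-like: low similarity between adjacent feature maps indicates
        # motion, so weight those regions more strongly.
        sim = F.cosine_similarity(feat_t, feat_t1, dim=1)         # (B, H, W)
        gate_s = (1.0 - sim).unsqueeze(1)                         # (B, 1, H, W)
        return out * (1.0 + gate_s)
```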
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.