Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video
Super-Resolution
- URL: http://arxiv.org/abs/2106.07190v1
- Date: Mon, 14 Jun 2021 06:36:13 GMT
- Title: Group-based Bi-Directional Recurrent Wavelet Neural Networks for Video
Super-Resolution
- Authors: Young-Ju Choi, Young-Woon Lee, Byung-Gyu Kim
- Abstract summary: Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame from low-resolution (LR) frames.
The key challenge for VSR lies in effectively exploiting intra-frame spatial correlation and temporal dependency between consecutive frames.
- Score: 4.9136996406481135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video super-resolution (VSR) aims to estimate a high-resolution (HR) frame
from low-resolution (LR) frames. The key challenge for VSR lies in effectively
exploiting intra-frame spatial correlation and temporal dependency between
consecutive frames. However, most previous methods treat different types of
spatial features identically and extract spatial and temporal features with
separate modules, which limits both the extraction of meaningful information
and the enhancement of fine details. In VSR, there are three types of temporal
modeling frameworks: 2D convolutional neural networks (CNN), 3D CNN, and
recurrent neural networks (RNN). Among them, the RNN-based approach is well
suited to sequential data, so the SR performance can be greatly improved by
using the hidden states of adjacent frames. However, at each time step of a
recurrent structure, previous RNN-based works use neighboring features only
restrictively. Since the range of motion accessible per time step is narrow,
they remain limited in restoring missing details under dynamic or large motion.
In this paper, we propose group-based bi-directional recurrent wavelet neural
networks (GBR-WNN) to effectively exploit sequential data and spatio-temporal
information for VSR. The proposed group-based bi-directional RNN (GBR) temporal
modeling framework is built on a well-structured process over a group of
pictures (GOP). We also propose a temporal wavelet attention (TWA) module, in
which attention is applied to both spatial and temporal features. Experimental
results demonstrate that the proposed method achieves superior performance
compared with state-of-the-art methods in both quantitative and qualitative
evaluations.
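
To make the TWA idea concrete, the following is a minimal PyTorch sketch, not the authors' implementation: temporal attention fuses a stack of per-frame features, and a second attention reweights the Haar wavelet sub-bands of the fused feature. Module names, shapes, and the single-level Haar transform are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def haar_dwt2d(x):
    """Single-level 2D Haar DWT via fixed strided depthwise convolutions.
    x: (B, C, H, W) with even H and W; returns the LL, LH, HL, HH sub-bands."""
    B, C, H, W = x.shape
    k = x.new_tensor([
        [[0.5, 0.5], [0.5, 0.5]],    # LL (low-low)
        [[0.5, 0.5], [-0.5, -0.5]],  # LH (horizontal detail)
        [[0.5, -0.5], [0.5, -0.5]],  # HL (vertical detail)
        [[0.5, -0.5], [-0.5, 0.5]],  # HH (diagonal detail)
    ]).unsqueeze(1).repeat(C, 1, 1, 1)      # depthwise filters: (4C, 1, 2, 2)
    y = F.conv2d(x, k, stride=2, groups=C)  # (B, 4C, H/2, W/2)
    y = y.view(B, C, 4, H // 2, W // 2)
    return y[:, :, 0], y[:, :, 1], y[:, :, 2], y[:, :, 3]

class TemporalWaveletAttention(nn.Module):
    """Illustrative stand-in for a TWA-style block (not the paper's design)."""
    def __init__(self, channels, num_frames):
        super().__init__()
        self.temporal_att = nn.Conv2d(num_frames * channels, num_frames, 1)
        self.band_att = nn.Conv2d(4 * channels, 4, 1)  # one weight per sub-band

    def forward(self, feats):                          # feats: (B, T, C, H, W)
        B, T, C, H, W = feats.shape
        # Temporal attention: softmax over the frame axis, then weighted fusion.
        t_w = torch.softmax(self.temporal_att(feats.reshape(B, T * C, H, W)), dim=1)
        fused = (feats * t_w.unsqueeze(2)).sum(dim=1)            # (B, C, H, W)
        # Attention in the wavelet domain: reweight the four sub-bands.
        bands = torch.stack(haar_dwt2d(fused), dim=1)            # (B, 4, C, H/2, W/2)
        s_w = torch.sigmoid(self.band_att(bands.flatten(1, 2)))  # (B, 4, H/2, W/2)
        return fused, bands * s_w.unsqueeze(2)
```

For example, TemporalWaveletAttention(16, 5) applied to a (2, 5, 16, 32, 32) stack of features returns a fused (2, 16, 32, 32) map together with its reweighted wavelet sub-bands.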
Related papers
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses Continuous Wavelet Transform (CWT) to represent information in a 2D tensor form.
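As a concrete note on the "TC" stream, a CWT scalogram is simply a 2D array a CNN can consume. A tiny PyWavelets example (the wavelet and scale range here are arbitrary choices, not taken from the paper):

```python
import numpy as np
import pywt

# Toy 5 Hz behavioral signal sampled at 256 points over one second.
signal = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256))
scales = np.arange(1, 65)                         # 64 frequency rows
coeffs, freqs = pywt.cwt(signal, scales, "morl")  # Morlet continuous wavelet transform
tensor_2d = np.abs(coeffs)                        # (64, 256) time-frequency tensor
print(tensor_2d.shape)                            # ready for a 2D-CNN stream
```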
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- Enhancing Adaptive History Reserving by Spiking Convolutional Block Attention Module in Recurrent Neural Networks [21.509659756334802]
Spiking neural networks (SNNs) serve as one type of efficient model for processing spatio-temporal patterns in time series.
In this paper, we develop a recurrent spiking neural network (RSNN) model embedded with an advanced spiking convolutional block attention module (SCBAM) component.
It adaptively invokes history information in spatial and temporal channels through SCBAM, which brings the advantages of efficient recall of history and redundancy elimination.
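For intuition only, the sketch below shows the pattern being described: CBAM-style channel and spatial attention gating how much of a leaky integrate-and-fire cell's membrane history is kept. It is forward-only (a hard threshold with no surrogate gradient) and every name is hypothetical rather than the SCBAM design:

```python
import torch
import torch.nn as nn

class SpikingAttentionCell(nn.Module):
    """Toy LIF cell whose membrane history is gated by CBAM-like attention."""
    def __init__(self, channels, decay=0.9, threshold=1.0):
        super().__init__()
        self.decay, self.threshold = decay, threshold
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(channels, channels, 1), nn.Sigmoid())
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x, mem):                      # x, mem: (B, C, H, W)
        # Attention adaptively selects which membrane history to retain.
        mem = mem * self.channel_att(mem) * self.spatial_att(mem)
        mem = self.decay * mem + self.conv(x)       # leaky integration
        spikes = (mem >= self.threshold).float()    # hard threshold, no surrogate
        mem = mem - spikes * self.threshold         # soft reset
        return spikes, mem
```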
arXiv Detail & Related papers (2024-01-08T08:05:34Z)
- Delayed Memory Unit: Modelling Temporal Dependency Through Delay Gate [16.4160685571157]
Recurrent Neural Networks (RNNs) are widely recognized for their proficiency in modeling temporal dependencies.
This paper proposes a novel Delayed Memory Unit (DMU) for gated RNNs.
The DMU incorporates a delay line structure along with delay gates into the vanilla RNN, thereby enhancing temporal interaction and facilitating temporal credit assignment.
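To make the delay-line idea concrete, here is a toy PyTorch sketch: a gate distributes each hidden state across future slots of a delay line, so information can skip ahead several steps. The wiring and names are assumptions, not the paper's exact DMU formulation:

```python
import torch
import torch.nn as nn

class DelayedMemoryRNN(nn.Module):
    def __init__(self, input_size, hidden_size, max_delay=4):
        super().__init__()
        self.cell = nn.RNNCell(input_size, hidden_size)
        self.delay_gate = nn.Linear(hidden_size, max_delay)
        self.max_delay = max_delay

    def forward(self, x):                            # x: (T, B, input_size)
        T, B, _ = x.shape
        H = self.cell.hidden_size
        delay_line = x.new_zeros(self.max_delay, B, H)
        outputs = []
        for t in range(T):
            h = self.cell(x[t], delay_line[0])       # consume the state arriving now
            gate = torch.softmax(self.delay_gate(h), dim=-1)  # (B, D)
            # Shift the delay line one step and route h into it:
            # slot d receives the fraction of h delayed by d+1 steps.
            shifted = torch.cat([delay_line[1:], torch.zeros_like(delay_line[:1])], dim=0)
            delay_line = shifted + gate.t().unsqueeze(-1) * h.unsqueeze(0)
            outputs.append(h)
        return torch.stack(outputs)                  # (T, B, H)
```

For example, DelayedMemoryRNN(8, 16)(torch.randn(12, 2, 8)) yields a (12, 2, 16) output sequence.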
arXiv Detail & Related papers (2023-10-23T14:29:48Z)
- STDAN: Deformable Attention Network for Space-Time Video Super-Resolution [39.18399652834573]
We propose a deformable attention network called STDAN for STVSR.
First, we devise a long-short term feature interpolation (LSTFI) module, which is capable of extracting abundant content from more neighboring input frames.
Second, we put forward a spatial-temporal deformable feature aggregation (STDFA) module, in which spatial and temporal contexts are adaptively captured and aggregated.
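Deformable alignment of this kind is often built on torchvision's deform_conv2d. Below is a minimal sketch of the general pattern (offsets predicted from reference and neighbor features, then deformable sampling of the neighbor); the actual STDFA module is more elaborate and this wiring is an assumption:

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableAggregate(nn.Module):
    def __init__(self, channels, k=3):
        super().__init__()
        # Predict one (dy, dx) pair per kernel position from both frames.
        self.offset_pred = nn.Conv2d(2 * channels, 2 * k * k, 3, padding=1)
        self.weight = nn.Parameter(torch.randn(channels, channels, k, k) * 0.01)
        self.k = k

    def forward(self, ref, neighbor):                # both: (B, C, H, W)
        offsets = self.offset_pred(torch.cat([ref, neighbor], dim=1))
        # Sample the neighbor at learned offsets, aligning it to the reference.
        return deform_conv2d(neighbor, offsets, self.weight, padding=self.k // 2)
```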
arXiv Detail & Related papers (2022-03-14T03:40:35Z)
- Recurrence-in-Recurrence Networks for Video Deblurring [58.49075799159015]
State-of-the-art video deblurring methods often adopt recurrent neural networks to model the temporal dependency between the frames.
In this paper, we propose recurrence-in-recurrence network architecture to cope with the limitations of short-ranged memory.
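One plausible reading of "recurrence-in-recurrence" is an inner refinement RNN nested inside the outer per-frame recurrence. The toy cell below illustrates that reading only; the cells and wiring are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class RIRCell(nn.Module):
    def __init__(self, channels, inner_steps=3):
        super().__init__()
        self.outer = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.inner = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.inner_steps = inner_steps

    def forward(self, x, h_outer, h_inner):          # all: (B, C, H, W)
        # Outer recurrence: one update per video frame.
        h_outer = torch.tanh(self.outer(torch.cat([x, h_outer], dim=1)))
        # Inner recurrence: several refinement steps per frame, which is one
        # way to extend the effective (short-ranged) memory of the outer RNN.
        for _ in range(self.inner_steps):
            h_inner = torch.tanh(self.inner(torch.cat([h_outer, h_inner], dim=1)))
        return h_inner, h_outer, h_inner             # output and both states
```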
arXiv Detail & Related papers (2022-03-12T11:58:13Z)
- Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition [62.46544616232238]
Previous motion recognition methods have achieved promising performance through tightly coupled spatiotemporal representations.
We propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition.
arXiv Detail & Related papers (2021-12-16T18:59:47Z)
- Optical-Flow-Reuse-Based Bidirectional Recurrent Network for Space-Time Video Super-Resolution [52.899234731501075]
Space-time video super-resolution (ST-VSR) simultaneously increases the spatial resolution and frame rate for a given video.
Existing methods typically struggle to efficiently leverage information from a large range of neighboring frames.
We propose a coarse-to-fine bidirectional recurrent neural network instead of using ConvLSTM to leverage knowledge between adjacent frames.
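The operation that flow reuse amortizes is backward warping of a feature map by an optical flow. A standard grid_sample-based warp (the flow itself would come from any estimator and, per the paper's idea, be computed once and reused):

```python
import torch
import torch.nn.functional as F

def flow_warp(feat, flow):
    """feat: (B, C, H, W) float; flow: (B, 2, H, W) in pixels as (dx, dy)."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().to(feat)   # base pixel coords (2, H, W)
    coords = grid.unsqueeze(0) + flow                      # displaced coords (B, 2, H, W)
    # Normalize to [-1, 1] as grid_sample expects.
    coords_x = 2.0 * coords[:, 0] / max(W - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(H - 1, 1) - 1.0
    grid_n = torch.stack([coords_x, coords_y], dim=-1)     # (B, H, W, 2)
    return F.grid_sample(feat, grid_n, align_corners=True)
```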
arXiv Detail & Related papers (2021-10-13T15:21:30Z)
- Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block that is capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
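A natural realization of such a block is a set of parallel 3D convolutions with different temporal kernel sizes whose outputs are concatenated. The kernel sizes below are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class MultiTemporalBlock(nn.Module):
    def __init__(self, in_ch, out_ch, temporal_sizes=(1, 3, 5)):
        super().__init__()
        # out_ch should be divisible by the number of temporal branches.
        branch_ch = out_ch // len(temporal_sizes)
        self.branches = nn.ModuleList(
            nn.Conv3d(in_ch, branch_ch, kernel_size=(t, 3, 3),
                      padding=(t // 2, 1, 1))
            for t in temporal_sizes)

    def forward(self, x):                            # x: (B, C, T, H, W)
        # Each branch sees a different temporal receptive field.
        return torch.cat([b(x) for b in self.branches], dim=1)
```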
arXiv Detail & Related papers (2020-11-08T10:40:26Z)
- MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution [63.02785017714131]
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame.
Inter- and intra-frames are the key sources for exploiting temporal and spatial information.
We build an effective multi-correspondence aggregation network (MuCAN) for VSR.
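At its core, multi-correspondence aggregation matches each reference-frame feature against several candidates in the neighbor frame and fuses the best ones. The sketch below is a simplified stand-in (local search window, cosine similarity, top-k weighted average); MuCAN's actual module differs:

```python
import torch
import torch.nn.functional as F

def correspondence_aggregate(ref, neighbor, window=3, topk=4):
    """ref, neighbor: (B, C, H, W) feature maps of two frames."""
    B, C, H, W = ref.shape
    pad = window // 2
    # All candidate features in a (window x window) neighborhood per pixel.
    cand = F.unfold(neighbor, window, padding=pad)           # (B, C*w*w, H*W)
    cand = cand.reshape(B, C, window * window, H * W)
    ref_flat = ref.reshape(B, C, 1, H * W)
    # Cosine similarity between each reference feature and its candidates.
    sim = (F.normalize(cand, dim=1) * F.normalize(ref_flat, dim=1)).sum(1)
    top_sim, idx = sim.topk(topk, dim=1)                     # best (B, k, H*W)
    weights = torch.softmax(top_sim, dim=1).unsqueeze(1)     # (B, 1, k, H*W)
    gathered = torch.gather(cand, 2, idx.unsqueeze(1).expand(B, C, topk, H * W))
    out = (gathered * weights).sum(2)                        # fuse top matches
    return out.reshape(B, C, H, W)
```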
arXiv Detail & Related papers (2020-07-23T05:41:27Z)