Recursive Fusion and Deformable Spatiotemporal Attention for Video
Compression Artifact Reduction
- URL: http://arxiv.org/abs/2108.02110v1
- Date: Wed, 4 Aug 2021 15:25:27 GMT
- Title: Recursive Fusion and Deformable Spatiotemporal Attention for Video
Compression Artifact Reduction
- Authors: Minyi Zhao, Yi Xu, Shuigeng Zhou
- Abstract summary: A number of deep learning-based algorithms have been proposed to recover high-quality videos from low-quality compressed ones.
In this paper, we propose a Recursive Fusion (RF) module to model the temporal dependency within a long temporal range.
We also design an efficient and effective Deformable Spatiotemporal Attention (DSTA) module to focus more effort on restoring artifact-rich areas.
- Score: 36.255863808004065
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: A number of deep learning-based algorithms have been proposed to
recover high-quality videos from low-quality compressed ones. Among them, some
restore the missing details of each frame by exploring the spatiotemporal
information of neighboring frames. However, these methods usually suffer from a
narrow temporal scope and may therefore miss useful details from frames outside
the neighboring ones. In this paper, to boost artifact removal, on the one
hand, we
propose a Recursive Fusion (RF) module to model the temporal dependency within
a long temporal range. Specifically, RF utilizes both the current reference
frames and the preceding hidden state to conduct better spatiotemporal
compensation. On the other hand, we design an efficient and effective
Deformable Spatiotemporal Attention (DSTA) module so that the model can focus
more effort on restoring artifact-rich areas, such as the boundary of a moving
object. Extensive experiments show that our method outperforms the
existing ones on the MFQE 2.0 dataset in terms of both fidelity and perceptual
effect. Code is available at https://github.com/zhaominyiz/RFDA-PyTorch.
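Below is a minimal, hedged sketch of the two ideas the abstract describes. It is not the authors' RFDA-PyTorch code: the class names, channel sizes, fusion body, and the use of K deformable sampling points are all assumptions made for illustration.

```python
# Sketch of the Recursive Fusion (RF) idea: fuse features of the current
# reference frames with a hidden state carried over from earlier frames,
# so information can flow across a long temporal range.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecursiveFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Hypothetical fusion body: concatenate current features with the
        # hidden state and mix them with two conv layers.
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, feat, hidden):
        # feat:   features of the current reference frames, (B, C, H, W)
        # hidden: hidden state from the preceding step, (B, C, H, W)
        return self.fuse(torch.cat([feat, hidden], dim=1))

class DeformableSpatialAttention(nn.Module):
    # A loose, spatial-only reading of DSTA: each location samples K
    # offset positions and aggregates them with learned attention
    # weights, letting the model spend effort on artifact-rich regions.
    def __init__(self, channels=64, k=4):
        super().__init__()
        self.k = k
        self.offsets = nn.Conv2d(channels, 2 * k, 3, padding=1)
        self.weights = nn.Conv2d(channels, k, 3, padding=1)

    def forward(self, x):
        b, c, h, w = x.shape
        off = self.offsets(x).view(b, self.k, 2, h, w)
        attn = self.weights(x).softmax(dim=1)        # (B, K, H, W)
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                                torch.linspace(-1, 1, w), indexing="ij")
        base = torch.stack((xs, ys), dim=-1).to(x)   # normalized grid
        out = torch.zeros_like(x)
        for i in range(self.k):
            # Offsets are assumed to be predicted in normalized units.
            grid = base + off[:, i].permute(0, 2, 3, 1)
            out = out + attn[:, i:i + 1] * F.grid_sample(
                x, grid, align_corners=True)
        return out

# Toy usage: the hidden state is threaded through the whole sequence,
# so frame t can draw on details observed many frames earlier.
rf, dsta = RecursiveFusion(64), DeformableSpatialAttention(64)
hidden = torch.zeros(1, 64, 32, 32)
for feat in torch.randn(5, 1, 64, 32, 32):  # 5 per-frame feature maps
    hidden = rf(feat, hidden)
restored_features = dsta(hidden)
```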
Related papers
- SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition [18.542942459854867]
Large numbers of video samples are continuously required for traditional data-driven research.
We propose a novel plug-and-play architecture for action recognition called the Spatio-tempOral frAme tuPle enhancer (SOAP) in this paper.
SOAP-Net achieves new state-of-the-art performance across well-known benchmarks such as SthSthV2, Kinetics, UCF101, and HMDB51.
arXiv Detail & Related papers (2024-07-23T09:45:25Z)
- Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
The key success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information.
However, inaccurate alignment usually leads to aligned features with significant artifacts.
Moreover, existing propagation modules only propagate features of the same timestep forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z)
- STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion [35.42718669331158]
Existing models usually ignore spatial and temporal information, which might lead to mesh and image misalignment and temporal discontinuity.
As a video-based model, it leverages coherence clues from human motion through an attention-based Temporal Coherence Fusion Module.
In addition, we propose an Average Pooling Module (APM) to allow the model to focus on the entire input sequence rather than just the target frame.
arXiv Detail & Related papers (2024-01-03T13:07:14Z)
- Implicit Temporal Modeling with Learnable Alignment for Video Recognition [95.82093301212964]
We propose a novel Implicit Learnable Alignment (ILA) method, which minimizes the temporal modeling effort while achieving remarkably high performance.
ILA achieves a top-1 accuracy of 88.7% on Kinetics-400 with much fewer FLOPs compared with Swin-L and ViViT-H.
arXiv Detail & Related papers (2023-04-20T17:11:01Z)
- FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification [49.06447472006251]
We propose a novel deep neural network, termed FuTH-Net, to model not only holistic features, but also temporal relations for aerial video classification.
Our model is evaluated on two aerial video classification datasets, ERA and Drone-Action, and achieves state-of-the-art results.
arXiv Detail & Related papers (2022-09-22T21:15:58Z)
- Exploring Long- and Short-Range Temporal Information for Learned Video Compression [54.91301930491466]
We focus on exploiting the unique characteristics of video content and exploring temporal information to enhance compression performance.
For long-range temporal information exploitation, we propose a temporal prior that is continuously updated within the group of pictures (GOP) during inference, so that it carries valuable temporal information from all decoded images within the current GOP (see the sketches after this list).
For short-range temporal information, we design a hierarchical structure to achieve multi-scale compensation.
arXiv Detail & Related papers (2022-08-07T15:57:18Z)
- Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling [105.69197687940505]
We propose to explore the role of explicit temporal difference modeling in both LR and HR space.
To further enhance the super-resolution result, not only are spatial residual features extracted, but the difference between consecutive frames in the high-frequency domain is also computed (see the sketches after this list).
arXiv Detail & Related papers (2022-04-14T17:07:33Z)
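One sketch for the "Exploring Long- and Short-Range Temporal Information" entry above: how a temporal prior might be threaded through a GOP during decoding. The function names, the running-average update rule, and the stand-in callables are assumptions, not that paper's implementation.

```python
# Sketch: a temporal prior updated continuously across a GOP, so it
# accumulates information from every frame decoded so far in the GOP.
import torch

def decode_gop(frames, decode_frame, extract_features, update_prior):
    prior, decoded = None, []
    for frame in frames:
        out = decode_frame(frame, prior)   # prior conditions the decoder
        feat = extract_features(out)
        prior = feat if prior is None else update_prior(prior, feat)
        decoded.append(out)
    return decoded

# Toy usage with hypothetical stand-in callables:
frames = [torch.randn(1, 3, 64, 64) for _ in range(4)]
decoded = decode_gop(
    frames,
    decode_frame=lambda x, p: x if p is None else x + 0.1 * p,
    extract_features=lambda x: x.mean(dim=1, keepdim=True),
    update_prior=lambda p, f: 0.9 * p + 0.1 * f,  # running update
)
```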
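A second sketch, for the "Look Back and Forth" entry: one plausible form of explicit temporal difference modeling in the high-frequency domain, where low frequencies are removed with a blur before consecutive frames are differenced. The box-blur low-pass filter and kernel size are assumptions, not that paper's method.

```python
# Sketch: high-frequency temporal differences between consecutive frames.
import torch
import torch.nn.functional as F

def highfreq_temporal_diff(frames, kernel_size=5):
    # frames: (T, C, H, W); returns (T-1, C, H, W) high-frequency diffs.
    pad = kernel_size // 2
    low = F.avg_pool2d(frames, kernel_size, stride=1, padding=pad)
    high = frames - low                  # high-frequency residue
    return high[1:] - high[:-1]          # difference of consecutive frames

diffs = highfreq_temporal_diff(torch.randn(5, 3, 64, 64))
```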
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.