Learning Spatiotemporal Frequency-Transformer for Compressed Video
Super-Resolution
- URL: http://arxiv.org/abs/2208.03012v1
- Date: Fri, 5 Aug 2022 07:02:30 GMT
- Title: Learning Spatiotemporal Frequency-Transformer for Compressed Video
Super-Resolution
- Authors: Zhongwei Qiu, Huan Yang, Jianlong Fu, Dongmei Fu
- Abstract summary: We propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR).
First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.
Second, we study different self-attention schemes, and discover that a divided attention, which conducts joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality.
- Score: 38.00182505384986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressed video super-resolution (VSR) aims to restore high-resolution
frames from compressed low-resolution counterparts. Most recent VSR approaches
often enhance an input frame by borrowing relevant textures from neighboring
video frames. Although some progress has been made, it remains a grand
challenge to effectively extract and transfer high-quality textures from
compressed videos, in which most frames are usually highly degraded. In this
paper, we propose
a novel Frequency-Transformer for compressed video super-resolution (FTVSR)
that conducts self-attention over a joint space-time-frequency domain. First,
we divide a video frame into patches, and transform each patch into DCT
spectral maps in which each channel represents a frequency band. Such a design
enables a fine-grained level self-attention on each frequency band, so that
real visual texture can be distinguished from artifacts, and further utilized
for video frame restoration. Second, we study different self-attention schemes,
and discover that a divided attention, which conducts joint space-frequency
attention before applying temporal attention on each frequency band, leads to
the best video enhancement quality. Experimental results on two widely-used
video super-resolution benchmarks show that FTVSR outperforms state-of-the-art
approaches on both uncompressed and compressed videos with clear visual
margins. Code is available at https://github.com/researchmm/FTVSR.
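The patch-to-spectral-map step described above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration, not the paper's implementation: it assumes a grayscale frame, non-overlapping 8x8 patches, and SciPy's `dctn` for the 2D DCT; the function name and shapes are illustrative.

```python
import numpy as np
from scipy.fft import dctn

def patch_dct_spectral_maps(frame: np.ndarray, patch: int = 8) -> np.ndarray:
    """Split a grayscale frame into non-overlapping patches, apply a 2D DCT to
    each, and stack the coefficients as per-frequency channels. Returns an
    array of shape (H//patch, W//patch, patch*patch): each channel holds one
    DCT frequency band across all patch positions."""
    h, w = frame.shape
    h, w = h - h % patch, w - w % patch          # crop to a multiple of the patch size
    blocks = frame[:h, :w].reshape(h // patch, patch, w // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3)        # (nH, nW, patch, patch)
    coeffs = dctn(blocks, type=2, axes=(-2, -1), norm="ortho")
    return coeffs.reshape(h // patch, w // patch, patch * patch)

frame = np.random.rand(64, 64).astype(np.float32)
maps = patch_dct_spectral_maps(frame)
print(maps.shape)  # (8, 8, 64): 64 frequency bands, one channel per DCT coefficient
```

Attention can then operate per channel, so low- and high-frequency content (where compression artifacts concentrate) is treated separately.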
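The "divided attention" ordering the abstract describes — joint space-frequency attention within each frame, followed by temporal attention within each frequency band — can be sketched as plain scaled dot-product attention over reshaped token axes. This is a structural illustration only: it omits the learned Q/K/V projections, multi-head splitting, and positional encodings a real Transformer would use, and the tensor layout `(T, S, F, D)` is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x):
    # self-attention over the second-to-last (token) axis; no learned projections
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def divided_attention(tokens):
    """tokens: (T, S, F, D) -- T frames, S spatial patches, F frequency bands,
    D embedding dim. Step 1: each frame attends jointly over its S*F
    space-frequency tokens. Step 2: each (patch, band) position attends
    across the T frames, i.e. temporal attention per frequency band."""
    T, S, F, D = tokens.shape
    sf = tokens.reshape(T, S * F, D)             # joint space-frequency tokens
    sf = attend(sf).reshape(T, S, F, D)
    t = sf.transpose(1, 2, 0, 3)                 # (S, F, T, D): frames become tokens
    t = attend(t)
    return t.transpose(2, 0, 1, 3)               # back to (T, S, F, D)

x = np.random.rand(4, 16, 8, 32)
y = divided_attention(x)
print(y.shape)  # (4, 16, 8, 32)
```

Dividing the attention this way keeps the cost of each step manageable (quadratic in S*F and in T separately, rather than in T*S*F jointly) while still letting every frequency band borrow texture across time.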
Related papers
- Perceptual Quality Improvement in Videoconferencing using
Keyframes-based GAN [28.773037051085318]
We propose a novel GAN-based method for compression artifacts reduction in videoconferencing.
First, we extract multi-scale features from the compressed and reference frames.
Then, our architecture combines these features in a progressive manner according to facial landmarks.
arXiv Detail & Related papers (2023-11-07T16:38:23Z)
- Predictive Coding For Animation-Based Video Compression [13.161311799049978]
We propose a predictive coding scheme which uses image animation as a predictor, and codes the residual with respect to the actual target frame.
Our experiments indicate a significant gain, in excess of 70% compared to the HEVC video standard and over 30% compared to VVC.
arXiv Detail & Related papers (2023-07-09T14:40:54Z)
- SR+Codec: a Benchmark of Super-Resolution for Video Compression Bitrate Reduction [0.0]
We developed a benchmark to analyze Super-Resolution's capacity to upscale compressed videos.
Our dataset employed video codecs based on five widely-used compression standards.
We found that some SR models, combined with compression, allow us to reduce the video bitrate without significant loss of quality.
arXiv Detail & Related papers (2023-05-08T16:42:55Z)
- Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution [47.5883522564362]
Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Existing VSR techniques usually recover HR frames by extracting textures from nearby frames with known degradation processes.
We propose a novel Frequency-Transformer (FTVSR) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain.
arXiv Detail & Related papers (2022-12-27T16:26:15Z)
- VideoINR: Learning Video Implicit Neural Representation for Continuous
Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performances with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z)
- Learning Trajectory-Aware Transformer for Video Super-Resolution [50.49396123016185]
Video super-resolution aims to restore a sequence of high-resolution (HR) frames from their low-resolution (LR) counterparts.
Existing approaches usually align and aggregate video frames from limited adjacent frames.
We propose a novel Trajectory-Aware Transformer for Video Super-Resolution (TTVSR).
arXiv Detail & Related papers (2022-04-08T03:37:39Z)
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
- COMISR: Compression-Informed Video Super-Resolution [76.94152284740858]
Most videos on the web or mobile devices are compressed, and the compression can be severe when the bandwidth is limited.
We propose a new compression-informed video super-resolution model to restore high-resolution content without introducing artifacts caused by compression.
arXiv Detail & Related papers (2021-05-04T01:24:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.