Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution
- URL: http://arxiv.org/abs/2212.14046v1
- Date: Tue, 27 Dec 2022 16:26:15 GMT
- Title: Learning Spatiotemporal Frequency-Transformer for Low-Quality Video Super-Resolution
- Authors: Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei Fu
- Abstract summary: Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Existing VSR techniques usually recover HR frames by extracting textures from nearby frames with known degradation processes.
We propose a novel Frequency-Transformer (FTVSR) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain.
- Score: 47.5883522564362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos. Existing VSR techniques usually recover HR frames by extracting pertinent textures from nearby frames with known degradation processes. Despite significant progress, grand challenges remain in effectively extracting and transmitting high-quality textures from heavily degraded low-quality sequences affected by blur, additive noise, and compression artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain. First, video frames are split into patches and each patch is transformed into spectral maps in which each channel represents a frequency band. This permits fine-grained self-attention on each frequency band, so that real visual texture can be distinguished from artifacts.
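As a concrete illustration of this first step, below is a minimal sketch of a patch-to-spectral-map transform using an orthonormal 2D DCT. The patch size of 8 and all function and variable names are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: split a frame into patches and DCT each patch so that
# every output channel holds one frequency band (names are illustrative).
import numpy as np
from scipy.fft import dctn

def frame_to_spectral_maps(frame: np.ndarray, patch_size: int = 8) -> np.ndarray:
    """Turn a (C, H, W) frame into (C * p * p, H/p, W/p) spectral maps,
    where each output channel collects one DCT frequency band."""
    c, h, w = frame.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "frame must tile evenly into patches"
    # (C, H/p, p, W/p, p) -> (C, H/p, W/p, p, p): one p x p patch per cell
    patches = frame.reshape(c, h // p, p, w // p, p).transpose(0, 1, 3, 2, 4)
    # Orthonormal 2D DCT over the two patch axes
    spectra = dctn(patches, type=2, norm="ortho", axes=(-2, -1))
    # Flatten the p * p frequency bands into channels
    return spectra.transpose(0, 3, 4, 1, 2).reshape(c * p * p, h // p, w // p)

# Example: a 64 x 64 RGB frame yields 3 * 64 = 192 frequency channels
maps = frame_to_spectral_maps(np.random.rand(3, 64, 64).astype(np.float32))
print(maps.shape)  # (192, 8, 8)
```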
Second, a novel dual frequency attention (DFA) mechanism is proposed to capture both global and local frequency relations, which can handle the varied and complicated degradation processes found in real-world scenarios.
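The paper defines DFA precisely; as one plausible, hedged reading, the PyTorch sketch below combines attention across all frequency bands (global) with attention within small windows of neighbouring bands (local), fused by a residual sum. The window size, fusion rule, and module names are our assumptions, not the paper's.

```python
# Hedged sketch of a dual (global + local) frequency attention block.
# Window size and residual fusion are illustrative assumptions.
import torch
import torch.nn as nn

class DualFrequencyAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4, window: int = 4):
        super().__init__()
        self.window = window
        self.global_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, bands, dim) -- one token per frequency band
        b, f, d = x.shape
        g, _ = self.global_attn(x, x, x)      # relations across all bands
        w = self.window
        assert f % w == 0, "bands must tile evenly into windows"
        xl = x.reshape(b * f // w, w, d)      # group neighbouring bands
        l, _ = self.local_attn(xl, xl, xl)    # relations within each window
        l = l.reshape(b, f, d)
        return self.norm(x + g + l)           # residual fusion

tokens = torch.randn(2, 192, 64)  # e.g. 192 frequency bands from the sketch above
print(DualFrequencyAttention()(tokens).shape)  # torch.Size([2, 192, 64])
```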
Third, we explore different self-attention schemes for video processing in the frequency domain and discover that a "divided attention", which conducts a joint space-frequency attention before applying temporal-frequency attention, leads to the best video enhancement quality.
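A minimal sketch of such a divided scheme, assuming tokens that jointly index spatial position and frequency band: attention is first applied over the space-frequency tokens of each frame, then over time at each token position. All shapes and module names are illustrative.

```python
# Minimal sketch of "divided attention": joint space-frequency attention
# per frame, followed by temporal attention per token position.
import torch
import torch.nn as nn

class DividedAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.space_freq_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, tokens, dim), where "tokens" jointly index
        # spatial position and frequency band
        b, t, n, d = x.shape
        # 1) attend over space-frequency tokens, independently per frame
        xs = x.reshape(b * t, n, d)
        xs = xs + self.space_freq_attn(xs, xs, xs)[0]
        # 2) attend over time, independently per space-frequency token
        xt = xs.reshape(b, t, n, d).permute(0, 2, 1, 3).reshape(b * n, t, d)
        xt = xt + self.temporal_attn(xt, xt, xt)[0]
        return xt.reshape(b, n, t, d).permute(0, 2, 1, 3)

clip = torch.randn(1, 5, 64, 64)  # 5 frames, 64 space-frequency tokens each
print(DividedAttention()(clip).shape)  # torch.Size([1, 5, 64, 64])
```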
Extensive experiments on three widely-used VSR datasets show that FTVSR outperforms state-of-the-art methods on different low-quality videos by clear visual margins. Code and pre-trained models are available at https://github.com/researchmm/FTVSR.
Related papers
- Neural Video Representation for Redundancy Reduction and Consistency Preservation [0.0]
Implicit neural representation (INR) embeds various signals into neural networks.
We propose a video representation method that generates both the high-frequency and low-frequency components of the frame.
Experimental results demonstrate that our method outperforms the existing HNeRV method, achieving superior results in 96 percent of the videos.
arXiv Detail & Related papers (2024-09-27T07:30:12Z)
- Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space [34.353035276767336]
Human motion transfer refers to synthesizing photo-realistic and temporally coherent videos of human motion.
Current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality.
We propose a novel Frequency-based human MOtion TRansfer framework, named FreMOTR, which can effectively mitigate the spatial artifacts and the temporal inconsistency of the synthesized videos.
arXiv Detail & Related papers (2022-09-01T05:30:23Z)
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution [38.00182505384986]
We propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR).
First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.
Second, we study different self-attention schemes, and discover that a divided attention, which conducts a joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality.
arXiv Detail & Related papers (2022-08-05T07:02:30Z)
- Towards Interpretable Video Super-Resolution via Alternating Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem, which aims at generating a sharp, high-framerate, high-resolution video from a low-framerate, blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing frequency-domain face forgery detection methods find that GAN-forged images show obvious grid-like visual artifacts in the frequency spectrum compared to real images.
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- VideoINR: Learning Video Implicit Neural Representation for Continuous Space-Time Super-Resolution [75.79379734567604]
We show that Video Implicit Neural Representation (VideoINR) can be decoded to videos of arbitrary spatial resolution and frame rate.
We show that VideoINR achieves competitive performance with state-of-the-art STVSR methods on common up-sampling scales.
arXiv Detail & Related papers (2022-06-09T17:45:49Z)
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability, but such training comes with tradeoffs, the first of which concerns training time.
To alleviate this first tradeoff, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)