Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution
- URL: http://arxiv.org/abs/2212.14046v1
- Date: Tue, 27 Dec 2022 16:26:15 GMT
- Title: Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution
- Authors: Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei
Fu
- Abstract summary: Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Existing VSR techniques usually recover HR frames by extracting textures from nearby frames with known degradation processes.
We propose a novel Frequency-Transformer (FTVSR) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain.
- Score: 47.5883522564362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from
low-resolution (LR) videos. Existing VSR techniques usually recover HR frames
by extracting pertinent textures from nearby frames with known degradation
processes. Despite significant progress, grand challenges remain in effectively
extracting and transmitting high-quality textures from heavily degraded
low-quality sequences suffering from blur, additive noise, and compression
artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for
handling low-quality videos, which carries out self-attention in a combined
space-time-frequency domain. First, video frames are split into patches and
each patch is transformed into spectral maps in which each channel represents a
frequency band. This permits fine-grained self-attention on each frequency
band, so that real visual texture can be distinguished from artifacts. Second,
a novel dual frequency attention (DFA) mechanism is proposed to capture both
global and local frequency relations, which can handle the varied and
complicated degradation processes found in real-world scenarios. Third, we
explore different self-attention schemes for video processing in the frequency
domain and discover that a "divided attention", which conducts joint
space-frequency attention before applying temporal-frequency attention, leads
to the best video enhancement quality. Extensive experiments on three
widely-used VSR datasets show that FTVSR outperforms state-of-the-art methods
on different low-quality videos with clear visual margins. Code and pre-trained
models are available at https://github.com/researchmm/FTVSR.
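To make the pipeline concrete, here is a minimal PyTorch sketch of the two ideas the abstract names: per-patch DCT spectral maps and the divided space-frequency/temporal-frequency attention. Everything below (patch size, embedding width, module layout, the toy clip) is our illustrative assumption rather than the authors' implementation; the dual frequency attention (DFA) is omitted, and the official code at https://github.com/researchmm/FTVSR should be treated as the reference.

```python
# Illustrative sketch only (our assumptions, not the official FTVSR code):
# per-patch 2D DCT produces "spectral maps" whose channels are frequency
# bands, and "divided attention" runs joint space-frequency attention
# before temporal-frequency attention. DFA is omitted for brevity.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of size (n, n)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    j = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    basis = torch.cos(math.pi * (2 * j + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)
    return basis


def patch_dct(frames: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """Split frames into patches and apply a 2D DCT to each patch.

    frames: (B, T, C, H, W) -> tokens: (B, T, N, F, C), where N is the
    number of patches per frame and F = patch * patch frequency bands.
    """
    b, t, c, h, w = frames.shape
    d = dct_matrix(patch).to(frames)
    x = frames.reshape(b, t, c, h // patch, patch, w // patch, patch)
    x = x.permute(0, 1, 3, 5, 2, 4, 6)      # (B, T, nh, nw, C, p, p)
    x = d @ x @ d.transpose(-1, -2)         # 2D DCT of every patch
    x = x.flatten(2, 3)                     # merge nh, nw -> N patches
    x = x.flatten(-2, -1)                   # merge p, p -> F bands
    return x.permute(0, 1, 2, 4, 3)         # (B, T, N, F, C)


class DividedFrequencyAttention(nn.Module):
    """Joint space-frequency attention, then temporal-frequency attention."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.space_freq = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.time_freq = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, F, D) -- batch, time, patches, frequency bands, dim
        b, t, n, f, d = x.shape
        # 1) attend jointly over spatial positions and frequency bands
        s = self.norm1(x.reshape(b * t, n * f, d))
        s, _ = self.space_freq(s, s, s)
        x = x + s.reshape(b, t, n, f, d)
        # 2) attend over time and frequency bands per spatial position
        v = self.norm2(x.permute(0, 2, 1, 3, 4).reshape(b * n, t * f, d))
        v, _ = self.time_freq(v, v, v)
        x = x + v.reshape(b, n, t, f, d).permute(0, 2, 1, 3, 4)
        return x


# Toy usage: a 5-frame 32x32 RGB clip, channels embedded to width 32.
frames = torch.randn(1, 5, 3, 32, 32)
tokens = nn.Linear(3, 32)(patch_dct(frames, patch=8))  # (1, 5, 16, 64, 32)
out = DividedFrequencyAttention(dim=32, heads=4)(tokens)
```

The factorized ordering mirrors the abstract's finding: joint space-frequency attention within each frame first, then temporal-frequency attention across frames, gives the best enhancement quality.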
Related papers
- FCVSR: A Frequency-aware Method for Compressed Video Super-Resolution [26.35492218473007]
We propose a deep frequency-based compressed video SR model (FCVSR) consisting of a motion-guided adaptive alignment network and a multi-frequency feature refinement module.
The proposed model has been evaluated on three compressed video super-resolution datasets.
arXiv Detail & Related papers (2025-02-10T13:08:57Z)
- BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution [14.082598088990352]
We propose a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video.
Our approach achieves state-of-the-art PSNR and SSIM performance, showing enhanced spatial details and natural temporal consistency.
arXiv Detail & Related papers (2025-01-19T13:29:41Z)
- Sharpening Neural Implicit Functions with Frequency Consolidation Priors [53.6277160912059]
Signed Distance Functions (SDFs) are vital implicit representations of high-fidelity 3D surfaces.
Current methods mainly leverage a neural network to learn an SDF from various forms of supervision, including signed distances, 3D point clouds, or multi-view images.
We introduce a method to sharpen a low frequency SDF observation by recovering its high frequency components, pursuing a sharper and more complete surface.
arXiv Detail & Related papers (2024-12-27T16:18:46Z)
- Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space [34.353035276767336]
Human motion transfer refers to synthesizing photo-realistic, temporally coherent videos in which a target person performs the motions of a source.
Current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality.
We propose a novel Frequency-based human MOtion TRansfer framework, named FreMOTR, which can effectively mitigate the spatial artifacts and the temporal inconsistency of the synthesized videos.
arXiv Detail & Related papers (2022-09-01T05:30:23Z)
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution [38.00182505384986]
We propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR).
First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.
Second, we study different self-attention schemes, and discover that a divided attention which conducts a joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality.
arXiv Detail & Related papers (2022-08-05T07:02:30Z)
- Towards Interpretable Video Super-Resolution via Alternating Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing frequency-domain face forgery detection methods find that GAN-forged images show obvious grid-like artifacts in the frequency spectrum compared to real images (a small DCT-spectrum sketch is given after this list).
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate one such tradeoff, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)
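Regarding the frequency-domain forgery clue referenced in the FCAN-DCT entry above, the following is a minimal sketch (ours, with a random stand-in image, not the paper's method) of how a 2D DCT log-magnitude spectrum can be computed to inspect such grid-like artifacts.

```python
# Minimal sketch (our illustration, not FCAN-DCT): compute a 2D DCT
# log-magnitude spectrum, in which GAN-style periodic grid artifacts
# appear as isolated high-energy peaks away from the top-left
# (low-frequency) corner.
import numpy as np
from scipy.fftpack import dct


def dct2_log_magnitude(gray: np.ndarray) -> np.ndarray:
    """Orthonormal 2D DCT-II log-magnitude spectrum of a grayscale image."""
    coeffs = dct(dct(gray, axis=0, norm="ortho"), axis=1, norm="ortho")
    return np.log1p(np.abs(coeffs))


# Stand-in input; in practice, pass a real (possibly forged) video frame.
frame = np.random.rand(128, 128).astype(np.float32)
spectrum = dct2_log_magnitude(frame)
```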