Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution
- URL: http://arxiv.org/abs/2212.14046v1
- Date: Tue, 27 Dec 2022 16:26:15 GMT
- Title: Learning Spatiotemporal Frequency-Transformer for Low-Quality Video
Super-Resolution
- Authors: Zhongwei Qiu, Huan Yang, Jianlong Fu, Daochang Liu, Chang Xu, Dongmei
Fu
- Abstract summary: Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from low-resolution (LR) videos.
Existing VSR techniques usually recover HR frames by extracting textures from nearby frames with known degradation processes.
We propose a novel Frequency-Transformer (FTVSR) for handling low-quality videos, which carries out self-attention in a combined space-time-frequency domain.
- Score: 47.5883522564362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video Super-Resolution (VSR) aims to restore high-resolution (HR) videos from
low-resolution (LR) videos. Existing VSR techniques usually recover HR frames
by extracting pertinent textures from nearby frames with known degradation
processes. Despite significant progress, grand challenges remain in effectively
extracting and transmitting high-quality textures from heavily degraded
low-quality sequences suffering from blur, additive noise, and compression
artifacts. In this work, a novel Frequency-Transformer (FTVSR) is proposed for
handling low-quality videos, which carries out self-attention in a combined
space-time-frequency domain. First, video frames are split into patches and
each patch is transformed into spectral maps in which each channel represents a
frequency band. This permits fine-grained self-attention on each frequency
band, so that real visual texture can be distinguished from artifacts. Second,
a novel dual frequency attention (DFA) mechanism is proposed to capture both
global and local frequency relations, which can handle the varied and
complicated degradation processes found in real-world scenarios. Third, we
explore different self-attention schemes for video processing in the frequency
domain and discover that a "divided attention", which conducts joint
space-frequency attention before applying temporal-frequency attention, leads
to the best video enhancement quality. Extensive experiments on three
widely-used VSR datasets show that FTVSR outperforms state-of-the-art methods
on different low-quality videos with clear visual margins. Code and pre-trained
models are available at https://github.com/researchmm/FTVSR.
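To make the pipeline concrete, here is a minimal PyTorch sketch of the two ideas the abstract names: per-patch DCT spectral maps and the divided space-frequency/temporal-frequency attention. Everything below (patch size, embedding width, module layout, the toy clip) is our illustrative assumption rather than the authors' implementation; the dual frequency attention (DFA) is omitted, and the official code at https://github.com/researchmm/FTVSR should be treated as the reference.

```python
# Illustrative sketch only (our assumptions, not the official FTVSR code):
# per-patch 2D DCT produces "spectral maps" whose channels are frequency
# bands, and "divided attention" runs joint space-frequency attention
# before temporal-frequency attention. DFA is omitted for brevity.
import math
import torch
import torch.nn as nn


def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis matrix of size (n, n)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
    j = torch.arange(n, dtype=torch.float32).unsqueeze(0)
    basis = torch.cos(math.pi * (2 * j + 1) * k / (2 * n)) * math.sqrt(2.0 / n)
    basis[0] /= math.sqrt(2.0)
    return basis


def patch_dct(frames: torch.Tensor, patch: int = 8) -> torch.Tensor:
    """Split frames into patches and apply a 2D DCT to each patch.

    frames: (B, T, C, H, W) -> tokens: (B, T, N, F, C), where N is the
    number of patches per frame and F = patch * patch frequency bands.
    """
    b, t, c, h, w = frames.shape
    d = dct_matrix(patch).to(frames)
    x = frames.reshape(b, t, c, h // patch, patch, w // patch, patch)
    x = x.permute(0, 1, 3, 5, 2, 4, 6)      # (B, T, nh, nw, C, p, p)
    x = d @ x @ d.transpose(-1, -2)         # 2D DCT of every patch
    x = x.flatten(2, 3)                     # merge nh, nw -> N patches
    x = x.flatten(-2, -1)                   # merge p, p -> F bands
    return x.permute(0, 1, 2, 4, 3)         # (B, T, N, F, C)


class DividedFrequencyAttention(nn.Module):
    """Joint space-frequency attention, then temporal-frequency attention."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.space_freq = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.time_freq = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, F, D) -- batch, time, patches, frequency bands, dim
        b, t, n, f, d = x.shape
        # 1) attend jointly over spatial positions and frequency bands
        s = self.norm1(x.reshape(b * t, n * f, d))
        s, _ = self.space_freq(s, s, s)
        x = x + s.reshape(b, t, n, f, d)
        # 2) attend over time and frequency bands per spatial position
        v = self.norm2(x.permute(0, 2, 1, 3, 4).reshape(b * n, t * f, d))
        v, _ = self.time_freq(v, v, v)
        x = x + v.reshape(b, n, t, f, d).permute(0, 2, 1, 3, 4)
        return x


# Toy usage: a 5-frame 32x32 RGB clip, channels embedded to width 32.
frames = torch.randn(1, 5, 3, 32, 32)
tokens = nn.Linear(3, 32)(patch_dct(frames, patch=8))  # (1, 5, 16, 64, 32)
out = DividedFrequencyAttention(dim=32, heads=4)(tokens)
```

The factorized ordering mirrors the abstract's finding: joint space-frequency attention within each frame first, then temporal-frequency attention across frames, gives the best enhancement quality.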
Related papers
- FCVSR: A Frequency-aware Method for Compressed Video Super-Resolution [26.35492218473007]
We propose a deep frequency-based compressed video SR model (FCVSR) consisting of a motion-guided adaptive alignment network and a multi-frequency feature refinement module.
The proposed model has been evaluated on three compressed video super-resolution datasets.
arXiv Detail & Related papers (2025-02-10T13:08:57Z)
- BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution [14.082598088990352]
We propose a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video.
Our approach achieves state-of-the-art PSNR and SSIM performance, showing enhanced spatial details and natural temporal consistency.
arXiv Detail & Related papers (2025-01-19T13:29:41Z)
- Sharpening Neural Implicit Functions with Frequency Consolidation Priors [53.6277160912059]
Signed Distance Functions (SDFs) are vital implicit representations of high-fidelity 3D surfaces.
Current methods mainly leverage a neural network to learn an SDF from various forms of supervision, including signed distances, 3D point clouds, or multi-view images.
We introduce a method to sharpen a low frequency SDF observation by recovering its high frequency components, pursuing a sharper and more complete surface.
arXiv Detail & Related papers (2024-12-27T16:18:46Z)
- Delving into the Frequency: Temporally Consistent Human Motion Transfer in the Fourier Space [34.353035276767336]
Human motion transfer refers to synthesizing photo-realistic, temporally coherent videos in which a target person performs the motions of a source.
Current synthetic videos suffer from temporal inconsistency across sequential frames, which significantly degrades video quality.
We propose a novel Frequency-based human MOtion TRansfer framework, named FreMOTR, which can effectively mitigate the spatial artifacts and the temporal inconsistency of the synthesized videos.
arXiv Detail & Related papers (2022-09-01T05:30:23Z)
- Learning Spatiotemporal Frequency-Transformer for Compressed Video Super-Resolution [38.00182505384986]
We propose a novel Frequency-Transformer for compressed video super-resolution (FTVSR).
First, we divide a video frame into patches, and transform each patch into DCT spectral maps in which each channel represents a frequency band.
Second, we study different self-attention schemes, and discover that a divided attention which conducts a joint space-frequency attention before applying temporal attention on each frequency band, leads to the best video enhancement quality.
arXiv Detail & Related papers (2022-08-05T07:02:30Z)
- Towards Interpretable Video Super-Resolution via Alternating Optimization [115.85296325037565]
We study a practical space-time video super-resolution (STVSR) problem which aims at generating a high-framerate high-resolution sharp video from a low-framerate blurry video.
We propose an interpretable STVSR framework by leveraging both model-based and learning-based methods.
arXiv Detail & Related papers (2022-07-21T21:34:05Z)
- Spatial-Temporal Frequency Forgery Clue for Video Forgery Detection in VIS and NIR Scenario [87.72258480670627]
Existing frequency-domain face forgery detection methods find that GAN-forged images show obvious grid-like artifacts in the frequency spectrum compared to real images (a small DCT-spectrum sketch is given after this list).
This paper proposes a Cosine Transform-based Forgery Clue Augmentation Network (FCAN-DCT) to achieve a more comprehensive spatial-temporal feature representation.
arXiv Detail & Related papers (2022-07-05T09:27:53Z)
- VRT: A Video Restoration Transformer [126.79589717404863]
Video restoration (e.g., video super-resolution) aims to restore high-quality frames from low-quality frames.
We propose a Video Restoration Transformer (VRT) with parallel frame prediction and long-range temporal dependency modelling abilities.
arXiv Detail & Related papers (2022-01-28T17:54:43Z)
- Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate one such tradeoff, we propose a degradation scheme that reduces training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z)
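Regarding the frequency-domain forgery clue referenced in the FCAN-DCT entry above, the following is a minimal sketch (ours, with a random stand-in image, not the paper's method) of how a 2D DCT log-magnitude spectrum can be computed to inspect such grid-like artifacts.

```python
# Minimal sketch (our illustration, not FCAN-DCT): compute a 2D DCT
# log-magnitude spectrum, in which GAN-style periodic grid artifacts
# appear as isolated high-energy peaks away from the top-left
# (low-frequency) corner.
import numpy as np
from scipy.fftpack import dct


def dct2_log_magnitude(gray: np.ndarray) -> np.ndarray:
    """Orthonormal 2D DCT-II log-magnitude spectrum of a grayscale image."""
    coeffs = dct(dct(gray, axis=0, norm="ortho"), axis=1, norm="ortho")
    return np.log1p(np.abs(coeffs))


# Stand-in input; in practice, pass a real (possibly forged) video frame.
frame = np.random.rand(128, 128).astype(np.float32)
spectrum = dct2_log_magnitude(frame)
```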