Investigating Tradeoffs in Real-World Video Super-Resolution
- URL: http://arxiv.org/abs/2111.12704v1
- Date: Wed, 24 Nov 2021 18:58:21 GMT
- Title: Investigating Tradeoffs in Real-World Video Super-Resolution
- Authors: Kelvin C.K. Chan, Shangchen Zhou, Xiangyu Xu, Chen Change Loy
- Abstract summary: Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the speed-performance tradeoff, we propose a stochastic degradation scheme that cuts training time by up to 40% without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
- Score: 90.81396836308085
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The diversity and complexity of degradations in real-world video
super-resolution (VSR) pose non-trivial challenges in inference and training.
First, while long-term propagation leads to improved performance in cases of
mild degradations, severe in-the-wild degradations could be exaggerated through
propagation, impairing output quality. To balance the tradeoff between detail
synthesis and artifact suppression, we found an image pre-cleaning stage
indispensable to reduce noise and artifacts prior to propagation. Equipped
with a carefully designed cleaning module, our RealBasicVSR outperforms
existing methods in both quality and efficiency. Second, real-world VSR models
are often trained with diverse degradations to improve generalizability,
requiring increased batch size to produce a stable gradient. Inevitably, the
increased computational burden results in various problems, including 1)
speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first
tradeoff, we propose a stochastic degradation scheme that reduces training
time by up to 40% without sacrificing performance. We then analyze different
training settings and suggest that employing longer sequences rather than
larger batches during training allows more effective use of temporal
information, leading to more stable performance during inference. To facilitate
fair comparisons, we propose the new VideoLQ dataset, which contains a large
variety of real-world low-quality video sequences with rich textures and
patterns. Our dataset can serve as a common ground for benchmarking. Code,
models, and the dataset will be made publicly available.
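As a rough sketch of the pre-cleaning idea above (not the released RealBasicVSR implementation), the snippet below applies a per-frame cleaning module before recurrent propagation; the module names `clean_net` and `vsr_net`, the residual formulation, and the fixed iteration count are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PreCleanedVSR(nn.Module):
    """Sketch: clean each frame before long-term propagation."""

    def __init__(self, clean_net: nn.Module, vsr_net: nn.Module,
                 num_clean_iters: int = 3):
        super().__init__()
        self.clean_net = clean_net        # per-frame noise/artifact remover
        self.vsr_net = vsr_net            # recurrent VSR backbone
        self.num_clean_iters = num_clean_iters

    def forward(self, lqs: torch.Tensor) -> torch.Tensor:
        # lqs: (n, t, c, h, w) batch of low-quality sequences
        n, t, c, h, w = lqs.shape
        frames = lqs.reshape(n * t, c, h, w)
        for _ in range(self.num_clean_iters):
            # suppress degradations so propagation does not amplify them
            frames = frames + self.clean_net(frames)  # residual cleaning
        return self.vsr_net(frames.reshape(n, t, c, h, w))
```

The stochastic degradation scheme can be read as skipping individual degradations at random during on-the-fly data synthesis, which lowers the average per-batch cost; the operators and probability below are stand-ins, not the paper's exact recipe.

```python
import random
import torch
import torch.nn.functional as F

def box_blur(x: torch.Tensor) -> torch.Tensor:
    # 3x3 box blur as a stand-in for the paper's blur kernels
    k = torch.ones(x.size(1), 1, 3, 3, device=x.device) / 9.0
    return F.conv2d(x, k, padding=1, groups=x.size(1))

def down_up(x: torch.Tensor) -> torch.Tensor:
    # random downscale, then upscale back to the original size
    s = random.uniform(0.5, 1.0)
    y = F.interpolate(x, scale_factor=s, mode="bilinear", align_corners=False)
    return F.interpolate(y, size=x.shape[-2:], mode="bilinear",
                         align_corners=False)

def gaussian_noise(x: torch.Tensor) -> torch.Tensor:
    return (x + 0.02 * torch.randn_like(x)).clamp(0, 1)

PIPELINE = [box_blur, down_up, gaussian_noise]

def degrade_stochastic(frames: torch.Tensor,
                       apply_prob: float = 0.5) -> torch.Tensor:
    # Each degradation runs only with probability `apply_prob`, so on
    # average fewer ops execute per batch and data synthesis is cheaper.
    for op in PIPELINE:
        if random.random() < apply_prob:
            frames = op(frames)
    return frames
```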
Related papers
- DGTR: Distributed Gaussian Turbo-Reconstruction for Sparse-View Vast Scenes [81.56206845824572]
Novel-view synthesis (NVS) approaches play a critical role in vast scene reconstruction.
Few-shot methods often struggle with poor reconstruction quality in vast environments.
This paper presents DGTR, a novel distributed framework for efficient Gaussian reconstruction for sparse-view vast scenes.
arXiv Detail & Related papers (2024-11-19T07:51:44Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning these representations.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
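A minimal sketch of the REPA-style regularizer described in this entry, under assumptions: `hidden` are denoiser hidden states on noisy inputs, `target_feats` come from a frozen pretrained visual encoder on the clean images, `proj` is a small trainable MLP, and the negative-cosine form is an illustrative choice rather than the paper's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def repa_style_loss(hidden: torch.Tensor,
                    target_feats: torch.Tensor,
                    proj: nn.Module) -> torch.Tensor:
    """Align projected denoiser states with frozen-encoder features.

    hidden:       (b, n, d_h) hidden states from the denoising network
    target_feats: (b, n, d_e) patch features from a frozen pretrained
                  visual encoder run on the clean images
    proj:         small trainable MLP mapping d_h -> d_e
    """
    pred = proj(hidden)  # (b, n, d_e)
    # negative per-patch cosine similarity, averaged; added as a
    # regularizer on top of the ordinary denoising objective
    return -F.cosine_similarity(pred, target_feats, dim=-1).mean()
```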
- Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding [61.89781979702939]
This study quantitatively reveals an "impossible trinity" among data quantity, diversity, and quality in pre-training datasets.
Recent efforts seek to use synthetic annotations to refine large-scale, diverse ASR-captioned datasets that suffer from low quality.
We introduce the Video DataFlywheel framework, which iteratively refines video annotations with improved noise control methods.
arXiv Detail & Related papers (2024-09-29T03:33:35Z)
- Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution [38.79439380482431]
Real-world super-resolution (RWSR) faces unknown degradations in the low-resolution inputs, all the while lacking paired training data.
Existing methods approach this problem by learning blind general models through complex synthetic augmentations on training inputs.
We introduce a novel pairwise distance distillation framework to address the unsupervised RWSR for a targeted real-world degradation.
arXiv Detail & Related papers (2024-07-10T01:46:40Z)
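The entry above only names a pairwise distance distillation framework; as a loose illustration of the general idea, the following generic relational-distillation loss matches the student's pairwise feature distances to the teacher's. It is not the paper's formulation.

```python
import torch

def pairwise_distance_distillation(student_feats: torch.Tensor,
                                   teacher_feats: torch.Tensor) -> torch.Tensor:
    """Match the pairwise distance structure of teacher features.

    student_feats, teacher_feats: (b, d) per-image feature vectors.
    A generic relational-distillation loss sketch, not the paper's method.
    """
    s_dist = torch.cdist(student_feats, student_feats)  # (b, b)
    t_dist = torch.cdist(teacher_feats, teacher_feats)  # (b, b)
    # teacher distances are detached so gradients flow only to the student
    return torch.mean((s_dist - t_dist.detach()) ** 2)
```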
- Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
arXiv Detail & Related papers (2023-12-01T14:40:07Z)
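A hedged sketch of what the motion-guided loss in the entry above could look like: per-frame latents at the current sampling step are warped with optical flow estimated from the LR video and penalized for inconsistency. The flow source, warping details, and L1 form are assumptions of this sketch.

```python
import torch
import torch.nn.functional as F

def warp(x: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp feature maps x (b, c, h, w) with flow (b, 2, h, w)."""
    b, _, h, w = x.shape
    ys, xs = torch.meshgrid(torch.arange(h, device=x.device),
                            torch.arange(w, device=x.device), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    # normalize sample positions to [-1, 1] for grid_sample
    gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(x, torch.stack((gx, gy), dim=-1), align_corners=True)

def motion_guided_loss(latents: torch.Tensor, flows: torch.Tensor) -> torch.Tensor:
    """Penalize temporal inconsistency between adjacent-frame latents.

    latents: (t, c, h, w) per-frame latents at the current sampling step
    flows:   (t-1, 2, h, w) optical flow between adjacent frames,
             assumed to be estimated on the LR video
    """
    warped = warp(latents[1:], flows)  # bring frame i+1 toward frame i
    return F.l1_loss(latents[:-1], warped)
```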
- Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction [62.955327005837475]
Image super-resolution (SR) aims to learn a mapping from low-resolution (LR) to high-resolution (HR) using paired HR-LR training images.
We present an efficient test-time adaptation framework for SR, named SRTTA, which is able to quickly adapt SR models to test domains with different/unknown degradation types.
arXiv Detail & Related papers (2023-10-29T13:58:57Z)
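A rough sketch of second-order-degradation test-time adaptation in the spirit of the entry above: the test LR image is degraded once more, and a copy of the model is adapted so that super-resolving the re-degraded image reproduces the original input. `degrade_fn` and the hyperparameters are illustrative, not SRTTA's exact procedure.

```python
import copy
import torch
import torch.nn.functional as F

def test_time_adapt(sr_model: torch.nn.Module,
                    lr_image: torch.Tensor,
                    degrade_fn,
                    steps: int = 5,
                    adapt_lr: float = 1e-4) -> torch.nn.Module:
    """Adapt a copy of the SR model to a single test image."""
    model = copy.deepcopy(sr_model)  # keep the source model intact
    opt = torch.optim.Adam(model.parameters(), lr=adapt_lr)
    for _ in range(steps):
        lr2 = degrade_fn(lr_image)   # second-order degraded input
        pred = model(lr2)            # super-resolve the re-degraded image
        # compare against the original LR input at its own resolution
        pred_ds = F.interpolate(pred, size=lr_image.shape[-2:],
                                mode="bilinear", align_corners=False)
        loss = F.l1_loss(pred_ds, lr_image)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```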
- Expanding Synthetic Real-World Degradations for Blind Video Super Resolution [3.474523163017713]
Video super-resolution (VSR) techniques have drastically improved over the last few years and shown impressive performance on synthetic data.
However, their performance on real-world video data suffers because of the complexity of real-world degradations and misaligned video frames.
In this paper, we propose synthesizing real-world degradations on synthetic training datasets.
arXiv Detail & Related papers (2023-05-04T08:58:31Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
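A hedged sketch of the diffuse-then-denoise idea in the DR2 entry above: the degraded face is diffused to an intermediate timestep where Gaussian noise dominates the original degradations, reverse diffusion yields a coarse degradation-invariant prediction, and an enhancement module restores details. The `diffusion.q_sample` / `diffusion.p_sample_from` interfaces are assumed here, not a real library API.

```python
import torch

@torch.no_grad()
def dr2_style_restore(degraded: torch.Tensor,
                      diffusion,   # assumed interface, see lead-in
                      enhancer,    # separate enhancement network
                      t_start: int = 400) -> torch.Tensor:
    """Diffuse-then-denoise, then enhance."""
    # 1) forward-diffuse until added noise drowns the input degradations
    noisy = diffusion.q_sample(degraded, t=t_start)
    # 2) reverse diffusion from the intermediate state gives a coarse,
    #    degradation-invariant prediction
    coarse = diffusion.p_sample_from(noisy, t_start)
    # 3) restore high-quality details on top of the coarse prediction
    return enhancer(coarse)
```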
- Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data [17.529045507657944]
We extend the powerful ESRGAN to a practical restoration application (namely, Real-ESRGAN).
A high-order degradation modeling process is introduced to better simulate complex real-world degradations.
We also consider the common ringing and overshoot artifacts in the synthesis process.
arXiv Detail & Related papers (2021-07-22T17:43:24Z)
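High-order degradation modeling, as named in the Real-ESRGAN entry above, can be illustrated as repeating a classical degradation round more than once; the sketch below uses second order and omits the paper's JPEG compression and sinc-filter (ringing/overshoot) steps, with simple stand-in operators.

```python
import torch
import torch.nn.functional as F

def one_round(x: torch.Tensor) -> torch.Tensor:
    """A single classical degradation round: blur -> down/up -> noise."""
    k = torch.ones(x.size(1), 1, 3, 3, device=x.device) / 9.0
    x = F.conv2d(x, k, padding=1, groups=x.size(1))            # blur
    x = F.interpolate(x, scale_factor=0.5, mode="bilinear",
                      align_corners=False)                     # downscale
    x = F.interpolate(x, scale_factor=2.0, mode="bilinear",
                      align_corners=False)                     # upscale back
    return (x + 0.02 * torch.randn_like(x)).clamp(0, 1)        # noise

def high_order_degrade(hr: torch.Tensor, order: int = 2) -> torch.Tensor:
    """'High-order' modeling: apply the classical round `order` times."""
    x = hr
    for _ in range(order):
        x = one_round(x)
    return x
```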
This list is automatically generated from the titles and abstracts of the papers on this site.