Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
- URL: http://arxiv.org/abs/2312.00853v2
- Date: Fri, 12 Jul 2024 13:55:40 GMT
- Title: Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
- Authors: Xi Yang, Chenhang He, Jianqi Ma, Lei Zhang,
- Abstract summary: We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-arts on real-world VSR benchmark datasets.
- Score: 15.197746480157651
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world low-resolution (LR) videos have diverse and complex degradations, imposing great challenges on video super-resolution (VSR) algorithms to reproduce their high-resolution (HR) counterparts with high quality. Recently, the diffusion models have shown compelling performance in generating realistic details for image restoration tasks. However, the diffusion process has randomness, making it hard to control the contents of restored images. This issue becomes more serious when applying diffusion models to VSR tasks because temporal consistency is crucial to the perceptual quality of videos. In this paper, we propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models. To ensure the content consistency among adjacent frames, we exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss, ensuring that the generated HR video maintains a coherent and continuous visual flow. To further mitigate the discontinuity of generated details, we insert temporal module to the decoder and fine-tune it with an innovative sequence-oriented loss. The proposed motion-guided latent diffusion (MGLD) based VSR algorithm achieves significantly better perceptual quality than state-of-the-arts on real-world VSR benchmark datasets, validating the effectiveness of the proposed model design and training strategies.
Related papers
- ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution [28.945663118445037]
Real-world image super-resolution (Real-ISR) aims at restoring high-quality (HQ) images from low-quality (LQ) inputs corrupted by unknown and complex degradations.
We introduce ConsisSR to handle both semantic and pixel-level consistency.
arXiv Detail & Related papers (2024-10-17T17:41:52Z) - Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World
Video Super-Resolution [65.91317390645163]
Upscale-A-Video is a text-guided latent diffusion framework for video upscaling.
It ensures temporal coherence through two key mechanisms: locally, it integrates temporal layers into U-Net and VAE-Decoder, maintaining consistency within short sequences.
It also offers greater flexibility by allowing text prompts to guide texture creation and adjustable noise levels to balance restoration and generation.
arXiv Detail & Related papers (2023-12-11T18:54:52Z) - Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models [17.570136632211693]
We present StableVSR, a VSR method based on DMs that can enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details.
We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR.
arXiv Detail & Related papers (2023-11-27T15:14:38Z) - Look Back and Forth: Video Super-Resolution with Explicit Temporal
Difference Modeling [105.69197687940505]
We propose to explore the role of explicit temporal difference modeling in both LR and HR space.
To further enhance the super-resolution result, not only spatial residual features are extracted, but the difference between consecutive frames in high-frequency domain is also computed.
arXiv Detail & Related papers (2022-04-14T17:07:33Z) - Investigating Tradeoffs in Real-World Video Super-Resolution [90.81396836308085]
Real-world video super-resolution (VSR) models are often trained with diverse degradations to improve generalizability.
To alleviate the first tradeoff, we propose a degradation scheme that reduces up to 40% of training time without sacrificing performance.
To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences.
arXiv Detail & Related papers (2021-11-24T18:58:21Z) - DynaVSR: Dynamic Adaptive Blind Video Super-Resolution [60.154204107453914]
DynaVSR is a novel meta-learning-based framework for real-world video SR.
We train a multi-frame downscaling module with various types of synthetic blur kernels, which is seamlessly combined with a video SR network for input-aware adaptation.
Experimental results show that DynaVSR consistently improves the performance of the state-of-the-art video SR models by a large margin.
arXiv Detail & Related papers (2020-11-09T15:07:32Z) - Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video
Super-Resolution [95.26202278535543]
A simple solution is to split it into two sub-tasks: video frame (VFI) and video super-resolution (VSR)
temporalsynthesis and spatial super-resolution are intra-related in this task.
We propose a one-stage space-time video super-resolution framework, which directly synthesizes an HR slow-motion video from an LFR, LR video.
arXiv Detail & Related papers (2020-02-26T16:59:48Z) - Video Face Super-Resolution with Motion-Adaptive Feedback Cell [90.73821618795512]
Video super-resolution (VSR) methods have recently achieved a remarkable success due to the development of deep convolutional neural networks (CNN)
In this paper, we propose a Motion-Adaptive Feedback Cell (MAFC), a simple but effective block, which can efficiently capture the motion compensation and feed it back to the network in an adaptive way.
arXiv Detail & Related papers (2020-02-15T13:14:10Z) - Deep Video Super-Resolution using HR Optical Flow Estimation [42.86066957681113]
Video super-resolution (SR) aims at generating a sequence of high-resolution (HR) frames with plausible and temporally consistent details from their low-resolution (LR) counterparts.
Existing deep learning based methods commonly estimate optical flows between LR frames to provide temporal dependency.
We propose an end-to-end video SR network to super-resolve both optical flows and images.
arXiv Detail & Related papers (2020-01-06T07:25:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.