Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features
- URL: http://arxiv.org/abs/2511.16928v1
- Date: Fri, 21 Nov 2025 03:40:45 GMT
- Title: Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features
- Authors: Jingyi Xu, Meisong Zheng, Ying Chen, Minglang Qiao, Xin Deng, Mai Xu,
- Abstract summary: Diffusion model-based Video Super-Resolution (VSR) approaches suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity. We propose a novel Densely Guided diffusion model with Aligned Features for Video Super-Resolution (DGAF-VSR). Experiments on synthetic and real-world datasets demonstrate that DGAF-VSR surpasses state-of-the-art methods in key aspects of VSR.
- Score: 51.5076190312734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion model (DM) based Video Super-Resolution (VSR) approaches achieve impressive perceptual quality. However, they suffer from error accumulation, spatial artifacts, and a trade-off between perceptual quality and fidelity, primarily caused by inaccurate alignment and insufficient compensation between video frames. In this paper, within the DM-based VSR pipeline, we revisit the role of alignment and compensation between adjacent video frames and reveal two crucial observations: (a) the feature domain is better suited than the pixel domain for information compensation due to its stronger spatial and temporal correlations, and (b) warping at an upscaled resolution better preserves high-frequency information, but this benefit is not necessarily monotonic. Therefore, we propose a novel Densely Guided diffusion model with Aligned Features for Video Super-Resolution (DGAF-VSR), with an Optical Guided Warping Module (OGWM) to maintain high-frequency details in the aligned features and a Feature-wise Temporal Condition Module (FTCM) to deliver dense guidance in the feature domain. Extensive experiments on synthetic and real-world datasets demonstrate that DGAF-VSR surpasses state-of-the-art methods in key aspects of VSR, including perceptual quality (35.82% DISTS reduction), fidelity (0.20 dB PSNR gain), and temporal consistency (30.37% tLPIPS reduction).
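Observation (b) concerns flow-based warping of features at an upscaled resolution. The sketch below is a minimal, generic NumPy illustration of that idea, not the paper's OGWM, whose internals the abstract does not describe. It assumes bilinear sampling with border clamping, backward warping with a per-pixel flow given in pixels, and an integer upscaling factor `s`; the function names (`bilinear_sample`, `upsample`, `backward_warp`, `warp_at_scale`) are hypothetical.

```python
import numpy as np


def bilinear_sample(feat, x, y):
    """Sample feat (C, H, W) at float pixel coords x, y (same 2-D shape).

    Out-of-range coordinates are clamped to the border (replicate padding).
    """
    _, H, W = feat.shape
    x = np.clip(x, 0, W - 1)
    y = np.clip(y, 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def upsample(arr, s):
    """Bilinear upsampling of (C, H, W) by integer factor s (half-pixel centers)."""
    _, H, W = arr.shape
    ys = (np.arange(H * s) + 0.5) / s - 0.5
    xs = (np.arange(W * s) + 0.5) / s - 0.5
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return bilinear_sample(arr, xx, yy)


def backward_warp(feat, flow):
    """Warp neighbor features onto the reference grid.

    flow has shape (2, H, W): flow[0] is the x and flow[1] the y displacement,
    in pixels, from each reference pixel to its match in the neighbor frame.
    """
    _, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    return bilinear_sample(feat, xs + flow[0], ys + flow[1])


def warp_at_scale(feat, flow, s):
    """Upsample features and flow by s, then warp at the upscaled resolution."""
    feat_up = upsample(feat, s)
    flow_up = upsample(flow, s) * s  # pixel displacements grow with resolution
    return backward_warp(feat_up, flow_up)
```

Note that the flow must be multiplied by `s` after upsampling, since a displacement of one low-resolution pixel spans `s` high-resolution pixels. Comparing `warp_at_scale(feat, flow, 1)` against larger `s` is one way to probe observation (b); because the abstract states the benefit is not necessarily monotonic, `s` would be treated as a tuned hyperparameter rather than set as large as possible.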
Related papers
- UltraVSR: Achieving Ultra-Realistic Video Super-Resolution with Efficient One-Step Diffusion Space [46.43409853027655]
Diffusion models have shown great potential in generating realistic image detail. Adapting these models to video super-resolution (VSR) remains challenging due to their inherent stochasticity and lack of temporal modeling. We propose UltraVSR, a novel framework that enables ultra-realistic and temporally-coherent VSR through an efficient one-step diffusion space.
arXiv Detail & Related papers (2025-05-26T13:19:27Z)
- Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods using pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and insufficient temporal consistency. We present a novel Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating approximation errors.
arXiv Detail & Related papers (2025-03-19T03:41:56Z)
- Spatio-Temporal Distortion Aware Omnidirectional Video Super-Resolution [25.615935776826596]
Omnidirectional videos (ODVs) provide an immersive visual experience by capturing the 360° scene. ODVs often suffer from low resolution due to their wide field of view and limitations in capturing devices and transmission bandwidth. We propose a Spatio-Temporal Distortion Aware Network (STDAN) to alleviate spatial projection distortions and temporal flickering of ODVs.
arXiv Detail & Related papers (2024-10-15T11:17:19Z)
- Collaborative Feedback Discriminative Propagation for Video Super-Resolution [66.61201445650323]
The key success of video super-resolution (VSR) methods stems mainly from exploring spatial and temporal information.
Inaccurate alignment usually leads to aligned features with significant artifacts.
Existing propagation modules only propagate features of the same timestep forward or backward.
arXiv Detail & Related papers (2024-04-06T22:08:20Z)
- Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution [151.1255837803585]
We propose a novel approach, pursuing Spatial Adaptation and Temporal Coherence (SATeCo) for video super-resolution.
SATeCo pivots on learning spatial-temporal guidance from low-resolution videos to calibrate both latent-space high-resolution video denoising and pixel-space video reconstruction.
Experiments conducted on the REDS4 and Vid4 datasets demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2024-03-25T17:59:26Z)
- Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution [15.197746480157651]
We propose an effective real-world VSR algorithm by leveraging the strength of pre-trained latent diffusion models.
We exploit the temporal dynamics in LR videos to guide the diffusion process by optimizing the latent sampling path with a motion-guided loss.
The proposed motion-guided latent diffusion based VSR algorithm achieves significantly better perceptual quality than state-of-the-art methods on real-world VSR benchmark datasets.
arXiv Detail & Related papers (2023-12-01T14:40:07Z)
- Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models [17.570136632211693]
We present StableVSR, a VSR method based on DMs that can enhance the perceptual quality of upscaled videos by synthesizing realistic and temporally-consistent details.
We demonstrate the effectiveness of StableVSR in enhancing the perceptual quality of upscaled videos while achieving better temporal consistency compared to existing state-of-the-art methods for VSR.
arXiv Detail & Related papers (2023-11-27T15:14:38Z)
- Local-Global Temporal Difference Learning for Satellite Video Super-Resolution [53.03380679343968]
We propose to exploit the well-defined temporal difference for efficient and effective temporal compensation. To fully utilize the local and global temporal information within frames, we systematically model the short-term and long-term temporal discrepancies. Rigorous objective and subjective evaluations conducted across five mainstream video satellites demonstrate that our method performs favorably against state-of-the-art approaches.
arXiv Detail & Related papers (2023-04-10T07:04:40Z)
- Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling [105.69197687940505]
We propose to explore the role of explicit temporal difference modeling in both LR and HR space.
To further enhance the super-resolution result, the method not only extracts spatial residual features but also computes the difference between consecutive frames in the high-frequency domain.
arXiv Detail & Related papers (2022-04-14T17:07:33Z)
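The explicit temporal-difference idea in the "Look Back and Forth" entry can be illustrated with a small NumPy sketch. It assumes single-channel frames and uses a box-blur high-pass (`frame - blur(frame)`) as a stand-in for a high-frequency decomposition; the summary does not say which filter the paper uses, and `box_blur`, `high_freq`, and `temporal_differences` are hypothetical names.

```python
import numpy as np


def box_blur(img, k=3):
    """k x k mean filter with edge replication; img is 2-D (H, W), k must be odd."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    H, W = img.shape
    out = np.zeros((H, W), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W]
    return out / (k * k)


def high_freq(img, k=3):
    """High-frequency residual: the image minus its blurred (low-pass) version."""
    return img - box_blur(img, k)


def temporal_differences(frames, t, k=3):
    """Raw and high-frequency differences between frame t and its neighbors.

    Returns a dict keyed by "backward" (t-1) and/or "forward" (t+1); each value
    is a (raw_difference, high_frequency_difference) pair.
    """
    cur = frames[t]
    diffs = {}
    for name, j in (("backward", t - 1), ("forward", t + 1)):
        if 0 <= j < len(frames):  # skip neighbors outside the clip
            raw = frames[j] - cur
            diffs[name] = (raw, high_freq(frames[j], k) - high_freq(cur, k))
    return diffs
```

Because the high-pass filter is linear, a uniform brightness change between two frames shows up in the raw difference but cancels in the high-frequency difference, so the latter emphasizes changes in fine detail rather than global intensity shifts.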