Coding-Prior Guided Diffusion Network for Video Deblurring
- URL: http://arxiv.org/abs/2504.12222v1
- Date: Wed, 16 Apr 2025 16:14:43 GMT
- Title: Coding-Prior Guided Diffusion Network for Video Deblurring
- Authors: Yike Liu, Jianhui Zhang, Haipeng Li, Shuaicheng Liu, Bing Zeng
- Abstract summary: We present a novel framework that effectively leverages both coding priors and generative diffusion priors for high-quality deblurring. Experiments demonstrate our method achieves state-of-the-art perceptual quality with up to 30% improvement in IQA metrics.
- Score: 47.77918791133459
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While recent video deblurring methods have advanced significantly, they often overlook two valuable sources of prior information: (1) motion vectors (MVs) and coding residuals (CRs) from video codecs, which provide efficient inter-frame alignment cues, and (2) the rich real-world knowledge embedded in pre-trained generative diffusion models. We present CPGDNet, a novel two-stage framework that effectively leverages both coding priors and generative diffusion priors for high-quality deblurring. First, our coding-prior feature propagation (CPFP) module uses MVs for efficient frame alignment and CRs to generate attention masks, addressing motion inaccuracies and texture variations. Second, a coding-prior controlled generation (CPC) module integrates coding priors into a pretrained diffusion model, guiding it to enhance critical regions and synthesize realistic details. Experiments demonstrate that our method achieves state-of-the-art perceptual quality, with up to 30% improvement in IQA metrics. Both the code and the coding-prior-augmented dataset will be open-sourced.
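To ground the two-stage description above, here is a minimal PyTorch sketch of the CPFP idea: MVs backward-warp previous-frame features, and CRs feed a small network that produces a soft attention mask to gate the warped features where alignment is unreliable. Every name here (warp_with_mv, CRAttention, the fusion rule at the end) is an illustrative assumption, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_with_mv(feat_prev, mv):
    """Backward-warp previous-frame features with per-pixel motion vectors.

    feat_prev: (B, C, H, W) features of the previous frame
    mv:        (B, 2, H, W) motion vectors in pixels (dx, dy)
    """
    B, _, H, W = feat_prev.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(H, device=feat_prev.device, dtype=feat_prev.dtype),
        torch.arange(W, device=feat_prev.device, dtype=feat_prev.dtype),
        indexing="ij",
    )
    x = xs.unsqueeze(0) + mv[:, 0]
    y = ys.unsqueeze(0) + mv[:, 1]
    # Normalize to [-1, 1] as grid_sample expects.
    grid = torch.stack((2.0 * x / (W - 1) - 1.0, 2.0 * y / (H - 1) - 1.0), dim=-1)
    return F.grid_sample(feat_prev, grid, align_corners=True)

class CRAttention(nn.Module):
    """Turns coding residuals into a soft (0, 1) attention mask."""
    def __init__(self, cr_channels=3, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(cr_channels, hidden, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 1, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, cr):
        return self.net(cr)

# Usage (assumed fusion rule): propagate features with the MVs, then gate the
# warped features where the codec residual signals unreliable alignment.
# aligned = warp_with_mv(feat_prev, mv)
# fused = feat_cur + CRAttention()(cr) * aligned
```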
Related papers
- When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding [118.72266141321647]
Cross-Modality Video Coding (CMVC) is a pioneering approach that explores multimodal representations and video generative models in video coding.
During decoding, previously encoded components and video generation models are leveraged to create multiple encoding-decoding modes.
Experiments indicate that TT2V achieves effective semantic reconstruction, while IT2V exhibits competitive perceptual consistency.
arXiv Detail & Related papers (2024-08-15T11:36:18Z) - Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression [9.742764207747697]
We propose a latent diffusion model-based remote sensing image compression (LDM-RSIC) method.
In the first stage, a self-encoder learns a prior from the high-quality input image.
In the second stage, the prior is generated through an LDM conditioned on the decoded image of an existing learning-based image compression algorithm.
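As a rough sketch of this second stage: a standard way to condition a latent denoiser on the decoded image is to channel-concatenate its latent with the noisy latent. The module below is a toy stand-in (assumed names, no real UNet or timestep embedding) for whatever architecture LDM-RSIC actually uses.

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy latent denoiser conditioned on the base codec's decoded image."""
    def __init__(self, latent_ch=4, cond_ch=4, hidden=64):
        super().__init__()
        # The decoded-image latent is channel-concatenated with the noisy
        # latent; a real implementation would use a UNet with timestep
        # embeddings instead of this small conv stack.
        self.net = nn.Sequential(
            nn.Conv2d(latent_ch + cond_ch, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, latent_ch, 3, padding=1),
        )

    def forward(self, z_noisy, z_decoded, t):
        # t (the diffusion timestep) is ignored in this toy version.
        return self.net(torch.cat([z_noisy, z_decoded], dim=1))
```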
arXiv Detail & Related papers (2024-06-06T11:13:44Z) - Compression-Realized Deep Structural Network for Video Quality Enhancement [78.13020206633524]
This paper focuses on the task of quality enhancement for compressed videos.
Most of the existing methods lack a structured design to optimally leverage the priors within compression codecs.
A new paradigm is urgently needed for a more "conscious" process of quality enhancement.
arXiv Detail & Related papers (2024-05-10T09:18:17Z) - CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement [11.862146973848558]
The Coding Priors-Guided Aggregation (CPGA) network is developed to utilize temporal and spatial information from coding priors.
To facilitate research in compressed video quality enhancement (VQE), we construct the Video Coding Priors dataset.
arXiv Detail & Related papers (2024-03-15T14:53:31Z) - Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement [0.2538209532048866]
This paper proposes a time-frequency (T-F) domain speech enhancement network (DPCFCS-Net).
It incorporates improved densely connected blocks, dual-path modules, convolution-augmented transformers (conformers), channel attention, and spatial attention.
Compared with previous models, our proposed model has a more efficient encoder-decoder and can learn comprehensive features.
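Since the block list above names channel attention and spatial attention, a generic CBAM-style sketch of those two modules is shown below; the reduction ratio, pooling choices, and kernel size are assumptions, and the paper's exact variants may differ.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Reweights channels from globally pooled statistics."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling branch
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Reweights spatial positions from channel-pooled maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w
```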
arXiv Detail & Related papers (2023-06-09T12:52:01Z) - Denoising Diffusion Error Correction Codes [92.10654749898927]
Recently, neural decoders have demonstrated their advantage over classical decoding techniques.
Recent state-of-the-art neural decoders suffer from high complexity and lack the important iterative scheme characteristic of many legacy decoders.
We propose to employ denoising diffusion models for the soft decoding of linear codes at arbitrary block lengths.
arXiv Detail & Related papers (2022-09-16T11:00:50Z) - Neural Data-Dependent Transform for Learned Image Compression [72.86505042102155]
We build a neural data-dependent transform and introduce a continuous online mode decision mechanism to jointly optimize the coding efficiency for each individual image.
The experimental results show the effectiveness of the proposed neural-syntax design and the continuous online mode decision mechanism.
arXiv Detail & Related papers (2022-03-09T14:56:48Z) - A Coding Framework and Benchmark towards Low-Bitrate Video Understanding [63.05385140193666]
We propose a traditional-neural mixed coding framework that takes advantage of both traditional codecs and neural networks (NNs).
The framework is optimized by ensuring that a transportation-efficient semantic representation of the video is preserved.
We build a low-bitrate video understanding benchmark with three downstream tasks on eight datasets, demonstrating the notable superiority of our approach.
arXiv Detail & Related papers (2022-02-06T16:29:15Z) - End-to-end Neural Video Coding Using a Compound Spatiotemporal Representation [33.54844063875569]
We propose a hybrid motion compensation (HMC) method that adaptively combines the predictions generated by two approaches.
Specifically, we generate a compound spatiotemporal representation (CSTR) through a recurrent information aggregation (RIA) module.
We further design a one-to-many decoder pipeline to generate multiple predictions from the CSTR, including vector-based resampling, adaptive kernel-based resampling, compensation mode selection maps and texture enhancements.
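To illustrate the adaptive kernel-based resampling named in this pipeline, the sketch below applies per-pixel, softmax-normalized k x k kernels to a feature map via unfold; the function name and tensor shapes are assumptions rather than the paper's design.

```python
import torch
import torch.nn.functional as F

def adaptive_kernel_resample(feat, kernels, k=3):
    """Resample features with predicted per-pixel k*k kernels.

    feat:    (B, C, H, W) input features
    kernels: (B, k*k, H, W) unnormalized per-pixel kernel logits
    """
    B, C, H, W = feat.shape
    # Extract the k*k neighborhood around every pixel: (B, C*k*k, H*W).
    patches = F.unfold(feat, kernel_size=k, padding=k // 2)
    patches = patches.view(B, C, k * k, H * W)
    # Softmax over the kernel taps so each output is a convex combination.
    weights = F.softmax(kernels.view(B, k * k, H * W), dim=1).unsqueeze(1)
    return (patches * weights).sum(dim=2).view(B, C, H, W)

# Usage (hypothetical): a small conv head predicts `kernels` from the CSTR
# features, and the resampled map becomes one of the decoder's predictions.
# out = adaptive_kernel_resample(feat, kernel_head(cstr))
```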
arXiv Detail & Related papers (2021-08-05T19:43:32Z)