A Modular Conditional Diffusion Framework for Image Reconstruction
- URL: http://arxiv.org/abs/2411.05993v1
- Date: Fri, 08 Nov 2024 22:11:29 GMT
- Title: A Modular Conditional Diffusion Framework for Image Reconstruction
- Authors: Magauiya Zhussip, Iaroslav Koshelev, Stamatis Lefkimmiatis
- Abstract summary: Diffusion Probabilistic Models (DPMs) have been recently utilized to deal with various blind image restoration (IR) tasks.
We propose a modular diffusion probabilistic IR framework (DP-IR), which allows us to combine the performance benefits of existing pre-trained state-of-the-art IR networks and generative DPMs.
We evaluate our model on four benchmarks for the tasks of burst JDD-SR, dynamic scene deblurring, and super-resolution.
- Score: 3.451075831610783
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion Probabilistic Models (DPMs) have recently been used for various blind image restoration (IR) tasks, where they have demonstrated outstanding performance in terms of perceptual quality. However, the task-specific nature of existing solutions and the excessive computational cost of their training make such models impractical and difficult to use for IR tasks other than those they were initially trained for. This hinders their wider adoption, especially by those who lack access to powerful computational resources and vast amounts of training data. In this work we aim to address these issues and enable the successful adoption of DPMs in practical IR-related applications. Towards this goal, we propose a modular diffusion probabilistic IR framework (DP-IR), which allows us to combine the performance benefits of existing pre-trained state-of-the-art IR networks and generative DPMs, while requiring only the additional training of a relatively small module (0.7M params) related to the particular IR task of interest. Moreover, the architecture of the proposed framework allows for a sampling strategy that reduces the number of neural function evaluations by at least a factor of four without any performance loss, and it can also be combined with existing acceleration techniques such as DDIM. We evaluate our model on four benchmarks for the tasks of burst JDD-SR, dynamic scene deblurring, and super-resolution. Our method outperforms existing approaches in terms of perceptual quality while retaining competitive performance with respect to fidelity metrics.
Related papers
- Elucidating and Endowing the Diffusion Training Paradigm for General Image Restoration [73.4733153072447]
Diffusion models demonstrate strong generative capabilities in image restoration (IR) tasks. Their complex architectures and iterative processes limit their practical application compared to mainstream reconstruction-based general ordinary IR networks. Existing approaches primarily focus on optimizing network architecture and diffusion paths but overlook the integration of the diffusion training paradigm within general ordinary IR frameworks.
arXiv Detail & Related papers (2025-06-26T19:14:27Z) - Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z) - Feature Fusion and Knowledge-Distilled Multi-Modal Multi-Target Detection [2.295863158976069]
We propose a feature fusion and knowledge-distilled framework for multi-modal MTD. We formulate the problem as a posterior probability optimization task, which is solved through a multi-stage training pipeline. Experimental results demonstrate that our student model achieves approximately 95% of the teacher model's mean Average Precision.
arXiv Detail & Related papers (2025-05-31T03:11:44Z) - InvFussion: Bridging Supervised and Zero-shot Diffusion for Inverse Problems [76.39776789410088]
This work introduces a framework that combines the strong performance of supervised approaches and the flexibility of zero-shot methods.
A novel architectural design seamlessly integrates the degradation operator directly into the denoiser.
Experimental results on the FFHQ and ImageNet datasets demonstrate state-of-the-art posterior-sampling performance.
arXiv Detail & Related papers (2025-04-02T12:40:57Z) - Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning [0.08192907805418582]
Current approaches either yield subpar results when using pretrained models without task-specific adaptation, or require substantial computational resources for fine-tuning.
We propose a novel method for adapting foundational, multimodal embeddings to downstream tasks, without the need for expensive fine-tuning processes.
arXiv Detail & Related papers (2025-02-04T06:30:12Z) - LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration [62.3751291442432]
We propose LoRA-IR, a flexible framework that dynamically leverages compact low-rank experts to facilitate efficient all-in-one image restoration.
LoRA-IR consists of two training stages: degradation-guided pre-training and parameter-efficient fine-tuning.
Experiments demonstrate that LoRA-IR achieves SOTA performance across 14 IR tasks and 29 benchmarks, while maintaining computational efficiency.
arXiv Detail & Related papers (2024-10-20T13:00:24Z) - Reprogramming Distillation for Medical Foundation Models [37.52464627899668]
We propose a novel framework called Reprogramming Distillation (RD).
RD reprograms the original feature space of the foundation model so that it is more relevant to downstream scenarios.
RD consistently achieves superior performance compared with previous PEFT and KD methods.
arXiv Detail & Related papers (2024-07-09T02:17:51Z) - Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling [2.91204440475204]
Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models.
They rely on sequential denoising steps during sample generation.
We propose a novel method that integrates denoising phases directly into the model's architecture.
arXiv Detail & Related papers (2024-05-31T08:19:44Z) - AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters [57.62742271140852]
AdaIR is a novel framework that enables low storage cost and efficient training without sacrificing performance.
AdaIR requires solely the training of lightweight, task-specific modules, ensuring a more efficient storage and training regimen.
arXiv Detail & Related papers (2024-04-17T15:31:06Z) - Unsupervised Solution Operator Learning for Mean-Field Games via Sampling-Invariant Parametrizations [7.230928145936957]
We develop a novel framework to learn the MFG solution operator.
Our model takes MFG instances as input and outputs their solutions with one forward pass.
It is discretization-free, making it suitable for learning operators of high-dimensional MFGs.
arXiv Detail & Related papers (2024-01-27T19:07:49Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - PREM: A Simple Yet Effective Approach for Node-Level Graph Anomaly Detection [65.24854366973794]
Node-level graph anomaly detection (GAD) plays a critical role in identifying anomalous nodes from graph-structured data in domains such as medicine, social networks, and e-commerce.
We introduce a simple method termed PREprocessing and Matching (PREM for short) to improve the efficiency of GAD.
Our approach streamlines GAD, reducing time and memory consumption while maintaining powerful anomaly detection capabilities.
arXiv Detail & Related papers (2023-10-18T02:59:57Z) - A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z) - MMD-ReID: A Simple but Effective Solution for Visible-Thermal Person ReID [20.08880264104061]
We propose a simple but effective framework, MMD-ReID, that reduces the modality gap by an explicit discrepancy reduction constraint.
We conduct extensive experiments to demonstrate both qualitatively and quantitatively the effectiveness of MMD-ReID.
The proposed framework significantly outperforms the state-of-the-art methods on SYSU-MM01 and RegDB datasets.
arXiv Detail & Related papers (2021-11-09T11:33:32Z) - Modality Compensation Network: Cross-Modal Adaptation for Action Recognition [77.24983234113957]
We propose a Modality Compensation Network (MCN) to explore the relationships of different modalities.
Our model bridges data from source and auxiliary modalities by a modality adaptation block to achieve adaptive representation learning.
Experimental results reveal that MCN outperforms state-of-the-art approaches on four widely-used action recognition benchmarks.
arXiv Detail & Related papers (2020-01-31T04:51:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.