Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
- URL: http://arxiv.org/abs/2503.16930v1
- Date: Fri, 21 Mar 2025 08:02:48 GMT
- Title: Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
- Authors: Haijin Zeng, Xiangming Wang, Yongyong Chen, Jingyong Su, Jie Liu
- Abstract summary: Vision-Language-guided Unfolding Network (VLU-Net) is a unified DUN framework for handling multiple degradation types simultaneously. VLU-Net is the first all-in-one DUN framework and outperforms current leading one-by-one and all-in-one end-to-end methods by 3.74 dB on the SOTS dehazing dataset and 1.70 dB on the Rain100L deraining dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic image degradations, including noise, blur and lighting inconsistencies, pose significant challenges in image restoration, often due to sensor limitations or adverse environmental conditions. Existing Deep Unfolding Networks (DUNs) offer stable restoration performance but require manual selection of degradation matrices for each degradation type, limiting their adaptability across diverse scenarios. To address this issue, we propose the Vision-Language-guided Unfolding Network (VLU-Net), a unified DUN framework for handling multiple degradation types simultaneously. VLU-Net leverages a Vision-Language Model (VLM) refined on degraded image-text pairs to align image features with degradation descriptions, selecting the appropriate transform for target degradation. By integrating an automatic VLM-based gradient estimation strategy into the Proximal Gradient Descent (PGD) algorithm, VLU-Net effectively tackles complex multi-degradation restoration tasks while maintaining interpretability. Furthermore, we design a hierarchical feature unfolding structure to enhance VLU-Net framework, efficiently synthesizing degradation patterns across various levels. VLU-Net is the first all-in-one DUN framework and outperforms current leading one-by-one and all-in-one end-to-end methods by 3.74 dB on the SOTS dehazing dataset and 1.70 dB on the Rain100L deraining dataset.
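The abstract unrolls the classical Proximal Gradient Descent (PGD) iteration into network stages. As a minimal pure-Python sketch of what one unrolled PGD stage computes, the toy below restores a sparse signal degraded by a diagonal operator with a soft-threshold proximal step; all names, the diagonal degradation operator, and the l1 prior are illustrative assumptions, not the paper's actual implementation (VLU-Net replaces the fixed operator with a VLM-selected transform and the proximal step with a learned network).

```python
def soft_threshold(v, tau):
    """Proximal operator of the l1 prior: shrink each entry toward zero."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]

def pgd_step(x, y, a, step, tau):
    """One PGD iteration: x <- prox(x - step * A^T (A x - y)).

    A is a toy diagonal degradation matrix with weights `a`; in an
    unfolded network, this gradient step becomes one network stage.
    """
    grad = [ai * (ai * xi - yi) for ai, xi, yi in zip(a, x, y)]
    z = [xi - step * gi for xi, gi in zip(x, grad)]
    return soft_threshold(z, tau)

def restore(y, a, steps=50, step=0.5, tau=0.01):
    """Run `steps` unrolled PGD stages starting from the degraded input."""
    x = list(y)
    for _ in range(steps):
        x = pgd_step(x, y, a, step, tau)
    return x

# Toy example: recover a sparse signal from a diagonally degraded one.
clean = [0.0, 1.0, 0.0, -0.5]
a = [1.0, 0.8, 1.0, 0.9]
y = [ai * ci for ai, ci in zip(a, clean)]
est = restore(y, a)
```

The soft threshold keeps zero entries exactly zero and recovers the nonzero entries up to a small shrinkage bias, which is the interpretability property DUNs retain by unrolling a convergent iteration rather than learning a black-box mapping.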
Related papers
- Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration [109.38288333994407]
Contrastive Prompt Learning (CPL) is a novel framework that fundamentally enhances prompt-task alignment.
Our framework establishes new state-of-the-art performance while maintaining parameter efficiency, offering a principled solution for unified image restoration.
arXiv Detail & Related papers (2025-04-14T08:24:57Z) - VL-UR: Vision-Language-guided Universal Restoration of Images Degraded by Adverse Weather Conditions [3.0133850348026936]
We propose a vision-language-guided universal restoration framework (VL-UR)
VL-UR integrates visual and semantic information to enhance image restoration.
Experiments across eleven diverse degradation settings demonstrate VL-UR's state-of-the-art performance, robustness, and adaptability.
arXiv Detail & Related papers (2025-04-11T02:59:06Z) - Dynamic Degradation Decomposition Network for All-in-One Image Restoration [3.856518745550605]
We introduce a dynamic degradation decomposition network for all-in-one image restoration, named D$^3$Net.
D$^3$Net achieves degradation-adaptive image restoration with guided prompts through cross-domain interaction and dynamic degradation decomposition.
Experiments on multiple image restoration tasks demonstrate that D$^3$Net significantly outperforms the state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-26T11:49:58Z) - VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression [59.14355576912495]
NeRF-based video has revolutionized visual media by delivering photorealistic Free-Viewpoint Video (FVV) experiences.
The substantial data volumes pose significant challenges for storage and transmission.
We propose VRVVC, a novel end-to-end joint variable-rate framework for video compression.
arXiv Detail & Related papers (2024-12-16T01:28:04Z) - Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding [67.57487747508179]
Multiple-in-one image restoration (IR) has made significant progress, aiming to handle all types of single-degradation image restoration with a single model.
In this paper, we propose a novel multiple-in-one IR model that can effectively restore images with both single and mixed degradations.
arXiv Detail & Related papers (2024-11-25T09:26:34Z) - Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration [50.81374327480445]
We introduce a novel concept positing that intricate image degradation can be represented in terms of elementary degradations.
We propose the Unified-Width Adaptive Dynamic Network (U-WADN), consisting of two pivotal components: a Width Adaptive Backbone (WAB) and a Width Selector (WS).
The proposed U-WADN achieves better performance while reducing FLOPs by up to 32.3% and providing approximately 15.7% real-time acceleration.
arXiv Detail & Related papers (2024-01-24T04:25:12Z) - DGNet: Dynamic Gradient-Guided Network for Water-Related Optics Image Enhancement [77.0360085530701]
Underwater image enhancement (UIE) is a challenging task due to the complex degradation caused by underwater environments.
Previous methods often idealize the degradation process, and neglect the impact of medium noise and object motion on the distribution of image features.
Our approach utilizes predicted images to dynamically update pseudo-labels, adding a dynamic gradient to optimize the network's gradient space.
arXiv Detail & Related papers (2023-12-12T06:07:21Z) - Latent Degradation Representation Constraint for Single Image Deraining [13.414207526373959]
We propose a novel Latent Degradation Representation Constraint Network (LDRCNet) that consists of a Direction-Aware Encoder (DAEncoder), a UNet Deraining Network, and an MSIBlock.
Experimental results on synthetic and real datasets demonstrate that our method achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-09-09T12:50:06Z) - Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration [78.14941737723501]
We propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR.
By orchestrating two cascading procedures, CDUN achieves adaptive processing for diverse degradations.
In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames.
arXiv Detail & Related papers (2023-09-04T14:18:00Z) - Local-Global Transformer Enhanced Unfolding Network for Pan-sharpening [13.593522290577512]
Pan-sharpening aims to increase the spatial resolution of the low-resolution multispectral (LrMS) image with the guidance of the corresponding panchromatic (PAN) image.
Although deep learning (DL)-based pan-sharpening methods have achieved promising performance, most of them have a two-fold deficiency.
arXiv Detail & Related papers (2023-04-28T03:34:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of this information (including all listed papers) and is not responsible for any consequences of its use.