Multi-task Image Restoration Guided By Robust DINO Features
- URL: http://arxiv.org/abs/2312.01677v3
- Date: Fri, 16 Aug 2024 14:53:09 GMT
- Title: Multi-task Image Restoration Guided By Robust DINO Features
- Authors: Xin Lin, Jingtong Yue, Kelvin C. K. Chan, Lu Qi, Chao Ren, Jinshan Pan, Ming-Hsuan Yang
- Abstract summary: We propose DINO-IR, a multi-task image restoration approach leveraging robust features extracted from DINOv2.
We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOv2's shallow features (pixel-level information) with its deep features (degradation-independent semantics).
By formulating these modules into a unified deep model, we propose a DINO perception contrastive loss to constrain the model training.
- Score: 88.74005987908443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task image restoration has gained significant interest due to its inherent versatility and efficiency compared to its single-task counterpart. However, performance declines as the number of tasks increases, primarily because the restoration model struggles to handle tasks of distinct natures at the same time. This has motivated the search for degradation-insensitive semantic commonalities among different degradation tasks. In this paper, we observe that the features of DINOv2 can effectively model semantic information and are independent of degradation factors. Motivated by this observation, we propose DINO-IR, a multi-task image restoration approach that leverages robust features extracted from DINOv2 to solve multiple restoration tasks simultaneously. We first propose a pixel-semantic fusion (PSF) module to dynamically fuse DINOv2's shallow features, which contain pixel-level information, with its deep features, which contain degradation-independent semantic information. To guide the restoration model with the features of DINOv2, we develop a DINO-Restore adaptation and fusion module to adjust the channels of the fused PSF features and then integrate them with the features from the restoration model. Formulating these modules into a unified deep model, we further propose a DINO perception contrastive loss to constrain the training. Extensive experimental results demonstrate that DINO-IR outperforms existing multi-task image restoration approaches across various tasks by a large margin. The source code and trained models will be made available.
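The abstract's pipeline reads naturally as a small amount of PyTorch. Below is a minimal sketch of the PSF-style fusion and the channel-adaptation step it describes; the channel sizes, the sigmoid gate, and the additive injection into the restoration branch are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a pixel-semantic fusion (PSF) step: shallow (pixel-level)
# and deep (semantic) DINOv2 features are mixed with a learned, content-
# dependent gate, then projected to the restoration network's channel width.
# Channel sizes and the gating design are illustrative assumptions.
import torch
import torch.nn as nn

class PixelSemanticFusion(nn.Module):
    def __init__(self, dino_dim: int = 768, out_dim: int = 64):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * dino_dim, dino_dim, kernel_size=1),
            nn.Sigmoid(),
        )
        # "DINO-Restore"-style channel adaptation to the restoration branch.
        self.adapt = nn.Conv2d(dino_dim, out_dim, kernel_size=1)

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # shallow, deep: (B, dino_dim, H, W) feature maps from DINOv2 blocks.
        g = self.gate(torch.cat([shallow, deep], dim=1))
        fused = g * shallow + (1.0 - g) * deep   # dynamic per-pixel mixing
        return self.adapt(fused)                  # match restoration channels

restore_feat = torch.randn(1, 64, 32, 32)
shallow = torch.randn(1, 768, 32, 32)
deep = torch.randn(1, 768, 32, 32)
guided = restore_feat + PixelSemanticFusion()(shallow, deep)  # inject guidance
print(guided.shape)  # torch.Size([1, 64, 32, 32])
```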
Related papers
- RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models [45.88103575837924]
We introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models.
RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration.
Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts.
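A hedged sketch of the four-stage loop described above follows; the task names, ordering heuristic, and model registry are hypothetical stand-ins for the multimodal LLM that RestoreAgent actually uses for steps (1)-(3).

```python
# Hypothetical sketch of the agent loop: assess degradations, order the
# tasks, pick a model per task, then execute. All names and the registry
# are illustrative; RestoreAgent itself drives steps (1)-(3) with an MLLM.
from typing import Callable, Dict, List

Image = object  # stand-in for an image type

def assess_degradations(img: Image) -> List[str]:
    # (1) In RestoreAgent this is done by the multimodal LLM; stubbed here.
    return ["denoise", "deblur", "super_resolve"]

def order_tasks(tasks: List[str]) -> List[str]:
    # (2) Assumed heuristic: remove noise before deblurring and upsampling.
    priority = {"denoise": 0, "deblur": 1, "super_resolve": 2}
    return sorted(tasks, key=lambda t: priority.get(t, 99))

def restore(img: Image, registry: Dict[str, Callable[[Image], Image]]) -> Image:
    for task in order_tasks(assess_degradations(img)):
        model = registry[task]   # (3) select the most suitable model
        img = model(img)         # (4) execute the restoration step
    return img

registry = {t: (lambda x: x) for t in ("denoise", "deblur", "super_resolve")}
print(restore("degraded.png", registry))
```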
arXiv Detail & Related papers (2024-07-25T13:29:37Z)
- VmambaIR: Visual State Space Model for Image Restoration [36.11385876754612]
We propose VmambaIR, which introduces State Space Models (SSMs) with linear complexity into comprehensive image restoration tasks.
VmambaIR achieves state-of-the-art (SOTA) performance with much fewer computational resources and parameters.
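For intuition, here is a minimal sketch of why an SSM scan is linear in sequence length; VmambaIR's selective scan is considerably more elaborate, so treat this as the bare recurrence only.

```python
# A diagonal linear recurrence h_t = A*h_{t-1} + B*x_t, y_t = C*h_t,
# processed in a single O(T) pass with no T x T attention matrix.
import torch

def ssm_scan(x: torch.Tensor, A: torch.Tensor, B: torch.Tensor, C: torch.Tensor):
    # x: (T, D) flattened image tokens; A, B, C: (D,) diagonal parameters.
    h = torch.zeros_like(x[0])
    ys = []
    for x_t in x:               # one linear pass over the sequence
        h = A * h + B * x_t
        ys.append(C * h)
    return torch.stack(ys)

T, D = 1024, 32
y = ssm_scan(torch.randn(T, D), torch.full((D,), 0.9),
             torch.ones(D), torch.ones(D))
print(y.shape)  # torch.Size([1024, 32])
```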
arXiv Detail & Related papers (2024-03-18T02:38:55Z)
- FeatUp: A Model-Agnostic Framework for Features at Any Resolution [24.4201195336725]
FeatUp is a task- and model-agnostic framework to restore lost spatial information in deep features.
We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution.
We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.
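A simplified sketch of the single-forward-pass variant's idea is shown below, with a plain guidance convolution standing in for FeatUp's actual joint bilateral upsampling operator.

```python
# Sketch: low-resolution deep features are upsampled under guidance from the
# high-resolution input image. The conv-based refinement is a simplified
# stand-in for FeatUp's stacked joint bilateral upsamplers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedUpsampler(nn.Module):
    def __init__(self, feat_dim: int = 384, guide_dim: int = 3):
        super().__init__()
        self.refine = nn.Conv2d(feat_dim + guide_dim, feat_dim, 3, padding=1)

    def forward(self, feats: torch.Tensor, image: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, h, w) low-res features; image: (B, 3, H, W) guidance.
        up = F.interpolate(feats, size=image.shape[-2:], mode="bilinear",
                           align_corners=False)
        return self.refine(torch.cat([up, image], dim=1))

feats = torch.randn(1, 384, 16, 16)   # e.g. ViT patch features
image = torch.randn(1, 3, 224, 224)
print(GuidedUpsampler()(feats, image).shape)  # torch.Size([1, 384, 224, 224])
```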
arXiv Detail & Related papers (2024-03-15T17:57:06Z)
- Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration [50.81374327480445]
We introduce a novel concept positing that intricate image degradation can be represented as combinations of elementary degradations.
We propose the Unified-Width Adaptive Dynamic Network (U-WADN), consisting of two pivotal components: a Width Adaptive Backbone (WAB) and a Width Selector (WS).
The proposed U-WADN achieves better performance while simultaneously reducing up to 32.3% of FLOPs and providing approximately 15.7% real-time acceleration.
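A minimal sketch of width-adaptive execution follows; the candidate widths, the selector heuristic, and the weight-slicing scheme are assumptions, and U-WADN's actual WAB/WS designs differ.

```python
# Sketch: a tiny selector predicts a width ratio per sample, and the backbone
# runs only the first k output channels, trading accuracy for FLOPs.
import torch
import torch.nn as nn
import torch.nn.functional as F

WIDTHS = (0.25, 0.5, 0.75, 1.0)   # assumed candidate width ratios

class SlimmableConv(nn.Conv2d):
    def forward(self, x: torch.Tensor, ratio: float = 1.0) -> torch.Tensor:
        # Slice the weight tensor so only a fraction of channels is computed.
        k_out = int(self.out_channels * ratio)
        w = self.weight[:k_out, : x.shape[1]]
        b = self.bias[:k_out] if self.bias is not None else None
        return F.conv2d(x, w, b, self.stride, self.padding)

def select_width(x: torch.Tensor) -> float:
    # Width Selector stub: derive a "difficulty" score from image statistics.
    score = x.mean().sigmoid().item()
    return WIDTHS[min(int(score * len(WIDTHS)), len(WIDTHS) - 1)]

conv = SlimmableConv(3, 64, 3, padding=1)
x = torch.randn(1, 3, 64, 64)
r = select_width(x)
print(r, conv(x, ratio=r).shape)  # e.g. 0.75 -> torch.Size([1, 48, 64, 64])
```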
arXiv Detail & Related papers (2024-01-24T04:25:12Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
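A hedged sketch of the meta-prompt idea follows, with assumed dimensions and a generic cross-attention readout standing in for the paper's exact wiring into the diffusion U-Net.

```python
# Sketch: a small set of learnable embeddings ("meta prompts") is attached to
# a frozen pre-trained backbone and cross-attends to its features so a
# perception head receives task-tuned features. Sizes are assumptions.
import torch
import torch.nn as nn

class MetaPromptReadout(nn.Module):
    def __init__(self, n_prompts: int = 16, dim: int = 320):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(1, n_prompts, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, diff_feats: torch.Tensor) -> torch.Tensor:
        # diff_feats: (B, N, dim) tokens from a frozen diffusion U-Net block.
        p = self.prompts.expand(diff_feats.shape[0], -1, -1)
        out, _ = self.attn(p, diff_feats, diff_feats)  # prompts query features
        return out                                     # (B, n_prompts, dim)

feats = torch.randn(2, 64 * 64, 320)
print(MetaPromptReadout()(feats).shape)  # torch.Size([2, 16, 320])
```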
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration [79.04007257606862]
This paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself.
Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), repurposes lagged ("latency") versions of the target model as negative models, making it compatible with diverse image restoration tasks.
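A minimal sketch of the history-as-negative idea is given below, using a frozen lagged copy of the model and an assumed L1 distance in place of the paper's actual loss formulation.

```python
# Sketch: a frozen, lagged copy of the restoration model produces the
# negative sample; the loss pulls the current output toward the ground truth
# and away from the historical output. Distance and schedule are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def model_contrastive_loss(pred, target, negative, eps=1e-6):
    pos = F.l1_loss(pred, target)      # stay close to the ground truth
    neg = F.l1_loss(pred, negative)    # move away from the old prediction
    return pos / (neg + eps)

model = nn.Conv2d(3, 3, 3, padding=1)
old_model = copy.deepcopy(model).eval()   # refreshed every K iterations

lq, gt = torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32)
with torch.no_grad():
    negative = old_model(lq)              # negative sample "from history"
loss = model_contrastive_loss(model(lq), gt, negative)
loss.backward()
```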
arXiv Detail & Related papers (2023-09-12T07:50:54Z)
- DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration [5.573836220587265]
This work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR).
DRM-IR consists of task-adaptive degradation modeling and model-based image restoring.
Experiments on multiple benchmark datasets show that DRM-IR achieves state-of-the-art performance in all-in-one image restoration.
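Below is a sketch of one unfolded stage under the usual data-fidelity-plus-prior reading of deep unfolding; both sub-networks and the step size are illustrative stand-ins for DRM-IR's actual design.

```python
# Sketch: each stage estimates a degradation operator, takes an approximate
# gradient step on the data term ||D(x) - y||^2 (omitting D^T for brevity),
# then applies a learned prior refinement.
import torch
import torch.nn as nn

class UnfoldingStage(nn.Module):
    def __init__(self, ch: int = 3):
        super().__init__()
        self.degrade = nn.Conv2d(ch, ch, 3, padding=1)  # degradation model D
        self.prior = nn.Conv2d(ch, ch, 3, padding=1)    # learned prior step
        self.step = nn.Parameter(torch.tensor(0.1))

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        residual = self.degrade(x) - y     # data-fidelity residual
        x = x - self.step * residual       # approximate gradient step
        return x + self.prior(x)           # prior-driven refinement

y = torch.randn(1, 3, 64, 64)              # degraded observation
x = y.clone()
for stage in [UnfoldingStage() for _ in range(4)]:
    x = stage(x, y)                        # K unfolded iterations
print(x.shape)
```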
arXiv Detail & Related papers (2023-07-15T02:42:19Z)
- Super-resolution Reconstruction of Single Image for Latent features [8.857209365343646]
Single-image super-resolution (SISR) typically focuses on restoring variously degraded low-resolution (LR) images to a single high-resolution (HR) image.
It is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features.
This challenge can lead to issues such as model collapse, lack of rich details and texture features in the reconstructed HR images, and excessive time consumption for model sampling.
arXiv Detail & Related papers (2022-11-16T09:37:07Z)
- Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach to SISR named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art image methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z)
- Gated Fusion Network for Degraded Image Super Resolution [78.67168802945069]
We propose a dual-branch convolutional neural network to extract base features and recovered features separately.
By decomposing the feature extraction step into two task-independent streams, the dual-branch model can facilitate the training process.
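A minimal sketch of the dual-branch, gated-fusion idea follows, with assumed layer sizes; the real network's branch depths and gating design differ.

```python
# Sketch: one branch extracts base features from the degraded input, a second
# extracts "recovered" features, and a learned gate blends the two streams.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.base = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.recover = nn.Sequential(nn.Conv2d(3, ch, 3, padding=1), nn.ReLU())
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, r = self.base(x), self.recover(x)
        g = self.gate(torch.cat([b, r], dim=1))
        return g * b + (1 - g) * r          # gated blend of the two streams

x = torch.randn(1, 3, 64, 64)
print(GatedFusion()(x).shape)  # torch.Size([1, 64, 64, 64])
```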
arXiv Detail & Related papers (2020-03-02T13:28:32Z)