RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
- URL: http://arxiv.org/abs/2407.18035v1
- Date: Thu, 25 Jul 2024 13:29:37 GMT
- Title: RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models
- Authors: Haoyu Chen, Wenbo Li, Jinjin Gu, Jingjing Ren, Sixiang Chen, Tian Ye, Renjing Pei, Kaiwen Zhou, Fenglong Song, Lei Zhu
- Abstract summary: We introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models.
RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration.
Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts.
- Score: 45.88103575837924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Natural images captured by mobile devices often suffer from multiple types of degradation, such as noise, blur, and low light. Traditional image restoration methods require manual selection of specific tasks, algorithms, and execution sequences, which is time-consuming and may yield suboptimal results. All-in-one models, though capable of handling multiple tasks, typically support only a limited range and often produce overly smooth, low-fidelity outcomes due to their broad data distribution fitting. To address these challenges, we first define a new pipeline for restoring images with multiple degradations, and then introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models. RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration. Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts. Furthermore, the system's modular design facilitates the fast integration of new tasks and models, enhancing its flexibility and scalability for various applications.
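The four steps named in the abstract (choose tasks, order them, pick a model per task, execute) can be illustrated with a minimal Python sketch. This is not the paper's implementation: RestoreAgent delegates the decision to a multimodal large language model, whereas the sketch swaps in a brute-force search over all orderings and model choices. The task list is taken as given, and every name here (`candidate_plans`, `run_plan`, `best_plan`, `TOY_ZOO`, `toy_score`) is hypothetical. The "models" are toy arithmetic functions on a list of integers, and the score compares against a known reference, which a real system would not have.

```python
from itertools import permutations, product
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Image = List[int]                    # toy stand-in for an image
Model = Callable[[Image], Image]
Zoo = Dict[str, Dict[str, Model]]    # task name -> {model name -> model}
Plan = Tuple[Tuple[str, str], ...]   # ((task, model_name), ...) in execution order


def candidate_plans(tasks: Sequence[str], zoo: Zoo) -> Iterator[Plan]:
    """Yield every task ordering combined with every model choice per task."""
    for order in permutations(tasks):
        for names in product(*(zoo[task] for task in order)):
            yield tuple(zip(order, names))


def run_plan(image: Image, plan: Plan, zoo: Zoo) -> Image:
    """Apply the chosen model for each task, in the order the plan gives."""
    out = image
    for task, name in plan:
        out = zoo[task][name](out)
    return out


def best_plan(image: Image, tasks: Sequence[str], zoo: Zoo,
              score: Callable[[Image], float]) -> Plan:
    """Brute-force stand-in for the planner: keep the highest-scoring plan."""
    return max(candidate_plans(tasks, zoo),
               key=lambda plan: score(run_plan(image, plan, zoo)))


# Toy data (assumption): the degradation halves brightness, then adds an offset.
REFERENCE: Image = [100, 200]
DEGRADED: Image = [v // 2 + 10 for v in REFERENCE]   # [60, 110]

# Toy "model zoo" (assumption): two candidate models per task.
TOY_ZOO: Zoo = {
    "denoise": {
        "minus5": lambda im: [v - 5 for v in im],
        "minus10": lambda im: [v - 10 for v in im],
    },
    "brighten": {
        "x2": lambda im: [v * 2 for v in im],
        "x3": lambda im: [v * 3 for v in im],
    },
}


def toy_score(im: Image) -> float:
    """Higher is better: negative absolute error against the toy reference."""
    return -sum(abs(a - b) for a, b in zip(im, REFERENCE))


plan = best_plan(DEGRADED, ["brighten", "denoise"], TOY_ZOO, toy_score)
restored = run_plan(DEGRADED, plan, TOY_ZOO)
print(plan, restored)
```

In this toy setup only one of the eight candidate plans recovers the reference exactly: removing the offset with `minus10` and then brightening with `x2`. Reversing the order with the same models leaves a residual error, which is the kind of order sensitivity the abstract's "optimizing the task sequence" step refers to. Exhaustive search grows factorially with the number of tasks, so it is only practical for small examples like this one.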
Related papers
- Training-Free Large Model Priors for Multiple-in-One Image Restoration [24.230376300759573]
Large Model Driven Image Restoration framework (LMDIR)
Our architecture comprises a query-based prompt encoder, degradation-aware transformer block injecting global degradation knowledge.
This design facilitates a single-stage training paradigm that addresses various degradations while supporting both automatic and user-guided restoration.
arXiv Detail & Related papers (2024-07-18T05:40:32Z)
- Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model.
We utilize the pre-trained visual language model to extract visual prompts from degraded images.
We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z)
- Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models [14.25759541950917]
This work leverages a capable vision-language model and a synthetic degradation pipeline to learn image restoration in the wild (wild IR).
Our base diffusion model is the image restoration SDE (IR-SDE).
arXiv Detail & Related papers (2024-04-15T12:34:21Z)
- Unified-Width Adaptive Dynamic Network for All-In-One Image Restoration [50.81374327480445]
We introduce a novel concept positing that intricate image degradation can be represented in terms of elementary degradation.
We propose the Unified-Width Adaptive Dynamic Network (U-WADN), consisting of two pivotal components: a Width Adaptive Backbone (WAB) and a Width Selector (WS).
The proposed U-WADN achieves better performance while simultaneously reducing up to 32.3% of FLOPs and providing approximately 15.7% real-time acceleration.
arXiv Detail & Related papers (2024-01-24T04:25:12Z)
- SPIRE: Semantic Prompt-Driven Image Restoration [66.26165625929747]
We develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework.
Our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength.
Our experiments demonstrate the superior restoration performance of SPIRE compared to the state of the art.
arXiv Detail & Related papers (2023-12-18T17:02:30Z)
- Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration [58.11518043688793]
MPerceiver is a novel approach to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration.
MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks.
arXiv Detail & Related papers (2023-12-05T17:47:11Z)
- Multi-task Image Restoration Guided By Robust DINO Features [98.7455921708419]
We introduce DINO-IR, a novel multi-task image restoration approach leveraging robust features extracted from DINOv2.
Our empirical analysis shows that while shallow features of DINOv2 capture rich low-level image characteristics, the deep features ensure a robust semantic representation insensitive to degradations.
arXiv Detail & Related papers (2023-12-04T06:59:55Z)
- Prompt-based Ingredient-Oriented All-in-One Image Restoration [0.0]
We propose a novel data ingredient-oriented approach to tackle multiple image degradation tasks.
Specifically, we utilize an encoder to capture features and introduce prompts with degradation-specific information to guide the decoder.
Our method performs competitively with the state of the art.
arXiv Detail & Related papers (2023-09-06T15:05:04Z)
- Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement [75.25451566988565]
We propose a novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a spatially precise high-quality image from a burst of low-quality raw images.
Detailed experimental analysis on five datasets validates our approach and sets a state-of-the-art for burst super-resolution, burst denoising, and low-light burst enhancement.
arXiv Detail & Related papers (2023-04-13T17:54:00Z)
- Super-resolution Reconstruction of Single Image for Latent features [8.857209365343646]
Single-image super-resolution (SISR) typically focuses on restoring various degraded low-resolution (LR) images to a single high-resolution (HR) image.
It is often challenging for models to simultaneously maintain high quality and rapid sampling while preserving diversity in details and texture features.
This challenge can lead to issues such as model collapse, lack of rich details and texture features in the reconstructed HR images, and excessive time consumption for model sampling.
arXiv Detail & Related papers (2022-11-16T09:37:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.