An Intelligent Agentic System for Complex Image Restoration Problems
- URL: http://arxiv.org/abs/2410.17809v1
- Date: Wed, 23 Oct 2024 12:11:26 GMT
- Title: An Intelligent Agentic System for Complex Image Restoration Problems
- Authors: Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong
- Abstract summary: AgenticIR mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling.
We employ large language models (LLMs) and vision-language models (VLMs) that interact via text generation to operate a toolbox of IR models.
Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
- Score: 39.93819777300997
- License:
- Abstract: Real-world image restoration (IR) is inherently complex and often requires combining multiple specialized models to address diverse degradations. Inspired by human problem-solving, we propose AgenticIR, an agentic system that mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling. AgenticIR leverages large language models (LLMs) and vision-language models (VLMs) that interact via text generation to dynamically operate a toolbox of IR models. We fine-tune VLMs for image quality analysis and employ LLMs for reasoning, guiding the system step by step. To compensate for LLMs' lack of specific IR knowledge and experience, we introduce a self-exploration method, allowing the LLM to observe and summarize restoration results into referenceable documents. Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
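As a rough illustration of how such an agentic loop could be organized, here is a minimal Python sketch; the helper names (vlm_assess, llm_plan, TOOLBOX) and the quality heuristic are assumptions for exposition, not the authors' implementation.

```python
# Hypothetical sketch of an AgenticIR-style restoration loop.
# vlm_assess, llm_plan, and TOOLBOX are assumed placeholders.

TOOLBOX = {
    "denoise": lambda img: img,        # stand-ins for specialized IR models
    "deblur": lambda img: img,
    "dehaze": lambda img: img,
    "super_resolve": lambda img: img,
}

def vlm_assess(image):
    """Perception: a fine-tuned VLM describes degradations and quality as text."""
    return {"degradations": ["noise", "blur"], "quality": 0.42}

def llm_plan(assessment, experience_docs):
    """Scheduling: an LLM orders the toolbox, guided by self-exploration notes."""
    return ["denoise", "deblur"]

def restore(image, experience_docs, max_rounds=3):
    for _ in range(max_rounds):
        report = vlm_assess(image)                      # Perception
        plan = llm_plan(report, experience_docs)        # Scheduling
        for tool in plan:                               # Execution
            candidate = TOOLBOX[tool](image)
            verdict = vlm_assess(candidate)             # Reflection
            if verdict["quality"] >= report["quality"]:
                image, report = candidate, verdict
            else:
                break                                   # failed step triggers Rescheduling
        if not vlm_assess(image)["degradations"]:
            break
    return image
```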
Related papers
- LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration [62.3751291442432]
We propose LoRA-IR, a flexible framework that dynamically leverages compact low-rank experts to facilitate efficient all-in-one image restoration.
LoRA-IR consists of two training stages: degradation-guided pre-training and parameter-efficient fine-tuning.
LoRA-IR achieves state-of-the-art performance across 14 image restoration tasks and 29 benchmarks.
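The summary above suggests degradation-guided routing over compact low-rank experts; below is a minimal PyTorch sketch of one such layer. The module names, shapes, and gating scheme are illustrative assumptions, not the LoRA-IR architecture.

```python
import torch
import torch.nn as nn

class LoRAExpertLinear(nn.Module):
    """Sketch: a linear layer whose low-rank experts are mixed per input,
    gated by a degradation embedding (illustrative, not LoRA-IR itself)."""

    def __init__(self, dim, rank=8, num_experts=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)                 # pre-trained weight (kept frozen in practice)
        self.down = nn.Parameter(torch.randn(num_experts, dim, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(num_experts, rank, dim))
        self.gate = nn.Linear(dim, num_experts)         # degradation-guided router

    def forward(self, x, degradation_embedding):
        weights = torch.softmax(self.gate(degradation_embedding), dim=-1)  # (B, E)
        low = torch.einsum("bd,edr->ber", x, self.down)                    # per-expert down-projection
        upd = torch.einsum("ber,erk->bek", low, self.up)                   # per-expert low-rank update
        mixed = torch.einsum("be,bek->bk", weights, upd)                   # weighted expert mixture
        return self.base(x) + mixed

# usage: layer = LoRAExpertLinear(64); layer(torch.randn(2, 64), torch.randn(2, 64))
```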
arXiv Detail & Related papers (2024-10-20T13:00:24Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
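A minimal sketch of what a retrieval-augmented multimodal answering step might look like under the ideas summarized above; the retriever and MLLM interfaces are assumed placeholders, not RA-BLIP's actual API.

```python
# Illustrative retrieval-augmented multimodal pipeline; all interfaces are assumptions.

def answer(question, image, retriever, mllm, top_k=4):
    # Fuse visual and textual cues into a single retrieval query.
    query = mllm.encode_multimodal(image, question)
    passages = retriever.search(query, k=top_k)
    # Adaptive selection: let the model judge which retrieved passages are relevant.
    kept = [p for p in passages if mllm.judge_relevance(question, p)]
    return mllm.generate(image, question, context=kept)
```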
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues [10.280113107290067]
The IM-RAG approach integrates Information Retrieval systems with Large Language Models (LLMs) to support multi-round RAG.
The entire IM process is optimized via Reinforcement Learning (RL) where a Progress Tracker is incorporated to provide mid-step rewards.
The results show that our approach achieves state-of-the-art (SOTA) performance while providing high flexibility in integrating IR modules.
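The multi-round loop with mid-step rewards could be sketched as below; all interfaces (propose_query, tracker.score) are hypothetical stand-ins rather than the IM-RAG implementation.

```python
# Hypothetical inner-monologue retrieval loop with mid-step rewards; interfaces are assumed.

def inner_monologue(question, llm, retriever, tracker, max_rounds=5):
    evidence, rewards = [], []
    for _ in range(max_rounds):
        query = llm.propose_query(question, evidence)       # LLM decides what to look up next
        if query is None:                                   # LLM judges the evidence sufficient
            break
        evidence += retriever.search(query)
        rewards.append(tracker.score(question, evidence))   # mid-step reward for RL training
    answer = llm.answer(question, evidence)
    return answer, rewards                                  # rewards would drive policy optimization
```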
arXiv Detail & Related papers (2024-05-15T12:41:20Z)
- Self-Retrieval: End-to-End Information Retrieval with One Large Language Model [97.71181484082663]
We introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture.
Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking.
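A compact sketch of the generate-then-rerank idea described above, with one model serving as both retriever and assessor; the method names are assumptions.

```python
# Rough sketch of generative retrieval with self-assessment; interfaces are assumed.

def self_retrieve(query, llm, num_candidates=8):
    # The corpus is internalized in the LLM's weights via self-supervised training,
    # so retrieval becomes (constrained) passage generation.
    candidates = [llm.generate_passage(query) for _ in range(num_candidates)]
    # The same model performs relevance assessment for reranking.
    return sorted(candidates, key=lambda p: llm.relevance(query, p), reverse=True)
```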
arXiv Detail & Related papers (2024-02-23T18:45:35Z)
- LLMRA: Multi-modal Large Language Model based Restoration Assistant [25.534022968675337]
We present LLMRA, a simple MLLM-based image restoration framework, to address the so-far unexplored use of MLLMs for image restoration.
We exploit the impressive capabilities of MLLMs to obtain the degradation information for universal image restoration.
Our method leverages image degradation priors from MLLMs, providing low-level attribute descriptions of the input low-quality images and the restored high-quality images simultaneously.
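One way to picture using MLLM-derived degradation descriptions as restoration priors is sketched below; the components and their interfaces are illustrative assumptions, not LLMRA's design.

```python
# Illustrative: condition a restoration network on an MLLM's textual degradation description.

def restore_with_mllm(lq_image, mllm, text_encoder, restorer):
    description = mllm.describe_degradation(lq_image)   # e.g. "heavy noise, mild blur" (assumed)
    prior = text_encoder(description)                   # degradation prior as an embedding
    return restorer(lq_image, condition=prior)          # restoration network guided by the prior
```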
arXiv Detail & Related papers (2024-01-21T04:50:19Z)
- Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration [58.11518043688793]
MPerceiver is a novel approach to enhance adaptiveness, generalizability and fidelity for all-in-one image restoration.
MPerceiver is trained on 9 tasks for all-in-one IR and outperforms state-of-the-art task-specific methods across most tasks.
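A minimal sketch of all-in-one training over a mixture of restoration tasks, which is the setting the summary describes; the task list and training loop are illustrative assumptions rather than MPerceiver's prompt-learning architecture.

```python
import random

# Assumed task split for illustration only.
TASKS = ["denoise", "derain", "dehaze", "deblur", "desnow",
         "low_light", "dejpeg", "demoire", "super_resolution"]

def train_step(model, loaders, optimizer, loss_fn):
    task = random.choice(TASKS)        # sample one restoration task per step
    lq, gt = next(loaders[task])       # degraded / clean pair for that task
    loss = loss_fn(model(lq), gt)      # one shared all-in-one model
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return task, loss.item()
```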
arXiv Detail & Related papers (2023-12-05T17:47:11Z)
- Vision-by-Language for Training-Free Compositional Image Retrieval [78.60509831598745]
Compositional Image Retrieval (CIR) aims to retrieve a relevant target image from a database, given a reference image and text describing the desired modification.
Recent research sidesteps the need for costly supervised training by using large-scale vision-language models (VLMs).
We propose to tackle CIR in a training-free manner via Vision-by-Language (CIReVL).
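A training-free, vision-by-language pipeline along these lines can be sketched as captioning the reference image, rewriting the caption with the requested modification, and retrieving by text; the interfaces below are assumptions.

```python
# Sketch of training-free compositional image retrieval via language; interfaces are assumed.

def compose_and_retrieve(reference_image, modification_text, captioner, llm, clip, gallery):
    caption = captioner(reference_image)                    # VLM describes the reference image
    target_text = llm.rewrite(caption, modification_text)   # LLM applies the edit in language space
    query = clip.encode_text(target_text)
    scores = {img_id: clip.similarity(query, feat) for img_id, feat in gallery.items()}
    return max(scores, key=scores.get)                      # best-matching target image id
```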
arXiv Detail & Related papers (2023-10-13T17:59:38Z)
- LMEye: An Interactive Perception Network for Large Language Models [43.160353427015025]
LMEye is a human-like eye for LLMs: a plug-and-play interactive perception network.
It enables dynamic interaction between Large Language Models and external vision information.
It significantly improves the zero-shot performance on various multimodal tasks.
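A hypothetical sketch of such an interactive perception step: the LLM states what visual information it needs, and a small plug-in module supplies it; the interfaces are assumed for illustration.

```python
# Illustrative plug-and-play interactive perception step; interfaces are assumed.

def interactive_step(image, instruction, llm, perceptor):
    request = llm.formulate_visual_request(instruction)   # what the LLM wants to know about the image
    visual_tokens = perceptor(image, request)             # request-conditioned visual features
    return llm.generate(instruction, extra_inputs=visual_tokens)
```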
arXiv Detail & Related papers (2023-05-05T17:27:21Z)