An Intelligent Agentic System for Complex Image Restoration Problems
- URL: http://arxiv.org/abs/2410.17809v1
- Date: Wed, 23 Oct 2024 12:11:26 GMT
- Title: An Intelligent Agentic System for Complex Image Restoration Problems
- Authors: Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong
- Abstract summary: AgenticIR mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling.
We employ large language models (LLMs) and vision-language models (VLMs) that interact via text generation to operate a toolbox of IR models.
Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
- Score: 39.93819777300997
- Abstract: Real-world image restoration (IR) is inherently complex and often requires combining multiple specialized models to address diverse degradations. Inspired by human problem-solving, we propose AgenticIR, an agentic system that mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling. AgenticIR leverages large language models (LLMs) and vision-language models (VLMs) that interact via text generation to dynamically operate a toolbox of IR models. We fine-tune VLMs for image quality analysis and employ LLMs for reasoning, guiding the system step by step. To compensate for LLMs' lack of specific IR knowledge and experience, we introduce a self-exploration method, allowing the LLM to observe and summarize restoration results into referenceable documents. Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
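The abstract describes a control loop in which an LLM/VLM pair perceives degradations, schedules tools from an IR toolbox, executes them, reflects on the result, and reschedules if needed. The snippet below is a minimal sketch of that control flow only; the `vlm` and `llm` stubs, the toolbox entries, and the stopping rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an agentic restoration loop (assumed structure,
# not the AgenticIR code). All components here are placeholders.

def vlm(image):
    """Perception/Reflection: a VLM would report remaining degradations (stub)."""
    found = []
    if "denoised" not in image:
        found.append("noise")
    if "upscaled" not in image:
        found.append("low resolution")
    return found

def llm(degradations, notes):
    """Scheduling: an LLM would order the tools using past experience (stub)."""
    return sorted(degradations)

TOOLBOX = {
    "noise": lambda img: img + "->denoised",          # stand-ins for IR models
    "low resolution": lambda img: img + "->upscaled",
}

def agentic_ir(image, max_rounds=3):
    notes = []                                        # self-exploration "experience"
    for _ in range(max_rounds):
        degradations = vlm(image)                     # 1. Perception
        plan = llm(degradations, notes)               # 2. Scheduling
        for deg in plan:                              # 3. Execution
            image = TOOLBOX[deg](image)
        remaining = vlm(image)                        # 4. Reflection
        if not remaining:
            return image
        notes.append(f"still degraded: {remaining}")  # 5. Rescheduling input
    return image

print(agentic_ir("raw_image"))
```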
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
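The summary above only states that a reward model is learned from environment interaction without human annotations. The toy sketch below shows one way such a pipeline could look, with an agent sampling rollouts, an automatic check producing labels, and a simple scorer fit to them; the names, labeling rule, and scoring function are assumptions for illustration, not the paper's method.

```python
# Toy sketch: learning a reward model from rollouts without human labels
# (illustrative stand-in, not the paper's pipeline).
import random

def agent_rollout(task):
    """An LLM agent would act in the environment here; we fake action traces."""
    return [random.choice(["search", "click", "buy", "wander"]) for _ in range(4)]

def synthetic_label(task, trajectory):
    """Automatic labeling, e.g. by checking the environment's final state."""
    return 1 if "buy" in trajectory else 0            # stand-in success check

def train_reward_model(pairs):
    """Fit crude per-action weights from the automatically labeled trajectories."""
    weights = {}
    for traj, label in pairs:
        for action in traj:
            weights[action] = weights.get(action, 0.0) + (1.0 if label else -1.0)
    return lambda traj: sum(weights.get(a, 0.0) for a in traj)

task = "order a book"
pairs = [(t := agent_rollout(task), synthetic_label(task, t)) for _ in range(200)]
reward = train_reward_model(pairs)
print(reward(["search", "click", "buy"]), reward(["wander", "wander"]))
```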
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Reasoning Language Models: A Blueprint [12.966875494760785]
Reasoning language models (RLMs) have redefined AI's problem-solving capabilities.
Yet, their high costs, proprietary nature, and complex architectures present accessibility and scalability challenges.
We propose a comprehensive blueprint that organizes RLMs into a modular framework.
arXiv Detail & Related papers (2025-01-20T02:16:19Z)
- Compositional Image Retrieval via Instruction-Aware Contrastive Learning [40.54022628032561]
Composed Image Retrieval (CIR) involves retrieving a target image based on a composed query of an image paired with text that specifies modifications or changes to the visual reference.
In practice, due to the scarcity of annotated data in downstream tasks, Zero-Shot CIR (ZS-CIR) is desirable.
We propose a novel embedding method that utilizes an instruction-tuned Multimodal LLM (MLLM) to generate composed representations.
arXiv Detail & Related papers (2024-12-07T22:46:52Z)
- From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing [16.755590790629153]
This review examines the development and application of multi-modal language models (MLLMs) in remote sensing.
We focus on their ability to interpret and describe satellite imagery using natural language.
Key applications such as scene description, object detection, change detection, text-to-image retrieval, image-to-text generation, and visual question answering are discussed.
arXiv Detail & Related papers (2024-11-05T12:14:22Z)
- LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration [62.3751291442432]
We propose LoRA-IR, a flexible framework that dynamically leverages compact low-rank experts to facilitate efficient all-in-one image restoration.
LoRA-IR consists of two training stages: degradation-guided pre-training and parameter-efficient fine-tuning.
Experiments demonstrate that LoRA-IR achieves SOTA performance across 14 IR tasks and 29 benchmarks, while maintaining computational efficiency.
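The summary mentions compact low-rank experts that are leveraged dynamically for all-in-one restoration. The PyTorch snippet below sketches one plausible reading of that idea, a degradation-guided router over LoRA-style adapters attached to a frozen shared layer; the layer sizes, router, and gating scheme are assumptions, not the released architecture.

```python
# Sketch of degradation-guided low-rank experts (illustrative reading of the
# LoRA-IR summary, not its published design).
import torch
import torch.nn as nn

class LowRankExpert(nn.Module):
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)   # LoRA-style A matrix
        self.up = nn.Linear(rank, dim, bias=False)     # LoRA-style B matrix

    def forward(self, x):
        return self.up(self.down(x))

class MoLoRALayer(nn.Module):
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)                # shared weight, kept frozen
        self.base.requires_grad_(False)
        self.experts = nn.ModuleList([LowRankExpert(dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)      # degradation-guided gate

    def forward(self, x, degradation_embedding):
        gate = torch.softmax(self.router(degradation_embedding), dim=-1)
        delta = sum(g.unsqueeze(-1) * e(x) for g, e in zip(gate.unbind(-1), self.experts))
        return self.base(x) + delta

layer = MoLoRALayer(dim=64)
x = torch.randn(2, 64)                                 # token features
d = torch.randn(2, 64)                                 # degradation descriptor
print(layer(x, d).shape)                               # torch.Size([2, 64])
```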
arXiv Detail & Related papers (2024-10-20T13:00:24Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Self-Retrieval: End-to-End Information Retrieval with One Large Language Model [97.71181484082663]
We introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture.
Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking.
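The entry describes retrieval recast as passage generation followed by self-assessed reranking. The toy sketch below mimics that two-step shape with simple heuristics in place of the trained LLM; the corpus, scoring functions, and reranking rule are illustrative assumptions only.

```python
# Toy sketch of generation-as-retrieval with self-assessment reranking
# (illustrative stand-in for the Self-Retrieval pipeline).
CORPUS = [
    "the eiffel tower is in paris",
    "the great wall is in china",
    "paris is the capital of france",
]

def generation_score(query, passage):
    """Stand-in for the LLM's likelihood of generating `passage` for `query`."""
    q, p = set(query.split()), set(passage.split())
    return len(q & p) / len(p)

def self_assess(query, passage):
    """Stand-in for the LLM judging the relevance of its own generated passage."""
    return 1.0 if any(word in passage for word in query.split()) else 0.0

def self_retrieve(query, k=2):
    generated = sorted(CORPUS, key=lambda p: generation_score(query, p), reverse=True)[:k]
    return sorted(generated, key=lambda p: self_assess(query, p), reverse=True)

print(self_retrieve("where is the eiffel tower"))
```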
arXiv Detail & Related papers (2024-02-23T18:45:35Z)
- LLMRA: Multi-modal Large Language Model based Restoration Assistant [25.534022968675337]
We present LLMRA, a simple MLLM-based image restoration framework.
We exploit the impressive capabilities of MLLMs to obtain the degradation information for universal image restoration.
Our method leverages image degradation priors from MLLMs, providing low-level attribute descriptions of the input low-quality images and the restored high-quality images simultaneously.
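The entry states that MLLM-produced degradation descriptions serve as priors for restoration. The snippet below sketches one common way a text prior can condition a restoration network, FiLM-style modulation of feature maps; the text encoder, shapes, and modulation scheme are assumptions, not LLMRA's actual design.

```python
# Sketch: conditioning a restoration block on a text degradation prior
# (illustrative FiLM-style modulation, not LLMRA's published architecture).
import torch
import torch.nn as nn

class TextConditionedBlock(nn.Module):
    def __init__(self, channels, text_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_scale = nn.Linear(text_dim, channels)   # gamma from text prior
        self.to_shift = nn.Linear(text_dim, channels)   # beta from text prior

    def forward(self, feat, text_emb):
        gamma = self.to_scale(text_emb)[:, :, None, None]
        beta = self.to_shift(text_emb)[:, :, None, None]
        return torch.relu(self.conv(feat) * (1 + gamma) + beta)

# `text_emb` stands in for an embedding of an MLLM description such as
# "heavy gaussian noise and slight blur".
block = TextConditionedBlock(channels=32, text_dim=512)
feat = torch.randn(1, 32, 64, 64)
text_emb = torch.randn(1, 512)
print(block(feat, text_emb).shape)                      # torch.Size([1, 32, 64, 64])
```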
arXiv Detail & Related papers (2024-01-21T04:50:19Z)
- Vision-by-Language for Training-Free Compositional Image Retrieval [78.60509831598745]
Compositional Image Retrieval (CIR) aims to retrieve a relevant target image from a database.
Recent research sidesteps the need for costly training data by using large-scale vision-language models (VLMs).
We propose to tackle CIR in a training-free manner via Vision-by-Language (CIReVL).
arXiv Detail & Related papers (2023-10-13T17:59:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.