An Intelligent Agentic System for Complex Image Restoration Problems
- URL: http://arxiv.org/abs/2410.17809v1
- Date: Wed, 23 Oct 2024 12:11:26 GMT
- Title: An Intelligent Agentic System for Complex Image Restoration Problems
- Authors: Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong
- Abstract summary: AgenticIR mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling.
We employ large language models (LLMs) and vision-language models (VLMs) that interact via text generation to operate a toolbox of IR models.
Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
- Score: 39.93819777300997
- Abstract: Real-world image restoration (IR) is inherently complex and often requires combining multiple specialized models to address diverse degradations. Inspired by human problem-solving, we propose AgenticIR, an agentic system that mimics the human approach to image processing by following five key stages: Perception, Scheduling, Execution, Reflection, and Rescheduling. AgenticIR leverages large language models (LLMs) and vision-language models (VLMs) that interact via text generation to dynamically operate a toolbox of IR models. We fine-tune VLMs for image quality analysis and employ LLMs for reasoning, guiding the system step by step. To compensate for LLMs' lack of specific IR knowledge and experience, we introduce a self-exploration method, allowing the LLM to observe and summarize restoration results into referenceable documents. Experiments demonstrate AgenticIR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing.
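The abstract describes a control loop in which an LLM/VLM pair perceives degradations, schedules tools from an IR toolbox, executes them, reflects on the result, and reschedules if needed. The snippet below is a minimal sketch of that control flow only; the `vlm` and `llm` stubs, the toolbox entries, and the stopping rule are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an agentic restoration loop (assumed structure,
# not the AgenticIR code). All components here are placeholders.

def vlm(image):
    """Perception/Reflection: a VLM would report remaining degradations (stub)."""
    found = []
    if "denoised" not in image:
        found.append("noise")
    if "upscaled" not in image:
        found.append("low resolution")
    return found

def llm(degradations, notes):
    """Scheduling: an LLM would order the tools using past experience (stub)."""
    return sorted(degradations)

TOOLBOX = {
    "noise": lambda img: img + "->denoised",          # stand-ins for IR models
    "low resolution": lambda img: img + "->upscaled",
}

def agentic_ir(image, max_rounds=3):
    notes = []                                        # self-exploration "experience"
    for _ in range(max_rounds):
        degradations = vlm(image)                     # 1. Perception
        plan = llm(degradations, notes)               # 2. Scheduling
        for deg in plan:                              # 3. Execution
            image = TOOLBOX[deg](image)
        remaining = vlm(image)                        # 4. Reflection
        if not remaining:
            return image
        notes.append(f"still degraded: {remaining}")  # 5. Rescheduling input
    return image

print(agentic_ir("raw_image"))
```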
Related papers
- Scaling Autonomous Agents via Automatic Reward Modeling And Planning [52.39395405893965]
Large language models (LLMs) have demonstrated remarkable capabilities across a range of tasks.
However, they still struggle with problems requiring multi-step decision-making and environmental feedback.
We propose a framework that can automatically learn a reward model from the environment without human annotations.
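The summary above only states that a reward model is learned from environment interaction without human annotations. The toy sketch below shows one way such a pipeline could look, with an agent sampling rollouts, an automatic check producing labels, and a simple scorer fit to them; the names, labeling rule, and scoring function are assumptions for illustration, not the paper's method.

```python
# Toy sketch: learning a reward model from rollouts without human labels
# (illustrative stand-in, not the paper's pipeline).
import random

def agent_rollout(task):
    """An LLM agent would act in the environment here; we fake action traces."""
    return [random.choice(["search", "click", "buy", "wander"]) for _ in range(4)]

def synthetic_label(task, trajectory):
    """Automatic labeling, e.g. by checking the environment's final state."""
    return 1 if "buy" in trajectory else 0            # stand-in success check

def train_reward_model(pairs):
    """Fit crude per-action weights from the automatically labeled trajectories."""
    weights = {}
    for traj, label in pairs:
        for action in traj:
            weights[action] = weights.get(action, 0.0) + (1.0 if label else -1.0)
    return lambda traj: sum(weights.get(a, 0.0) for a in traj)

task = "order a book"
pairs = [(t := agent_rollout(task), synthetic_label(task, t)) for _ in range(200)]
reward = train_reward_model(pairs)
print(reward(["search", "click", "buy"]), reward(["wander", "wander"]))
```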
arXiv Detail & Related papers (2025-02-17T18:49:25Z)
- Reasoning Language Models: A Blueprint [12.966875494760785]
Reasoning language models (RLMs) have redefined AI's problem-solving capabilities.
Yet, their high costs, proprietary nature, and complex architectures present accessibility and scalability challenges.
We propose a comprehensive blueprint that organizes RLMs into a modular framework.
arXiv Detail & Related papers (2025-01-20T02:16:19Z)
- Compositional Image Retrieval via Instruction-Aware Contrastive Learning [40.54022628032561]
Composed Image Retrieval (CIR) involves retrieving a target image based on a composed query of an image paired with text that specifies modifications or changes to the visual reference.
In practice, due to the scarcity of annotated data in downstream tasks, Zero-Shot CIR (ZS-CIR) is desirable.
We propose a novel embedding method that utilizes an instruction-tuned Multimodal LLM (MLLM) to generate composed representations.
arXiv Detail & Related papers (2024-12-07T22:46:52Z)
- From Pixels to Prose: Advancing Multi-Modal Language Models for Remote Sensing [16.755590790629153]
This review examines the development and application of multi-modal language models (MLLMs) in remote sensing.
We focus on their ability to interpret and describe satellite imagery using natural language.
Key applications such as scene description, object detection, change detection, text-to-image retrieval, image-to-text generation, and visual question answering are discussed.
arXiv Detail & Related papers (2024-11-05T12:14:22Z)
- LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration [62.3751291442432]
We propose LoRA-IR, a flexible framework that dynamically leverages compact low-rank experts to facilitate efficient all-in-one image restoration.
LoRA-IR consists of two training stages: degradation-guided pre-training and parameter-efficient fine-tuning.
Experiments demonstrate that LoRA-IR achieves SOTA performance across 14 IR tasks and 29 benchmarks, while maintaining computational efficiency.
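The summary mentions compact low-rank experts that are leveraged dynamically for all-in-one restoration. The PyTorch snippet below sketches one plausible reading of that idea, a degradation-guided router over LoRA-style adapters attached to a frozen shared layer; the layer sizes, router, and gating scheme are assumptions, not the released architecture.

```python
# Sketch of degradation-guided low-rank experts (illustrative reading of the
# LoRA-IR summary, not its published design).
import torch
import torch.nn as nn

class LowRankExpert(nn.Module):
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)   # LoRA-style A matrix
        self.up = nn.Linear(rank, dim, bias=False)     # LoRA-style B matrix

    def forward(self, x):
        return self.up(self.down(x))

class MoLoRALayer(nn.Module):
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.base = nn.Linear(dim, dim)                # shared weight, kept frozen
        self.base.requires_grad_(False)
        self.experts = nn.ModuleList([LowRankExpert(dim) for _ in range(num_experts)])
        self.router = nn.Linear(dim, num_experts)      # degradation-guided gate

    def forward(self, x, degradation_embedding):
        gate = torch.softmax(self.router(degradation_embedding), dim=-1)
        delta = sum(g.unsqueeze(-1) * e(x) for g, e in zip(gate.unbind(-1), self.experts))
        return self.base(x) + delta

layer = MoLoRALayer(dim=64)
x = torch.randn(2, 64)                                 # token features
d = torch.randn(2, 64)                                 # degradation descriptor
print(layer(x, d).shape)                               # torch.Size([2, 64])
```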
arXiv Detail & Related papers (2024-10-20T13:00:24Z)
- RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, showing their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z)
- Self-Retrieval: End-to-End Information Retrieval with One Large Language Model [97.71181484082663]
We introduce Self-Retrieval, a novel end-to-end LLM-driven information retrieval architecture.
Self-Retrieval internalizes the retrieval corpus through self-supervised learning, transforms the retrieval process into sequential passage generation, and performs relevance assessment for reranking.
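The entry describes retrieval recast as passage generation followed by self-assessed reranking. The toy sketch below mimics that two-step shape with simple heuristics in place of the trained LLM; the corpus, scoring functions, and reranking rule are illustrative assumptions only.

```python
# Toy sketch of generation-as-retrieval with self-assessment reranking
# (illustrative stand-in for the Self-Retrieval pipeline).
CORPUS = [
    "the eiffel tower is in paris",
    "the great wall is in china",
    "paris is the capital of france",
]

def generation_score(query, passage):
    """Stand-in for the LLM's likelihood of generating `passage` for `query`."""
    q, p = set(query.split()), set(passage.split())
    return len(q & p) / len(p)

def self_assess(query, passage):
    """Stand-in for the LLM judging the relevance of its own generated passage."""
    return 1.0 if any(word in passage for word in query.split()) else 0.0

def self_retrieve(query, k=2):
    generated = sorted(CORPUS, key=lambda p: generation_score(query, p), reverse=True)[:k]
    return sorted(generated, key=lambda p: self_assess(query, p), reverse=True)

print(self_retrieve("where is the eiffel tower"))
```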
arXiv Detail & Related papers (2024-02-23T18:45:35Z)
- LLMRA: Multi-modal Large Language Model based Restoration Assistant [25.534022968675337]
We present LLMRA, a simple MLLM-based image restoration framework.
We exploit the impressive capabilities of MLLMs to obtain the degradation information for universal image restoration.
Our method leverages image degradation priors from MLLMs, providing low-level attribute descriptions of the input low-quality images and the restored high-quality images simultaneously.
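The entry states that MLLM-produced degradation descriptions serve as priors for restoration. The snippet below sketches one common way a text prior can condition a restoration network, FiLM-style modulation of feature maps; the text encoder, shapes, and modulation scheme are assumptions, not LLMRA's actual design.

```python
# Sketch: conditioning a restoration block on a text degradation prior
# (illustrative FiLM-style modulation, not LLMRA's published architecture).
import torch
import torch.nn as nn

class TextConditionedBlock(nn.Module):
    def __init__(self, channels, text_dim):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.to_scale = nn.Linear(text_dim, channels)   # gamma from text prior
        self.to_shift = nn.Linear(text_dim, channels)   # beta from text prior

    def forward(self, feat, text_emb):
        gamma = self.to_scale(text_emb)[:, :, None, None]
        beta = self.to_shift(text_emb)[:, :, None, None]
        return torch.relu(self.conv(feat) * (1 + gamma) + beta)

# `text_emb` stands in for an embedding of an MLLM description such as
# "heavy gaussian noise and slight blur".
block = TextConditionedBlock(channels=32, text_dim=512)
feat = torch.randn(1, 32, 64, 64)
text_emb = torch.randn(1, 512)
print(block(feat, text_emb).shape)                      # torch.Size([1, 32, 64, 64])
```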
arXiv Detail & Related papers (2024-01-21T04:50:19Z)
- Vision-by-Language for Training-Free Compositional Image Retrieval [78.60509831598745]
Compositional Image Retrieval (CIR) aims to retrieve a relevant target image from a database.
Recent research sidesteps the need for costly training data by using large-scale vision-language models (VLMs).
We propose to tackle CIR in a training-free manner via Vision-by-Language (CIReVL).
arXiv Detail & Related papers (2023-10-13T17:59:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.