SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback
- URL: http://arxiv.org/abs/2512.18599v1
- Date: Sun, 21 Dec 2025 05:12:25 GMT
- Title: SimpleCall: A Lightweight Image Restoration Agent in Label-Free Environments with MLLM Perceptual Feedback
- Authors: Jianglin Lu, Yuanwei Wu, Ziyi Zhao, Hongcheng Wang, Felix Jimenez, Abrar Majeedi, Yun Fu,
- Abstract summary: Complex image restoration aims to recover high-quality images from inputs affected by multiple degradations.<n>Recent restoration agents, powered by vision-language models and large language models, offer promising restoration capabilities but suffer from bottlenecks.<n>We propose a policy optimization-based restoration framework that learns an lightweight agent to determine tool-calling sequences.
- Score: 27.198190496709433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Complex image restoration aims to recover high-quality images from inputs affected by multiple degradations such as blur, noise, rain, and compression artifacts. Recent restoration agents, powered by vision-language models and large language models, offer promising restoration capabilities but suffer from significant efficiency bottlenecks due to reflection, rollback, and iterative tool searching. Moreover, their performance heavily depends on degradation recognition models that require extensive annotations for training, limiting their applicability in label-free environments. To address these limitations, we propose a policy optimization-based restoration framework that learns an lightweight agent to determine tool-calling sequences. The agent operates in a sequential decision process, selecting the most appropriate restoration operation at each step to maximize final image quality. To enable training within label-free environments, we introduce a novel reward mechanism driven by multimodal large language models, which act as human-aligned evaluator and provide perceptual feedback for policy improvement. Once trained, our agent executes a deterministic restoration plans without redundant tool invocations, significantly accelerating inference while maintaining high restoration quality. Extensive experiments show that despite using no supervision, our method matches SOTA performance on full-reference metrics and surpasses existing approaches on no-reference metrics across diverse degradation scenarios.
Related papers
- Learning to Restore Multi-Degraded Images via Ingredient Decoupling and Task-Aware Path Adaptation [51.10017611491389]
Real-world images often suffer from multiple coexisting degradations, such as rain, noise, and haze coexisting in a single image.<n>We propose an adaptive multi-degradation image restoration network that reconstructs images by leveraging decoupled representations of degradation ingredients.<n>The resulting tightly integrated architecture, termed IMDNet, is extensively validated through experiments.
arXiv Detail & Related papers (2025-11-07T01:50:36Z) - MoA-VR: A Mixture-of-Agents System Towards All-in-One Video Restoration [62.929029990341796]
Real-world videos often suffer from complex degradations, such as noise, compression artifacts, and low-light distortions.<n>We propose MoA-VR, which mimics the reasoning and processing procedures of human professionals through three coordinated agents.<n>Specifically, we construct a large-scale and high-resolution video degradation recognition benchmark and build a vision-language model (VLM) driven degradation identifier.
arXiv Detail & Related papers (2025-10-09T17:42:51Z) - Beyond Degradation Redundancy: Contrastive Prompt Learning for All-in-One Image Restoration [109.38288333994407]
Contrastive Prompt Learning (CPL) is a novel framework that fundamentally enhances prompt-task alignment.<n>Our framework establishes new state-of-the-art performance while maintaining parameter efficiency, offering a principled solution for unified image restoration.
arXiv Detail & Related papers (2025-04-14T08:24:57Z) - Navigating Image Restoration with VAR's Distribution Alignment Prior [6.0648320320309885]
VAR, a novel image generative paradigm, surpasses diffusion models in generation quality by applying a next-scale prediction approach.<n>We formulate the multi-scale latent representations within VAR as the restoration prior, thus advancing our delicately designed VarFormer framework.
arXiv Detail & Related papers (2024-12-30T16:32:55Z) - UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity [79.90839080916913]
We present our UniRestorer with improved restoration performance.<n>Specifically, we perform hierarchical clustering on degradation space, and train a multi-granularity mixture-of-experts (MoE) restoration model.<n>In contrast to existing degradation-agnostic and -aware methods, UniRestorer can leverage degradation estimation to benefit degradation specific restoration.
arXiv Detail & Related papers (2024-12-28T14:09:08Z) - Perceive-IR: Learning to Perceive Degradation Better for All-in-One Image Restoration [33.163161549726446]
Perceive-IR is a novel backbone-agnostic All-in-One image restoration framework for fine-grained quality control.<n>Its modular structure allows core components to function independently of specific backbones, enabling seamless integration into advanced restoration models.
arXiv Detail & Related papers (2024-08-28T17:58:54Z) - RestoreAgent: Autonomous Image Restoration Agent via Multimodal Large Language Models [45.88103575837924]
We introduce RestoreAgent, an intelligent image restoration system leveraging multimodal large language models.
RestoreAgent autonomously assesses the type and extent of degradation in input images and performs restoration through (1) determining the appropriate restoration tasks, (2) optimizing the task sequence, (3) selecting the most suitable models, and (4) executing the restoration.
Experimental results demonstrate the superior performance of RestoreAgent in handling complex degradation, surpassing human experts.
arXiv Detail & Related papers (2024-07-25T13:29:37Z) - Diff-Restorer: Unleashing Visual Prompts for Diffusion-based Universal Image Restoration [19.87693298262894]
We propose Diff-Restorer, a universal image restoration method based on the diffusion model.
We utilize the pre-trained visual language model to extract visual prompts from degraded images.
We also design a Degradation-aware Decoder to perform structural correction and convert the latent code to the pixel domain.
arXiv Detail & Related papers (2024-07-04T05:01:10Z) - Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild [57.06779516541574]
SUPIR (Scaling-UP Image Restoration) is a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up.
We collect a dataset comprising 20 million high-resolution, high-quality images for model training, each enriched with descriptive text annotations.
arXiv Detail & Related papers (2024-01-24T17:58:07Z) - SPIRE: Semantic Prompt-Driven Image Restoration [66.26165625929747]
We develop SPIRE, a Semantic and restoration Prompt-driven Image Restoration framework.
Our approach is the first framework that supports fine-level instruction through language-based quantitative specification of the restoration strength.
Our experiments demonstrate the superior restoration performance of SPIRE compared to the state of the arts.
arXiv Detail & Related papers (2023-12-18T17:02:30Z) - Prompt-based Ingredient-Oriented All-in-One Image Restoration [0.0]
We propose a novel data ingredient-oriented approach to tackle multiple image degradation tasks.
Specifically, we utilize a encoder to capture features and introduce prompts with degradation-specific information to guide the decoder.
Our method performs competitively to the state-of-the-art.
arXiv Detail & Related papers (2023-09-06T15:05:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.