Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning
- URL: http://arxiv.org/abs/2601.02918v2
- Date: Thu, 15 Jan 2026 14:19:47 GMT
- Title: Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning
- Authors: Guoqiang Liang, Jianyi Wang, Zhonghua Wu, Shangchen Zhou,
- Abstract summary: We introduce Zoom-IQA, a VLM-based IQA model to explicitly emulate key cognitive behaviors. We show that Zoom-IQA achieves improved robustness, explainability, and generalization. The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
- Score: 32.30800226412995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image Quality Assessment (IQA) is a long-standing problem in computer vision. Previous methods typically focus on predicting numerical scores without explanation or providing low-level descriptions lacking precise scores. Recent reasoning-based vision language models (VLMs) have shown strong potential for IQA by jointly generating quality descriptions and scores. However, existing VLM-based IQA methods often suffer from unreliable reasoning due to their limited capability of integrating visual and textual cues. In this work, we introduce Zoom-IQA, a VLM-based IQA model that explicitly emulates key cognitive behaviors: uncertainty awareness, region reasoning, and iterative refinement. Specifically, we present a two-stage training pipeline: 1) supervised fine-tuning (SFT) on our Grounded-Rationale-IQA (GR-IQA) dataset to teach the model to ground its assessments in key regions, and 2) reinforcement learning (RL) for dynamic policy exploration, stabilized by our KL-Coverage regularizer to prevent reasoning and scoring diversity collapse, with a Progressive Re-sampling Strategy for mitigating annotation bias. Extensive experiments show that Zoom-IQA achieves improved robustness, explainability, and generalization. The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
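The abstract's RL stage combines a policy-exploration objective with a KL-Coverage regularizer against diversity collapse. The paper's exact formulation is not given here, so the following is a minimal Python sketch of the two likely ingredients, assuming group-relative advantage normalization (common in recent reasoning-RL pipelines) and a discretized score distribution; `grpo_advantages` and `kl_coverage_penalty` are illustrative names, not the authors' code.

```python
import math

def grpo_advantages(rewards):
    """Group-relative advantages: normalize each sampled rollout's reward
    by the mean and standard deviation of its sampling group, a common
    baseline choice in recent reasoning-RL training."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std or 1.0) for r in rewards]

def kl_coverage_penalty(policy_probs, ref_probs, eps=1e-8):
    """KL(policy || reference) over discretized quality-score bins.
    Penalizing divergence from a broad reference distribution discourages
    the policy's score distribution from collapsing onto a single bin."""
    return sum(p * math.log((p + eps) / (q + eps))
               for p, q in zip(policy_probs, ref_probs))
```

With identical distributions the penalty is zero, while a fully collapsed one-bin policy measured against a uniform reference over k bins incurs roughly log k, so the regularizer grows exactly when scoring diversity shrinks.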
Related papers
- AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment [69.06977852423564]
Image quality assessment (IQA) reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents.
arXiv Detail & Related papers (2025-09-30T09:37:01Z)
- VQAThinker: Exploring Generalizable and Explainable Video Quality Assessment via Reinforcement Learning [50.34205095371895]
Video quality assessment aims to objectively quantify perceptual quality degradation. Existing VQA models suffer from two critical limitations. We propose VQAThinker, a reasoning-based VQA framework.
arXiv Detail & Related papers (2025-08-08T06:16:23Z)
- Refine-IQA: Multi-Stage Reinforcement Finetuning for Perceptual Image Quality Assessment [22.184690568393126]
Reinforcement fine-tuning (RFT) is a proliferating paradigm for LMM training. We propose a multi-stage RFT IQA framework (Refine-IQA). The resulting Refine-IQA Series Models achieve outstanding performance on both perception and scoring tasks.
arXiv Detail & Related papers (2025-08-04T22:46:10Z)
- TRIQA: Image Quality Assessment by Contrastive Pretraining on Ordered Distortion Triplets [31.2422359004089]
No-Reference (NR) IQA remains particularly challenging due to the absence of a reference image. We propose a novel approach that constructs a custom dataset using a limited number of reference content images. We train a quality-aware model using contrastive triplet-based learning, enabling efficient training with fewer samples.
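The contrastive triplet learning described above can be sketched as a standard triplet margin loss over embeddings of an anchor image and two versions of the same content at different distortion levels; the Euclidean distance and margin value below are illustrative assumptions, not TRIQA's published configuration.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss for an ordered distortion triplet: the anchor embedding
    should sit closer to the positive (nearer distortion level) than to
    the negative (farther distortion level) by at least `margin`."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)
```

Because the ordering of distortion levels supplies the positive/negative labels for free, such a loss can be trained without per-image quality scores, which is what enables learning from fewer annotated samples.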
arXiv Detail & Related papers (2025-07-16T23:43:12Z)
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO). We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
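GRPO-style training needs a scalar reward per sampled rollout, and for score regression a simple choice is a tolerance band around the ground-truth mean opinion score (MOS). The band width and linear decay below are assumptions chosen for illustration, not Q-Insight's published reward design.

```python
def score_reward(predicted, mos, tol=0.5):
    """Reward a rollout's final quality score: full reward when the
    prediction falls inside a tolerance band around the ground-truth
    MOS, decaying linearly to zero as the error grows beyond the band."""
    err = abs(predicted - mos)
    return 1.0 if err <= tol else max(0.0, 1.0 - (err - tol))
```

A banded reward like this tolerates the inherent noise in human opinion scores near the target while still giving the policy a smooth gradient signal away from it.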
arXiv Detail & Related papers (2025-03-28T17:59:54Z)
- Backdoor Attacks against No-Reference Image Quality Assessment Models via a Scalable Trigger [76.36315347198195]
No-Reference Image Quality Assessment (NR-IQA) plays a critical role in evaluating and optimizing computer vision systems. Recent research indicates that NR-IQA models are susceptible to adversarial attacks. We present a novel poisoning-based backdoor attack against NR-IQA (BAIQA).
arXiv Detail & Related papers (2024-12-10T08:07:19Z)
- Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models [93.91086467402323]
The Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA) is designed to efficiently adapt the visual-language pre-trained model, CLIP, to IQA tasks. GRMP-IQA consists of two core modules: (i) a Meta-Prompt Pre-training Module and (ii) Quality-Aware Gradient Regularization.
arXiv Detail & Related papers (2024-09-09T07:26:21Z)
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges. Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem. Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose a novel IQA method, diffusion priors-based IQA.
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)