Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing
- URL: http://arxiv.org/abs/2601.15356v2
- Date: Tue, 27 Jan 2026 03:25:24 GMT
- Title: Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing
- Authors: Xiang Li, XueHeng Li, Yu Wang, XuanHua He, ZhangChi Hu, WeiWei Yu, ChengJun Xie,
- Abstract summary: "Thinking with Images" paradigms induce spurious "cropping-implies-degradation" biases and misinterprets natural depth-of-field as artifacts.<n>We propose Q-Probe, the first agentic IQA framework designed to scale IQA to high resolution via context-aware probing.<n>Q-Probe achieves state-of-the-art performance in high-resolution settings while maintaining superior efficacy across resolution scales.
- Score: 10.754977682331855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement Learning (RL) has empowered Multimodal Large Language Models (MLLMs) to achieve superior human preference alignment in Image Quality Assessment (IQA). However, existing RL-based IQA models typically rely on coarse-grained global views, failing to capture subtle local degradations in high-resolution scenarios. While emerging "Thinking with Images" paradigms enable multi-scale visual perception via zoom-in mechanisms, their direct adaptation to IQA induces spurious "cropping-implies-degradation" biases and misinterprets natural depth-of-field as artifacts. To address these challenges, we propose Q-Probe, the first agentic IQA framework designed to scale IQA to high resolution via context-aware probing. First, we construct Vista-Bench, a pioneering benchmark tailored for fine-grained local degradation analysis in high-resolution IQA settings. Furthermore, we propose a three-stage training paradigm that progressively aligns the model with human preferences, while simultaneously eliminating causal bias through a novel context-aware cropping strategy. Extensive experiments demonstrate that Q-Probe achieves state-of-the-art performance in high-resolution settings while maintaining superior efficacy across resolution scales.
Related papers
- Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning [32.30800226412995]
We introduce Zoom-IQA, a VLM-based IQA model to explicitly emulate key cognitive behaviors.<n>We show that Zoom-IQA achieves improved robustness, explainability, and generalization.<n>The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
arXiv Detail & Related papers (2026-01-06T11:00:17Z) - Image Quality Assessment for Embodied AI [103.66095742463195]
Embodied AI has developed rapidly in recent years, but it is still mainly deployed in laboratories.<n>There is no IQA method to assess the usability of an image in embodied tasks, namely, the perceptual quality for robots.
arXiv Detail & Related papers (2025-05-22T15:51:07Z) - Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation.<n>We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO)<n>We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
arXiv Detail & Related papers (2025-03-28T17:59:54Z) - IQPFR: An Image Quality Prior for Blind Face Restoration and Beyond [56.99331967165238]
Blind Face Restoration (BFR) addresses the challenge of reconstructing degraded low-quality (LQ) facial images into high-quality (HQ) outputs.<n>We propose a novel framework that incorporates an Image Quality Prior (IQP) derived from No-Reference Image Quality Assessment (NR-IQA) models.<n>Our method outperforms state-of-the-art techniques across multiple benchmarks.
arXiv Detail & Related papers (2025-03-12T11:39:51Z) - Structural Similarity in Deep Features: Image Quality Assessment Robust to Geometrically Disparate Reference [22.323905448096284]
We propose a unified, non-training-based Deep Structural Similarity (DeepSSIM) approach to address the above problems.<n>The proposed method achieves state-of-the-art performance on AR-IQA datasets and shows strong robustness to various GDR-IQA test cases.
arXiv Detail & Related papers (2024-12-27T09:51:23Z) - Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models [93.91086467402323]
Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA) designed to efficiently adapt the visual-language pre-trained model, CLIP, to IQA tasks.<n> GRMP-IQA consists of two core modules: (i) Meta-Prompt Pre-training Module and (ii) Quality-Aware Gradient Regularization.
arXiv Detail & Related papers (2024-09-09T07:26:21Z) - DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges.<n>Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem.<n>Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose a novel IQA method, diffusion priors-based IQA.
arXiv Detail & Related papers (2024-05-30T12:32:35Z) - When No-Reference Image Quality Models Meet MAP Estimation in Diffusion Latents [92.45867913876691]
No-reference image quality assessment (NR-IQA) models can effectively quantify perceived image quality.<n>We show that NR-IQA models can be plugged into the maximum a posteriori (MAP) estimation framework for image enhancement.
arXiv Detail & Related papers (2024-03-11T03:35:41Z) - Local Distortion Aware Efficient Transformer Adaptation for Image
Quality Assessment [62.074473976962835]
We show that with proper injection of local distortion features, a larger pretrained and fixed foundation model performs better in IQA tasks.
Specifically, for the lack of local distortion structure and inductive bias of vision transformer (ViT), we use another pretrained convolution neural network (CNN)
We propose a local distortion extractor to obtain local distortion features from the pretrained CNN and a local distortion injector to inject the local distortion features into ViT.
arXiv Detail & Related papers (2023-08-23T08:41:21Z) - Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and
Wild [98.48284827503409]
We develop a textitunified BIQA model and an approach of training it for both synthetic and realistic distortions.
We employ the fidelity loss to optimize a deep neural network for BIQA over a large number of such image pairs.
Experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in the laboratory and wild.
arXiv Detail & Related papers (2020-05-28T13:35:23Z) - Comparison of Image Quality Models for Optimization of Image Processing
Systems [41.57409136781606]
We use eleven full-reference IQA models to train deep neural networks for four low-level vision tasks.
Subjective testing on the optimized images allows us to rank the competing models in terms of their perceptual performance.
arXiv Detail & Related papers (2020-05-04T09:26:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.