Revisiting MLLM Based Image Quality Assessment: Errors and Remedy
- URL: http://arxiv.org/abs/2511.07812v1
- Date: Wed, 12 Nov 2025 01:20:32 GMT
- Title: Revisiting MLLM Based Image Quality Assessment: Errors and Remedy
- Authors: Zhenchen Tang, Songlin Yang, Bo Peng, Zichuan Wang, Jing Dong
- Abstract summary: A key challenge arises from the inherent mismatch between the discrete token outputs of MLLMs and the continuous nature of quality scores required by IQA tasks. We propose Q-Scorer, which incorporates a lightweight regression module and IQA-specific score tokens into the MLLM pipeline. Q-Scorer achieves state-of-the-art performance across multiple IQA benchmarks, generalizes well to mixed datasets, and further improves when combined with other methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid progress of multi-modal large language models (MLLMs) has boosted the task of image quality assessment (IQA). However, a key challenge arises from the inherent mismatch between the discrete token outputs of MLLMs and the continuous nature of quality scores required by IQA tasks. This discrepancy significantly hinders the performance of MLLM-based IQA methods. Previous approaches that convert discrete token predictions into continuous scores often suffer from conversion errors. Moreover, the semantic confusion introduced by level tokens (e.g., "good") further constrains the performance of MLLMs on IQA tasks and degrades their original capabilities for related tasks. To tackle these problems, we provide a theoretical analysis of the errors inherent in previous approaches and, motivated by this analysis, propose a simple yet effective framework, Q-Scorer. This framework incorporates a lightweight regression module and IQA-specific score tokens into the MLLM pipeline. Extensive experiments demonstrate that Q-Scorer achieves state-of-the-art performance across multiple IQA benchmarks, generalizes well to mixed datasets, and further improves when combined with other methods.
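To make the core idea concrete, here is a minimal sketch of the architecture the abstract describes: a dedicated score token is appended to the MLLM input, and a lightweight regression head maps that token's final hidden state to a continuous score, sidestepping token-to-score conversion. All names, dimensions, and the L1 loss are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of Q-Scorer's core idea: regress a continuous quality
# score from the hidden state of an IQA-specific score token. Sizes and the
# loss choice are assumptions for illustration only.
import torch
import torch.nn as nn

class ScoreRegressionHead(nn.Module):
    """Lightweight regressor from a score-token hidden state to a MOS-like scalar."""
    def __init__(self, hidden_dim: int = 4096):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 512),
            nn.GELU(),
            nn.Linear(512, 1),
        )

    def forward(self, score_token_state: torch.Tensor) -> torch.Tensor:
        # score_token_state: (batch, hidden_dim) -- the MLLM's last-layer
        # hidden state at the position of the appended score token.
        return self.mlp(score_token_state).squeeze(-1)

if __name__ == "__main__":
    head = ScoreRegressionHead(hidden_dim=4096)
    # Random stand-ins for hidden states an MLLM would produce at the score token.
    states = torch.randn(8, 4096)
    mos_targets = torch.rand(8) * 4 + 1               # MOS targets in [1, 5]
    pred = head(states)
    loss = nn.functional.l1_loss(pred, mos_targets)   # regression, no token decoding
    loss.backward()
    print(f"loss: {loss.item():.3f}")
```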
Related papers
- Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework [78.58395822978271]
LEAF is a Label-Efficient Image Quality Assessment Framework. It distills perceptual quality priors from an MLLM teacher into a lightweight student regressor. Our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations.
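A minimal sketch of the distillation recipe this summary describes: pseudo quality scores from a frozen teacher supervise a lightweight student regressor. The teacher is faked with a fixed projection purely so the snippet runs standalone; the real method's perceptual priors and calibration step are not reproduced here.

```python
# Toy distillation loop: MLLM-teacher pseudo-MOS -> lightweight student.
import torch
import torch.nn as nn

torch.manual_seed(0)
feat_dim = 256

def fake_teacher_score(features: torch.Tensor) -> torch.Tensor:
    # Stand-in for an MLLM teacher's pseudo-MOS on unlabeled images.
    w = torch.linspace(-1, 1, feat_dim)
    return (features @ w).sigmoid() * 4 + 1       # pseudo-MOS in [1, 5]

student = nn.Linear(feat_dim, 1)                  # lightweight student regressor
opt = torch.optim.Adam(student.parameters(), lr=1e-2)

unlabeled = torch.randn(512, feat_dim)            # image features, no human labels
pseudo_mos = fake_teacher_score(unlabeled)

for step in range(200):                           # distill teacher -> student
    pred = student(unlabeled).squeeze(-1)
    loss = nn.functional.mse_loss(pred, pseudo_mos)
    opt.zero_grad(); loss.backward(); opt.step()

print(f"final distillation loss: {loss.item():.4f}")
```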
arXiv Detail & Related papers (2026-01-28T15:15:17Z)
- Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation [102.10193318526137]
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks. We introduce IQARAG, a training-free framework that enhances LMMs' Image Quality Assessment (IQA) ability. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve semantically similar but quality-variant reference images, with their corresponding Mean Opinion Scores (MOSs), for the input image.
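A hedged sketch of the retrieval step: given an input image embedding, fetch semantically similar references that span a range of MOS values, then splice them into the IQA prompt as in-context anchors. The embeddings, database, and spread heuristic below are synthetic placeholders, not the paper's pipeline.

```python
# Toy retrieval of semantically similar, quality-variant references with MOSs.
import numpy as np

rng = np.random.default_rng(0)
db_embeds = rng.normal(size=(100, 64))        # reference image embeddings
db_mos = rng.uniform(1, 5, size=100)          # their human MOS labels

def retrieve(query: np.ndarray, k: int = 4) -> list:
    sims = db_embeds @ query / (
        np.linalg.norm(db_embeds, axis=1) * np.linalg.norm(query) + 1e-8)
    pool = np.argsort(-sims)[: k * 3]          # semantically similar pool
    # Keep quality-variant exemplars: sort the pool by MOS, take an even spread.
    by_mos = pool[np.argsort(db_mos[pool])]
    return list(by_mos[:: max(1, len(by_mos) // k)][:k])

query = rng.normal(size=64)
refs = retrieve(query)
prompt = "Rate the image quality (1-5). Reference examples:\n" + "\n".join(
    f"- <ref_image_{i}> has MOS {db_mos[i]:.2f}" for i in refs)
print(prompt)
```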
arXiv Detail & Related papers (2026-01-13T08:00:02Z)
- ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification [16.05388703860442]
In this paper, we introduce our UQ-supported MLLM-based visual anomaly detection framework called ALARM. ALARM integrates quality-assurance techniques such as reasoning chains, self-reflection, and MLLM ensembling for robust and accurate performance. Extensive empirical evaluations on real-world smart-home benchmark data and wound-image classification data show ALARM's superior performance and its generic applicability across domains for reliable decision-making.
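One of the quality-assurance ingredients listed above, ensemble-based uncertainty, can be illustrated in a few lines: disagreement across ensemble members serves as an uncertainty estimate alongside the majority decision. The member probabilities, threshold, and review policy here are invented for the demo.

```python
# Toy MLLM-ensemble anomaly decision with a disagreement-based uncertainty.
import statistics

def ensemble_detect(member_probs: list, threshold: float = 0.5):
    votes = [p >= threshold for p in member_probs]
    decision = sum(votes) > len(votes) / 2           # majority vote
    uncertainty = statistics.pstdev(member_probs)    # spread = disagreement
    flag_for_review = uncertainty > 0.2              # assumed review policy
    return decision, uncertainty, flag_for_review

# Probabilities three ensemble members might assign to "anomaly".
probs = [0.82, 0.74, 0.35]
decision, unc, review = ensemble_detect(probs)
print(f"anomaly={decision}, uncertainty={unc:.3f}, needs_review={review}")
```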
arXiv Detail & Related papers (2025-12-01T19:03:14Z)
- Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models [19.598563198222035]
We propose Q-Doc, a benchmark that systematically probes the DIQA capabilities of MLLMs at coarse, middle, and fine granularity levels. We show that while MLLMs possess nascent DIQA abilities, they exhibit critical limitations: inconsistent scoring, distortion misidentification, and severity misjudgment. Our work provides a benchmark for DIQA capabilities in MLLMs, revealing pronounced deficiencies in their quality perception and promising pathways for enhancement.
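A small sketch of what multi-granularity probing could look like in practice: the same document image is queried at coarse (overall score), middle (distortion type), and fine (severity) levels, and the answers are checked for internal consistency. The probe wording, canned answers, and consistency rule are all assumptions, not the benchmark's actual protocol.

```python
# Toy multi-granularity DIQA probe with a cross-level consistency check.
PROBES = {
    "coarse": "Rate the overall quality of this document image from 1 to 5.",
    "middle": "Which distortion dominates: blur, noise, shadow, or none?",
    "fine": "How severe is that distortion: mild, moderate, or severe?",
}

def mock_mllm(level: str, question: str) -> str:
    # Stub for a real MLLM call; canned answers keep the demo self-contained.
    return {"coarse": "4", "middle": "blur", "fine": "severe"}[level]

answers = {lvl: mock_mllm(lvl, q) for lvl, q in PROBES.items()}
# A high coarse score should not pair with a severe fine-grained distortion --
# the kind of scoring inconsistency the summary says such benchmarks surface.
inconsistent = int(answers["coarse"]) >= 4 and answers["fine"] == "severe"
print(answers, "| inconsistent:", inconsistent)
```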
arXiv Detail & Related papers (2025-11-14T15:41:17Z)
- AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment [69.06977852423564]
Image quality assessment (IQA) reflects both the quantification and interpretation of perceptual quality rooted in the human visual system. AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution. To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents.
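A hedged sketch of the four-stage decomposition named above. Each stage is a stub standing in for a VLM-driven agent step; the tool registry, routing rule, and return values are invented for illustration.

```python
# Toy four-stage agentic IQA pipeline: detect -> analyze -> select -> execute.
from typing import Callable

TOOLS = {
    "blur_metric": lambda img: 2.1,    # stand-ins for real IQA tools
    "noise_metric": lambda img: 4.3,
}

def detect_distortion(img: str) -> str:      # stage 1: distortion detection
    return "blur"

def analyze(distortion: str) -> str:         # stage 2: distortion analysis
    return f"{distortion} concentrated in text regions"

def select_tool(distortion: str) -> str:     # stage 3: tool selection
    return f"{distortion}_metric"

def run(img: str) -> dict:
    d = detect_distortion(img)
    analysis = analyze(d)
    tool = select_tool(d)
    score = TOOLS[tool](img)                 # stage 4: tool execution
    return {"distortion": d, "analysis": analysis, "tool": tool, "score": score}

print(run("doc.png"))
```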
arXiv Detail & Related papers (2025-09-30T09:37:01Z)
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation. We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO). We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
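To clarify the GRPO mechanism the summary invokes, here is a self-contained illustration of the group-relative advantage: sample a group of score predictions for one image, reward each by closeness to the human MOS, and normalize rewards within the group. The reward shape and group size are demo assumptions, not Q-Insight's design.

```python
# Group-relative advantage, the core signal in GRPO-style training.
import numpy as np

rng = np.random.default_rng(0)
mos = 3.6                                    # ground-truth mean opinion score
group = rng.uniform(1, 5, size=8)            # 8 sampled score predictions

rewards = -np.abs(group - mos)               # closer to MOS -> higher reward
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)   # group-relative

for pred, a in zip(group, adv):
    print(f"pred={pred:.2f}  advantage={a:+.2f}")
# Positive advantages up-weight predictions beating the group average -- the
# policy-gradient signal GRPO feeds back into the model.
```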
arXiv Detail & Related papers (2025-03-28T17:59:54Z)
- LLM-based Discriminative Reasoning for Knowledge Graph Question Answering [42.277864969014296]
Large language models (LLMs) based on the generative pre-trained Transformer have achieved remarkable performance on knowledge graph question-answering (KGQA) tasks. However, LLMs often produce ungrounded subgraph planning or reasoning results in KGQA due to hallucinations introduced by the generative paradigm. We propose READS, which reformulates the KGQA process into discriminative subtasks and thereby simplifies the search space for each subtask.
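A sketch of the discriminative reformulation: rather than free-form generating a subgraph, each candidate relation path is scored and the argmax is kept, which confines the model to grounded options. The scorer below is a toy keyword-overlap heuristic; in the paper's setting an LLM would do the scoring.

```python
# Toy discriminative selection over candidate relation paths for KGQA.
question = "Who directed the film that won Best Picture in 1998?"
candidates = [
    ("film.award_winner", "film.directed_by"),
    ("film.actor", "person.spouse"),
    ("film.directed_by", "person.birthplace"),
]

def plausibility(q: str, path: tuple) -> int:
    # Count question-term hints appearing in the path's relation names.
    hints = {"directed": "directed_by", "won": "award_winner", "film": "film"}
    return sum(hint in rel for w, hint in hints.items() if w in q for rel in path)

best = max(candidates, key=lambda p: plausibility(question, p))
print("selected relation path:", best)
```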
arXiv Detail & Related papers (2024-12-17T08:07:16Z)
- VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers [7.7705926659081275]
VerifierQ is a novel approach that integrates Offline Q-learning into verifier models.
We address three key challenges in applying Q-learning to LLMs.
Our method enables parallel Q-value computation and improves training efficiency.
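A minimal sketch of what parallel Q-value computation for a verifier might look like: each reasoning step's hidden state is mapped to a Q-value in one forward pass, and an offline Q-learning target (reward plus discounted next-step value) is formed for every position simultaneously. Shapes, reward placement, and the TD target are illustrative assumptions.

```python
# Parallel per-step Q-values for a verifier, with a simple TD target.
import torch
import torch.nn as nn

hidden, steps, batch = 128, 6, 4
q_head = nn.Linear(hidden, 1)                      # verifier Q-head

states = torch.randn(batch, steps, hidden)         # per-step hidden states
q = q_head(states).squeeze(-1)                     # (batch, steps), one pass

reward = torch.zeros(batch, steps)
reward[:, -1] = torch.randint(0, 2, (batch,)).float()   # terminal correctness

gamma = 0.99
target = reward.clone()
target[:, :-1] += gamma * q[:, 1:].detach()        # bootstrap from next step
loss = nn.functional.mse_loss(q, target)
loss.backward()
print(f"TD loss over all steps at once: {loss.item():.4f}")
```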
arXiv Detail & Related papers (2024-10-10T15:43:55Z)
- Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models [93.91086467402323]
The Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA) is designed to efficiently adapt the vision-language pre-trained model CLIP to IQA tasks. GRMP-IQA consists of two core modules: (i) a Meta-Prompt Pre-training Module and (ii) Quality-Aware Gradient Regularization.
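A hedged sketch of prompt-tuning a CLIP-style model for IQA in the spirit of this summary: learnable context vectors modulate "good"/"bad" text anchors, and the softmax over image-text similarities yields a quality probability. The gradient-regularization idea is reduced here to a simple penalty on prompt-gradient magnitude, purely as an illustration; the encoders are replaced by random stand-ins.

```python
# Toy learnable meta-prompt over quality anchors with a crude gradient penalty.
import torch
import torch.nn as nn

dim, ctx_len = 512, 4
ctx = nn.Parameter(torch.randn(ctx_len, dim) * 0.02)    # learnable meta-prompt

anchor_good = torch.randn(dim)                           # frozen text stand-ins
anchor_bad = torch.randn(dim)
img_feat = torch.randn(dim)                              # CLIP-like image feature

def quality_prob() -> torch.Tensor:
    # Fuse the prompt context into each anchor (stand-in for the text encoder).
    good = anchor_good + ctx.mean(0)
    bad = anchor_bad + ctx.mean(0)
    sims = torch.stack([
        nn.functional.cosine_similarity(img_feat, good, dim=0),
        nn.functional.cosine_similarity(img_feat, bad, dim=0),
    ])
    return torch.softmax(sims * 100, dim=0)[0]           # P("good photo")

mos_target = torch.tensor(0.7)                           # normalized quality label
loss = (quality_prob() - mos_target) ** 2
grad = torch.autograd.grad(loss, ctx, create_graph=True)[0]
loss = loss + 1e-3 * grad.norm() ** 2                    # crude gradient regularizer
loss.backward()
print("prompt grad norm:", ctx.grad.norm().item())
```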
arXiv Detail & Related papers (2024-09-09T07:26:21Z)
- Benchmarking Uncertainty Quantification Methods for Large Language Models with LM-Polygraph [83.90988015005934]
Uncertainty quantification is a key element of machine learning applications. We introduce a novel benchmark that implements a collection of state-of-the-art UQ baselines. We conduct a large-scale empirical investigation of UQ and normalization techniques across eleven tasks, identifying the most effective approaches.
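As a concrete example of the kind of UQ baseline such benchmarks implement, here is mean token entropy over a generated sequence: higher entropy signals lower confidence. The logits are random stand-ins for a real LLM's per-step outputs; this is one classic baseline, not the benchmark itself.

```python
# Mean token entropy, a classic sequence-level uncertainty baseline.
import torch

torch.manual_seed(0)
seq_len, vocab = 12, 50_000
logits = torch.randn(seq_len, vocab)                 # per-token LLM logits

log_probs = torch.log_softmax(logits, dim=-1)
entropy = -(log_probs.exp() * log_probs).sum(-1)     # per-token entropy
uncertainty = entropy.mean().item()                  # sequence-level UQ score
print(f"mean token entropy: {uncertainty:.3f} nats")
```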
arXiv Detail & Related papers (2024-06-21T20:06:31Z)
- Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
Inspired by the principles of subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is preferred by human annotators over the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z)
- KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering [28.18555591429343]
We propose a novel framework named Knowledge Enhanced Contrastive Prompt-tuning (KECP).
Instead of adding pointer heads to PLMs, we transform the task into a non-autoregressive Masked Language Modeling (MLM) generation problem.
Our method consistently outperforms state-of-the-art approaches in few-shot settings by a large margin.
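A schematic of recasting extractive QA as non-autoregressive MLM, as the summary describes: [MASK] slots stand in for the answer span and are filled in a single pass, here restricted to tokens occurring in the passage. The encoder is replaced by random states so the snippet runs standalone; KECP's knowledge injection and contrastive objective are omitted.

```python
# Span extraction as one-pass masked-LM filling, restricted to passage tokens.
import torch

torch.manual_seed(0)
vocab_size, hidden = 1000, 64
mlm_head = torch.nn.Linear(hidden, vocab_size)       # stand-in MLM head

passage_token_ids = torch.tensor([17, 42, 7, 256, 99])   # tokens in the passage
mask_states = torch.randn(2, hidden)                 # encoder states at 2 [MASK]s

logits = mlm_head(mask_states)                       # (2, vocab), one pass
restricted = torch.full_like(logits, float("-inf"))
restricted[:, passage_token_ids] = logits[:, passage_token_ids]
answer_ids = restricted.argmax(-1)                   # non-autoregressive fill
print("predicted span token ids:", answer_ids.tolist())
```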
arXiv Detail & Related papers (2022-05-06T08:31:02Z)