Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework
- URL: http://arxiv.org/abs/2601.20689v1
- Date: Wed, 28 Jan 2026 15:15:17 GMT
- Title: Decoupling Perception and Calibration: Label-Efficient Image Quality Assessment Framework
- Authors: Xinyue Li, Zhichao Zhang, Zhiming Xu, Shubo Xu, Xiongkuo Min, Yitong Chen, Guangtao Zhai
- Abstract summary: LEAF is a Label-Efficient Image Quality Assessment Framework. It distills perceptual quality priors from an MLLM teacher into a lightweight student regressor. Our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations.
- Score: 78.58395822978271
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent multimodal large language models (MLLMs) have demonstrated strong capabilities in image quality assessment (IQA) tasks. However, adapting such large-scale models is computationally expensive and still relies on substantial Mean Opinion Score (MOS) annotations. We argue that for MLLM-based IQA, the core bottleneck lies not in the quality perception capacity of MLLMs, but in MOS scale calibration. Therefore, we propose LEAF, a Label-Efficient Image Quality Assessment Framework that distills perceptual quality priors from an MLLM teacher into a lightweight student regressor, enabling MOS calibration with minimal human supervision. Specifically, the teacher conducts dense supervision through point-wise judgments and pair-wise preferences, with an estimate of decision reliability. Guided by these signals, the student learns the teacher's quality perception patterns through joint distillation and is calibrated on a small MOS subset to align with human annotations. Experiments on both user-generated and AI-generated IQA benchmarks demonstrate that our method significantly reduces the need for human annotations while maintaining strong MOS-aligned correlations, making lightweight IQA practical under limited annotation budgets.
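The abstract describes a teacher that emits point-wise judgments and pair-wise preferences with reliability estimates, a student trained by joint distillation, and a final calibration step on a small MOS subset. The sketch below illustrates how such a reliability-weighted joint loss and a post-hoc calibration map could look; the function names, the logistic ranking loss, and the linear least-squares calibration are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def joint_distillation_loss(student, teacher, point_rel, pairs, prefs, pair_rel):
    """Reliability-weighted joint distillation loss (illustrative).

    student, teacher: per-image quality scores; point_rel: teacher
    reliability per point-wise judgment; pairs: (i, j) index pairs;
    prefs: +1 if image i is preferred over j, else -1; pair_rel:
    reliability per pair-wise preference.
    """
    s = np.asarray(student, dtype=float)
    t = np.asarray(teacher, dtype=float)
    # Point-wise term: reliability-weighted squared error against the teacher.
    point = np.mean(np.asarray(point_rel, dtype=float) * (s - t) ** 2)
    # Pair-wise term: reliability-weighted logistic ranking loss.
    pair_terms = [r * np.log1p(np.exp(-p * (s[i] - s[j])))
                  for (i, j), p, r in zip(pairs, prefs, pair_rel)]
    pair = float(np.mean(pair_terms)) if pair_terms else 0.0
    return float(point) + pair

def calibrate_to_mos(student_subset, mos_subset):
    """Fit a linear map from student scores to the MOS scale on a small
    annotated subset, returning a callable calibrator."""
    x = np.asarray(student_subset, dtype=float)
    y = np.asarray(mos_subset, dtype=float)
    A = np.vstack([x, np.ones_like(x)]).T
    (a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda v: a * np.asarray(v, dtype=float) + b
```

Decoupling the two pieces this way mirrors the paper's premise: the student inherits the teacher's quality ordering from dense distillation signals, while only the cheap final linear map consumes human MOS labels.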
Related papers
- RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training [59.493415006017635]
Pre-trained Multi-modal Large Language Models (MLLMs) provide a knowledge-rich foundation for post-training. Current evaluation relies on testing after supervised fine-tuning, which introduces laborious additional training and autoregressive decoding costs. We propose RADAR, an efficient ability-centric evaluation framework for Revealing Asymmetric Development of Abilities in MLLM pRe-training.
arXiv Detail & Related papers (2026-02-13T12:56:31Z)
- Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation [102.10193318526137]
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks. We introduce IQARAG, a training-free framework that enhances LMMs' Image Quality Assessment (IQA) ability. IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve semantically similar but quality-variant reference images, together with their Mean Opinion Scores (MOSs), for the input image.
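The retrieval step described above, fetching quality-variant reference images with known MOSs for a query image, can be sketched as nearest-neighbor search over image embeddings. This is a minimal illustration under assumed inputs (precomputed embedding vectors and a MOS list); it is not IQARAG's actual retrieval pipeline.

```python
import numpy as np

def retrieve_references(query_emb, ref_embs, ref_mos, k=3):
    """Return the indices and MOS values of the k reference images whose
    embeddings are most cosine-similar to the query embedding.

    Embeddings would come from any image encoder; here they are plain
    vectors so the routine stays self-contained.
    """
    q = np.asarray(query_emb, dtype=float)
    R = np.asarray(ref_embs, dtype=float)
    # Cosine similarity between the query and every reference embedding.
    sims = R @ q / (np.linalg.norm(R, axis=1) * np.linalg.norm(q) + 1e-12)
    top = np.argsort(-sims)[:k]
    return top.tolist(), [ref_mos[i] for i in top]
```

The retrieved (image, MOS) pairs would then be placed in the LMM's prompt as in-context anchors, letting a frozen model score the query relative to known quality levels.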
arXiv Detail & Related papers (2026-01-13T08:00:02Z)
- Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning [32.30800226412995]
We introduce Zoom-IQA, a VLM-based IQA model that explicitly emulates key cognitive behaviors. We show that Zoom-IQA achieves improved robustness, explainability, and generalization. Its application to downstream tasks, such as image restoration, further demonstrates its effectiveness.
arXiv Detail & Related papers (2026-01-06T11:00:17Z)
- Q-Doc: Benchmarking Document Image Quality Assessment Capabilities in Multi-modal Large Language Models [19.598563198222035]
We propose Q-Doc to systematically probe the DIQA capabilities of MLLMs at coarse, middle, and fine granularity levels. We show that while MLLMs possess nascent DIQA abilities, they exhibit critical limitations: inconsistent scoring, distortion misidentification, and severity misjudgment. Our work provides a benchmark for DIQA capabilities in MLLMs, revealing pronounced deficiencies in their quality perception and promising pathways for enhancement.
arXiv Detail & Related papers (2025-11-14T15:41:17Z)
- Image Quality Assessment for Machines: Paradigm, Large-scale Database, and Models [60.356842878501254]
Machine vision systems (MVS) are intrinsically vulnerable to performance degradation under adverse visual conditions. We propose a machine-centric image quality assessment (MIQA) framework that quantifies the impact of image degradations on MVS performance.
arXiv Detail & Related papers (2025-08-27T13:07:24Z)
- Teaching LMMs for Image Quality Scoring and Interpreting [71.1335005098584]
We propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables image quality scoring and interpreting simultaneously. Q-SiT is the first model capable of simultaneously performing image quality scoring and interpreting tasks, and it comes with a lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks, with superior generalization in IQA abilities.
arXiv Detail & Related papers (2025-03-12T09:39:33Z)
- Your Weak LLM is Secretly a Strong Teacher for Alignment [19.33906256866585]
Existing alignment frameworks present constraints either in the form of expensive human effort or high computational costs. This paper explores a promising middle ground, where we employ a weak LLM that is significantly less resource-intensive than top-tier models. We show that weak LLMs can provide feedback that rivals or even exceeds that of fully human-annotated data.
arXiv Detail & Related papers (2024-09-13T13:24:52Z)
- Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models [93.91086467402323]
The Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA) is designed to efficiently adapt the vision-language pre-trained model CLIP to IQA tasks. GRMP-IQA consists of two core modules: (i) a Meta-Prompt Pre-training Module and (ii) Quality-Aware Gradient Regularization.
arXiv Detail & Related papers (2024-09-09T07:26:21Z)
- Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement [12.628718661568048]
We aim to explore a generalized human visual attention estimation strategy to mimic the process of human quality rating.
In particular, we model human attention generation by measuring the statistical dependency between the degraded image and the reference image.
Experimental results verify that the performance of existing IQA models can be consistently improved when our attention module is incorporated.
arXiv Detail & Related papers (2024-08-19T11:55:32Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Multi-Task Pseudo-Label Learning for Non-Intrusive Speech Quality Assessment Model [28.32514067707762]
This study proposes a multi-task pseudo-label learning (MPL)-based non-intrusive speech quality assessment model called MTQ-Net.
MPL consists of two stages: obtaining pseudo-label scores from a pretrained model and performing multi-task learning.
MTQ-Net with the MPL approach exhibits higher overall predictive power than other SSL-based speech assessment models.
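The two MPL stages named above, generating pseudo-label scores from a pretrained model and then training jointly on pseudo and true targets, can be sketched in a few lines. This is a toy multi-task least-squares illustration with assumed inputs (a feature matrix and a stand-in pretrained scorer), not the MTQ-Net architecture.

```python
import numpy as np

def mpl_fit(features, true_scores, pretrained_score_fn):
    """Two-stage MPL sketch.

    Stage 1: obtain pseudo-label scores from a pretrained model.
    Stage 2: fit one shared linear model jointly on the true-score task
    and the pseudo-label task by stacking both targets (multi-task
    least squares with shared inputs, one output head per task).
    """
    X = np.asarray(features, dtype=float)
    y_true = np.asarray(true_scores, dtype=float)
    # Stage 1: pseudo-label scores from the (stand-in) pretrained model.
    y_pseudo = np.asarray([pretrained_score_fn(x) for x in X], dtype=float)
    # Stage 2: joint fit; columns of W are the two task heads.
    A = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(A, np.stack([y_true, y_pseudo], axis=1), rcond=None)
    return W  # shape (n_features + 1, 2): [true-score head, pseudo-label head]
```

The shared input matrix with separate output heads is the essence of the multi-task stage: the pseudo-label task regularizes the representation even where true quality labels are scarce.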
arXiv Detail & Related papers (2023-08-18T02:36:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.