Plug In, Grade Right: Psychology-Inspired AGIQA
- URL: http://arxiv.org/abs/2512.22780v1
- Date: Sun, 28 Dec 2025 04:51:05 GMT
- Title: Plug In, Grade Right: Psychology-Inspired AGIQA
- Authors: Zhicheng Liao, Baoliang Chen, Hanwei Zhu, Lingyu Zhu, Shiqi Wang, Weisi Lin
- Abstract summary: Existing AGIQA models estimate image quality by measuring and aggregating similarities between image embeddings and text embeddings. We propose an improved Graded Response Model (GRM) for AGIQA. Our Arithmetic GRM based Quality Grading (AGQG) module enjoys a plug-and-play advantage.
- Score: 60.23968344837525
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing AGIQA models typically estimate image quality by measuring and aggregating the similarities between image embeddings and text embeddings derived from multi-grade quality descriptions. Although effective, we observe that such similarity distributions across grades usually exhibit multimodal patterns. For instance, an image embedding may show high similarity to both "excellent" and "poor" grade descriptions while deviating from the "good" one. We refer to this phenomenon as "semantic drift", where semantic inconsistencies between text embeddings and their intended descriptions undermine the reliability of text-image shared-space learning. To mitigate this issue, we draw inspiration from psychometrics and propose an improved Graded Response Model (GRM) for AGIQA. The GRM is a classical assessment model that categorizes a subject's ability across grades using test items with various difficulty levels. This paradigm aligns remarkably well with human quality rating, where image quality can be interpreted as an image's ability to meet various quality grades. Building on this philosophy, we design a two-branch quality grading module: one branch estimates image ability while the other constructs multiple difficulty levels. To ensure monotonicity in difficulty levels, we further model difficulty generation in an arithmetic manner, which inherently enforces a unimodal and interpretable quality distribution. Our Arithmetic GRM based Quality Grading (AGQG) module enjoys a plug-and-play advantage, consistently improving performance when integrated into various state-of-the-art AGIQA frameworks. Moreover, it also generalizes effectively to both natural and screen content image quality assessment, revealing its potential as a key component in future IQA models.
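To make the arithmetic difficulty construction concrete, below is a minimal PyTorch sketch of such a two-branch grading head. The class name, tensor shapes, and the sigmoid link are our assumptions for illustration; this is not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArithmeticGRMHead(nn.Module):
    """Two-branch grading head: one branch predicts an image 'ability' score,
    the other builds monotone grade difficulties b_k = b_0 + k * d with a
    positive common difference d, which keeps the grade distribution unimodal."""

    def __init__(self, feat_dim: int, num_grades: int = 5):
        super().__init__()
        self.ability = nn.Linear(feat_dim, 1)         # ability branch
        self.base = nn.Parameter(torch.zeros(1))      # b_0
        self.step_raw = nn.Parameter(torch.zeros(1))  # unconstrained step
        self.num_grades = num_grades

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        theta = self.ability(feat)                    # (B, 1) image ability
        d = F.softplus(self.step_raw)                 # enforce d > 0
        k = torch.arange(self.num_grades - 1, device=feat.device)
        b = self.base + k * d                         # strictly increasing b_k
        cum = torch.sigmoid(theta - b)                # P(grade > k), (B, K-1)
        ones = torch.ones_like(cum[:, :1])
        zeros = torch.zeros_like(cum[:, :1])
        # Adjacent differences of the cumulative curve give grade probabilities.
        probs = torch.cat([ones, cum], dim=1) - torch.cat([cum, zeros], dim=1)
        return probs                                  # (B, K), sums to 1
```

A scalar quality score can then be read out as the expectation over grades, e.g. `(probs * torch.arange(num_grades)).sum(dim=1)`; because all thresholds share one positive common difference, higher ability shifts probability mass monotonically toward higher grades.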
Related papers
- VisualQuality-R1: Reasoning-Induced Image Quality Assessment via Reinforcement Learning to Rank [30.316630325648834]
We introduce VisualQuality-R1, a reasoning-induced no-reference IQA (NR-IQA) model, and we train it with reinforcement learning to rank. We show that VisualQuality-R1 consistently outperforms discriminative deep learning-based NR-IQA models. VisualQuality-R1 is capable of generating contextually rich, human-aligned quality descriptions.
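As a toy illustration of the learning-to-rank signal such RL training can optimize (our simplification; the paper's actual reward is more elaborate):

```python
# Toy pairwise rank reward: +1 when the model's ordering of an image pair
# agrees with the human preference label (illustrative only).
def rank_reward(score_a: float, score_b: float, human_prefers_a: bool) -> float:
    model_prefers_a = score_a > score_b
    return 1.0 if model_prefers_a == human_prefers_a else 0.0
```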
arXiv Detail & Related papers (2025-05-20T14:56:50Z)
- Teaching LMMs for Image Quality Scoring and Interpreting [71.1335005098584]
We propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables image quality scoring and interpreting simultaneously. Q-SiT, together with its lightweight variant Q-SiT-mini, is the first model capable of simultaneously performing image quality scoring and interpreting tasks. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks with superior IQA generalization abilities.
arXiv Detail & Related papers (2025-03-12T09:39:33Z)
- IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models [0.5356944479760104]
We propose methods to integrate image quality assessment (IQA) models into diffusion-based generators. We show that diffusion models can learn complex qualitative relationships from both IQA models' outputs and internal activations. We introduce IQA-Adapter, a novel framework that conditions generation on target quality levels by learning the implicit relationship between images and quality scores.
arXiv Detail & Related papers (2024-12-02T18:40:19Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment [57.07360640784803]
We propose vision-language consistency guided multi-modal prompt learning for blind AI-generated image quality assessment (AGIQA).
Specifically, we introduce learnable textual and visual prompts in the language and vision branches of Contrastive Language-Image Pre-training (CLIP) models.
We design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
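For intuition, a CoOp-style learnable text prompt can be sketched as below; `text_encoder`, the token-embedding shapes, and all names here are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptedGradeHead(nn.Module):
    """Learnable shared context vectors are prepended to per-grade token
    embeddings before a (frozen) CLIP-style text encoder; quality is read
    off as the image-text similarity for each grade."""

    def __init__(self, text_encoder, grade_embeds, n_ctx: int = 8, dim: int = 512):
        super().__init__()
        self.text_encoder = text_encoder                     # frozen text tower
        self.register_buffer("grade_embeds", grade_embeds)   # (K, L, dim) tokens
        self.ctx = nn.Parameter(0.02 * torch.randn(n_ctx, dim))

    def forward(self, img_feat: torch.Tensor) -> torch.Tensor:
        K = self.grade_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(K, -1, -1)        # (K, n_ctx, dim)
        tokens = torch.cat([ctx, self.grade_embeds], dim=1)  # prepend context
        txt = self.text_encoder(tokens)                      # assumed -> (K, dim)
        img = F.normalize(img_feat, dim=-1)                  # (B, dim)
        txt = F.normalize(txt, dim=-1)
        return img @ txt.t()                                 # (B, K) similarities
```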
arXiv Detail & Related papers (2024-06-24T13:45:31Z)
- DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges. Given the difficulty of collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem. Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel IQA method based on diffusion priors.
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch with prompt-learning techniques, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Pairwise Comparisons Are All You Need [22.798716660911833]
Blind image quality assessment (BIQA) approaches often fall short in real-world scenarios due to their reliance on a generic quality standard applied uniformly across diverse images.
This paper introduces PICNIQ, a pairwise comparison framework designed to bypass the limitations of conventional BIQA.
By employing psychometric scaling algorithms, PICNIQ transforms pairwise comparisons into just-objectionable-difference (JOD) quality scores, offering a granular and interpretable measure of image quality.
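For intuition, scaling of this kind can be approximated with a simple Bradley-Terry maximum-likelihood fit over a win-count matrix; PICNIQ's actual JOD solver differs in detail, so treat this as a generic psychometric sketch.

```python
import torch
import torch.nn.functional as F

def scale_from_pairwise(wins: torch.Tensor, iters: int = 2000, lr: float = 0.05):
    """Bradley-Terry scaling: wins[i, j] counts how often image i was
    preferred over image j (diagonal zero). Returns per-image scale values."""
    n = wins.shape[0]
    s = torch.zeros(n, requires_grad=True)        # latent quality scores
    opt = torch.optim.Adam([s], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        diff = s[:, None] - s[None, :]            # pairwise score differences
        nll = -(wins * F.logsigmoid(diff)).sum()  # negative log-likelihood
        nll.backward()
        opt.step()
    return (s - s.mean()).detach()                # anchor the mean at zero
```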
arXiv Detail & Related papers (2024-03-13T23:43:36Z)
- HiCD: Change Detection in Quality-Varied Images via Hierarchical Correlation Distillation [40.03785896317387]
We introduce an innovative training strategy grounded in knowledge distillation.
The core idea revolves around leveraging task knowledge acquired from high-quality image pairs to guide the model's learning.
We develop a hierarchical correlation distillation approach (involving self-correlation, cross-correlation, and global correlation).
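A generic sketch of what correlation-matching distillation losses can look like; the exact hierarchical formulation is the paper's, and everything below (names, normalization, MSE matching) is our assumption for illustration.

```python
import torch
import torch.nn.functional as F

def _flat(x: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) -> (B, C, H*W), channel-wise L2-normalized."""
    b, c, h, w = x.shape
    return F.normalize(x.reshape(b, c, h * w), dim=1)

def self_corr(x: torch.Tensor) -> torch.Tensor:
    f = _flat(x)
    return f.transpose(1, 2) @ f                  # (B, HW, HW) pixel affinities

def cross_corr(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    return _flat(x1).transpose(1, 2) @ _flat(x2)  # bi-temporal affinities

def corr_distill_loss(s1, s2, t1, t2) -> torch.Tensor:
    """Match the student's (low-quality input) correlation structure to the
    teacher's (high-quality input), per temporal image and across the pair."""
    loss = F.mse_loss(self_corr(s1), self_corr(t1))
    loss = loss + F.mse_loss(self_corr(s2), self_corr(t2))
    loss = loss + F.mse_loss(cross_corr(s1, s2), cross_corr(t1, t2))
    return loss
```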
arXiv Detail & Related papers (2024-01-19T15:21:51Z)
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective [93.56647950778357]
Blind image quality assessment (BIQA) predicts the human perception of image quality without any reference information.
We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary knowledge from other tasks.
arXiv Detail & Related papers (2023-03-27T07:58:09Z)