Related papers: Visual-Language Model Knowledge Distillation Method for Image Quality Assessment

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment

URL: http://arxiv.org/abs/2507.15680v3
Date: Wed, 23 Jul 2025 08:20:34 GMT
Title: Visual-Language Model Knowledge Distillation Method for Image Quality Assessment
Authors: Yongkang Hou, Jiarun Song,
Abstract summary: Multimodal methods based on vision-language models, such as CLIP, have demonstrated exceptional generalization capabilities in IQA tasks.<n>This study proposes a visual-language model knowledge distillation method aimed at guiding the training of models with architectural advantages using CLIP's IQA knowledge.
Score: 0.9821874476902972
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Image Quality Assessment (IQA) is a core task in computer vision. Multimodal methods based on vision-language models, such as CLIP, have demonstrated exceptional generalization capabilities in IQA tasks. To address the issues of excessive parameter burden and insufficient ability to identify local distorted features in CLIP for IQA, this study proposes a visual-language model knowledge distillation method aimed at guiding the training of models with architectural advantages using CLIP's IQA knowledge. First, quality-graded prompt templates were designed to guide CLIP to output quality scores. Then, CLIP is fine-tuned to enhance its capabilities in IQA tasks. Finally, a modality-adaptive knowledge distillation strategy is proposed to achieve guidance from the CLIP teacher model to the student model. Our experiments were conducted on multiple IQA datasets, and the results show that the proposed method significantly reduces model complexity while outperforming existing IQA methods, demonstrating strong potential for practical deployment.

Related papers

Enhancing Image Quality Assessment Ability of LMMs via Retrieval-Augmented Generation [102.10193318526137]
Large Multimodal Models (LMMs) have recently shown remarkable promise in low-level visual perception tasks.<n>We introduce IQARAG, a training-free framework that enhances LMMs' Image Quality Assessment (IQA) ability.<n>IQARAG leverages Retrieval-Augmented Generation (RAG) to retrieve some semantically similar but quality-variant reference images with corresponding Mean Opinion Scores (MOSs) for input image.
arXiv Detail & Related papers (2026-01-13T08:00:02Z)
Zoom-IQA: Image Quality Assessment with Reliable Region-Aware Reasoning [32.30800226412995]
We introduce Zoom-IQA, a VLM-based IQA model to explicitly emulate key cognitive behaviors.<n>We show that Zoom-IQA achieves improved robustness, explainability, and generalization.<n>The application to downstream tasks, such as image restoration, further demonstrates the effectiveness of Zoom-IQA.
arXiv Detail & Related papers (2026-01-06T11:00:17Z)
Life-IQA: Boosting Blind Image Quality Assessment through GCN-enhanced Layer Interaction and MoE-based Feature Decoupling [53.74410422225995]
Blind image quality assessment (BIQA) plays a crucial role in evaluating and optimizing visual experience.<n>Most existing BIQA approaches fuse shallow and deep features extracted from backbone networks, while overlooking the unequal contributions to quality prediction.<n>This paper investigates the contributions of shallow and deep features to BIQA, and proposes a effective quality feature decoding framework via GCN-enhanced underlinelayerunderlineinteraction and MoE-based underlinefeature dunderlineecoupling, termed textbf(Life-IQA).
arXiv Detail & Related papers (2025-11-24T11:59:55Z)
AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment [69.06977852423564]
Image quality assessment (IQA) reflects both the quantification and interpretation of perceptual quality rooted in the human visual system.<n>AgenticIQA decomposes IQA into four subtasks -- distortion detection, distortion analysis, tool selection, and tool execution.<n>To support training and evaluation, we introduce AgenticIQA-200K, a large-scale instruction dataset tailored for IQA agents, and AgenticIQA-Eval, the first benchmark for assessing the planning, execution, and summarization capabilities of VLM-based IQA agents.
arXiv Detail & Related papers (2025-09-30T09:37:01Z)
Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation.<n>We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO)<n>We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
arXiv Detail & Related papers (2025-03-28T17:59:54Z)
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers [7.7705926659081275]
VerifierQ is a novel approach that integrates Offline Q-learning into verifier models. We address three key challenges in applying Q-learning to LLMs. Our method enables parallel Q-value computation and improving training efficiency.
arXiv Detail & Related papers (2024-10-10T15:43:55Z)
Boosting CLIP Adaptation for Image Quality Assessment via Meta-Prompt Learning and Gradient Regularization [55.09893295671917]
This paper introduces a novel Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA) The GRMP-IQA comprises two key modules: Meta-Prompt Pre-training Module and Quality-Aware Gradient Regularization. Experiments on five standard BIQA datasets demonstrate the superior performance to the state-of-the-art BIQA methods under limited data setting.
arXiv Detail & Related papers (2024-09-09T07:26:21Z)
Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement [12.628718661568048]
We aim to explore a generalized human visual attention estimation strategy to mimic the process of human quality rating. In particular, we model human attention generation by measuring the statistical dependency between the degraded image and the reference image. Experimental results verify the performance of existing IQA models can be consistently improved when our attention module is incorporated.
arXiv Detail & Related papers (2024-08-19T11:55:32Z)
Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment [57.07360640784803]
We propose vision-language consistency guided multi-modal prompt learning for blind image quality assessment (AGIQA) Specifically, we introduce learnable textual and visual prompts in language and vision branches of Contrastive Language-Image Pre-training (CLIP) models. We design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
arXiv Detail & Related papers (2024-06-24T13:45:31Z)
DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [73.6767681305851]
Blind image quality assessment (IQA) in the wild presents significant challenges.<n>Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem.<n>Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose a novel IQA method, diffusion priors-based IQA.
arXiv Detail & Related papers (2024-05-30T12:32:35Z)
Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly. Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness. Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings. This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment [49.36799270585947]
No-reference point cloud quality assessment (NR-PCQA) aims to automatically evaluate the perceptual quality of distorted point clouds without available reference. We propose a novel contrastive pre-training framework tailored for PCQA (CoPA) Our method outperforms the state-of-the-art PCQA methods on popular benchmarks.
arXiv Detail & Related papers (2024-03-15T07:16:07Z)
Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability. CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means. We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
arXiv Detail & Related papers (2022-06-02T19:05:13Z)
Task-Specific Normalization for Continual Learning of Blind Image Quality Models [105.03239956378465]
We present a simple yet effective continual learning method for blind image quality assessment (BIQA) The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability. We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score. The final quality estimate is computed by black a weighted summation of predictions from all heads with a lightweight $K$-means gating mechanism.
arXiv Detail & Related papers (2021-07-28T15:21:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.