Teaching LMMs for Image Quality Scoring and Interpreting
- URL: http://arxiv.org/abs/2503.09197v1
- Date: Wed, 12 Mar 2025 09:39:33 GMT
- Title: Teaching LMMs for Image Quality Scoring and Interpreting
- Authors: Zicheng Zhang, Haoning Wu, Ziheng Jia, Weisi Lin, Guangtao Zhai
- Abstract summary: We propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables image quality scoring and interpreting simultaneously. Q-SiT is the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks with superior IQA generalization abilities.
- Score: 71.1335005098584
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image quality scoring and interpreting are two fundamental components of Image Quality Assessment (IQA). The former quantifies image quality, while the latter enables descriptive question answering about image quality. Traditionally, these two tasks have been addressed independently. However, from the perspective of the Human Visual System (HVS) and the Perception-Decision Integration Model, they are inherently interconnected: interpreting serves as the foundation for scoring, while scoring provides an abstract summary of interpreting. Thus, unifying these capabilities within a single model is both intuitive and logically coherent. In this paper, we propose Q-SiT (Quality Scoring and Interpreting joint Teaching), a unified framework that enables large multimodal models (LMMs) to learn both image quality scoring and interpreting simultaneously. We achieve this by transforming conventional IQA datasets into learnable question-answering datasets and incorporating human-annotated quality interpreting data for training. Furthermore, we introduce an efficient scoring & interpreting balance strategy, which first determines the optimal data mix ratio on lightweight LMMs and then maps this ratio to primary LMMs for fine-tuning adjustment. This strategy not only mitigates task interference and enhances cross-task knowledge transfer but also significantly reduces computational costs compared to direct optimization on full-scale LMMs. With this joint learning framework and corresponding training strategy, we develop Q-SiT, the first model capable of simultaneously performing image quality scoring and interpreting tasks, along with its lightweight variant, Q-SiT-mini. Experimental results demonstrate that Q-SiT achieves strong performance in both tasks with superior IQA generalization abilities. Project page at https://github.com/Q-Future/Q-SiT.
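As a rough illustration of the two method steps the abstract describes, the sketch below converts a scored IQA sample (image plus mean opinion score) into a question-answering example and includes a toy helper for selecting the scoring-to-interpreting data mix ratio. The five-level scale, the uniform binning, the prompt wording, and the `pick_mix_ratio`/`eval_fn` helpers are all assumptions made for illustration, not the paper's exact scheme.

```python
from dataclasses import dataclass

# Discrete quality levels commonly used when adapting MOS labels for LMM
# training; the paper's exact vocabulary may differ.
LEVELS = ["bad", "poor", "fair", "good", "excellent"]

@dataclass
class IQASample:
    image_path: str  # path to the image file
    mos: float       # mean opinion score, assumed normalized to [0, 1]

def mos_to_level(mos: float) -> str:
    """Map a normalized MOS onto one of five text levels by uniform
    binning (an assumed discretization, for illustration only)."""
    idx = min(int(mos * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[idx]

def to_qa_pair(sample: IQASample) -> dict:
    """Turn one scored image into a learnable question-answer example."""
    return {
        "image": sample.image_path,
        "question": "How would you rate the quality of this image?",
        "answer": f"The quality of this image is {mos_to_level(sample.mos)}.",
    }

def pick_mix_ratio(candidate_ratios, eval_fn):
    """Sketch of the balance strategy: score each scoring-to-interpreting
    data mix ratio on a lightweight LMM (via the caller-supplied eval_fn)
    and return the best one, to be reused when fine-tuning the primary LMM."""
    return max(candidate_ratios, key=eval_fn)

if __name__ == "__main__":
    print(to_qa_pair(IQASample(image_path="example.jpg", mos=0.82)))
```

Searching the ratio on a small model and transferring it to the large one is what makes the strategy cheap; the sketch leaves the actual evaluation loop to `eval_fn`, since the abstract does not spell out the paper's exact selection procedure.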
Related papers
- Q-Adapt: Adapting LMM for Visual Quality Assessment with Progressive Instruction Tuning [49.07442840323135]
We propose Q-Adapt, a new paradigm for perception-oriented instruction tuning.
Q-Adapt yields a lightweight visual quality evaluator with comparable performance.
arXiv Detail & Related papers (2025-04-02T12:02:57Z)
- Q-Insight: Understanding Image Quality via Visual Reinforcement Learning [27.26829134776367]
Image quality assessment (IQA) focuses on the perceptual visual quality of images, playing a crucial role in downstream tasks such as image reconstruction, compression, and generation.
We propose Q-Insight, a reinforcement learning-based model built upon group relative policy optimization (GRPO).
We show that Q-Insight substantially outperforms existing state-of-the-art methods in both score regression and degradation perception tasks.
arXiv Detail & Related papers (2025-03-28T17:59:54Z)
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z)
- UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment [23.48816491333345]
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal.
Existing methods typically address these tasks independently due to distinct learning objectives.
We propose Unified vision-language pre-training of Quality and Aesthetics (UniQA) to learn general perceptions of two tasks, thereby benefiting them simultaneously.
arXiv Detail & Related papers (2024-06-03T07:40:10Z)
- Adaptive Image Quality Assessment via Teaching Large Multimodal Model to Compare [99.57567498494448]
We introduce Compare2Score, an all-around LMM-based no-reference IQA model.
During training, we generate scaled-up comparative instructions by comparing images from the same IQA dataset.
Experiments on nine IQA datasets validate that Compare2Score effectively bridges text-defined comparative levels during training.
arXiv Detail & Related papers (2024-05-29T17:26:09Z)
- Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficient annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address the resulting mismatch through prompting techniques, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z)
- Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective [93.56647950778357]
Blind image quality assessment (BIQA) predicts the human perception of image quality without any reference information.
We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary knowledge from other tasks.
arXiv Detail & Related papers (2023-03-27T07:58:09Z)