Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment
- URL: http://arxiv.org/abs/2410.02505v2
- Date: Thu, 10 Oct 2024 05:36:30 GMT
- Title: Dog-IQA: Standard-guided Zero-shot MLLM for Mix-grained Image Quality Assessment
- Authors: Kai Liu, Ziqing Zhang, Wenbo Li, Renjing Pei, Fenglong Song, Xiaohong Liu, Linghe Kong, Yulun Zhang
- Abstract summary: We propose Dog-IQA, a standard-guided zero-shot mix-grained IQA method, which is training-free and utilizes the exceptional prior knowledge of multimodal large language models (MLLMs).
Dog-IQA scores objectively against specific standards that exploit the MLLM's behavior patterns and minimize the influence of subjective factors.
Our proposed Dog-IQA achieves state-of-the-art (SOTA) performance compared with training-free methods, and competitive performance compared with training-based methods in cross-dataset scenarios.
- Score: 57.10083003305353
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image quality assessment (IQA) serves as the gold standard for evaluating model performance in nearly all computer vision fields. However, existing IQA methods still suffer from poor out-of-distribution generalization and expensive training costs. To address these problems, we propose Dog-IQA, a standard-guided zero-shot mix-grained IQA method that is training-free and utilizes the exceptional prior knowledge of multimodal large language models (MLLMs). To obtain accurate IQA scores, namely scores consistent with humans, we design an MLLM-based inference pipeline that imitates human experts. In detail, Dog-IQA applies two techniques. First, Dog-IQA objectively scores with specific standards that utilize the MLLM's behavior pattern and minimize the influence of subjective factors. Second, Dog-IQA comprehensively takes local semantic objects and the whole image as input and aggregates their scores, leveraging both local and global information. Our proposed Dog-IQA achieves state-of-the-art (SOTA) performance compared with training-free methods, and competitive performance compared with training-based methods in cross-dataset scenarios. Our code will be available at https://github.com/Kai-Liu001/Dog-IQA.
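The abstract describes a mix-grained pipeline: an MLLM scores the whole image and each local semantic object against fixed standards, and the scores are then aggregated. The sketch below illustrates only that aggregation step, under assumptions: the segmentation step, prompt wording, discrete scoring levels, weighting scheme, and the `mllm_score` helper are hypothetical placeholders for illustration, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch of mix-grained score aggregation, assuming an MLLM-based
# scorer and area-weighted local regions. Not the authors' actual code.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    image_patch: object  # cropped local semantic object (e.g., a PIL image)
    area_ratio: float    # fraction of the full image covered by this object


def mix_grained_score(
    full_image: object,
    regions: List[Region],
    mllm_score: Callable[[object], int],  # hypothetical: returns a discrete level, e.g. 1..5
    global_weight: float = 0.5,           # assumed; the paper's weighting may differ
) -> float:
    """Aggregate a global MLLM score with area-weighted local object scores."""
    global_score = mllm_score(full_image)

    if not regions:
        return float(global_score)

    # Normalize region areas so local scores form a weighted average.
    total_area = sum(r.area_ratio for r in regions)
    local_score = sum(
        mllm_score(r.image_patch) * (r.area_ratio / total_area) for r in regions
    )

    # Mix-grained aggregation: combine whole-image and object-level judgments.
    return global_weight * global_score + (1.0 - global_weight) * local_score
```

Under this sketch, area weighting keeps small background objects from dominating the aggregate, which mirrors the abstract's emphasis on combining local and global information.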
Related papers
- Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment [69.07445098168344]
We introduce a new image quality assessment (IQA) task paradigm, grounding-IQA.
Grounding-IQA comprises two subtasks: grounding-IQA-description (GIQA-DES) and visual question answering (GIQA-VQA).
To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline.
Experiments demonstrate that our proposed task paradigm, dataset, and benchmark facilitate more fine-grained IQA applications.
arXiv Detail & Related papers (2024-11-26T09:03:16Z) - Boosting CLIP Adaptation for Image Quality Assessment via Meta-Prompt Learning and Gradient Regularization [55.09893295671917]
This paper introduces a novel Gradient-Regulated Meta-Prompt IQA Framework (GRMP-IQA).
The GRMP-IQA comprises two key modules: Meta-Prompt Pre-training Module and Quality-Aware Gradient Regularization.
Experiments on five standard BIQA datasets demonstrate superior performance over state-of-the-art BIQA methods under limited-data settings.
arXiv Detail & Related papers (2024-09-09T07:26:21Z) - DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild [54.139923409101044]
Blind image quality assessment (IQA) in the wild presents significant challenges.
Given the difficulty in collecting large-scale training data, leveraging limited data to develop a model with strong generalization remains an open problem.
Motivated by the robust image perception capabilities of pre-trained text-to-image (T2I) diffusion models, we propose DP-IQA, a novel IQA method based on diffusion priors.
arXiv Detail & Related papers (2024-05-30T12:32:35Z) - Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models [28.194638379354252]
We introduce a Depicted image Quality Assessment method (DepictQA), overcoming the constraints of traditional score-based methods.
DepictQA allows for detailed, language-based, human-like evaluation of image quality by leveraging Multi-modal Large Language Models.
These results showcase the research potential of multi-modal IQA methods.
arXiv Detail & Related papers (2023-12-14T14:10:02Z) - Task-Specific Normalization for Continual Learning of Blind Image Quality Models [105.03239956378465]
We present a simple yet effective continual learning method for blind image quality assessment (BIQA).
The key step in our approach is to freeze all convolution filters of a pre-trained deep neural network (DNN) for an explicit promise of stability.
We assign each new IQA dataset (i.e., task) a prediction head, and load the corresponding normalization parameters to produce a quality score.
The final quality estimate is computed by a weighted summation of predictions from all heads with a lightweight $K$-means gating mechanism; a sketch of this kind of gated aggregation appears after this list.
arXiv Detail & Related papers (2021-07-28T15:21:01Z) - MetaIQA: Deep Meta-learning for No-Reference Image Quality Assessment [73.55944459902041]
This paper presents a no-reference IQA metric based on deep meta-learning.
We first collect a number of NR-IQA tasks for different distortions.
Then meta-learning is adopted to learn the prior knowledge shared by diversified distortions.
Extensive experiments demonstrate that the proposed metric outperforms the state of the art by a large margin.
arXiv Detail & Related papers (2020-04-11T23:36:36Z)
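The continual-learning entry above combines per-task prediction heads through a weighted summation driven by a lightweight K-means gating mechanism. The sketch below shows one plausible form of that gated aggregation; the feature extractor, the distance-to-weight mapping (a softmax over negative centroid distances), and the temperature parameter are assumptions for illustration, not the paper's exact formulation.

```python
# Minimal sketch of weighted head aggregation with K-means gating,
# under the assumptions stated above. Not the paper's actual code.
import numpy as np


def gated_quality_estimate(
    features: np.ndarray,     # (d,) image feature vector from a frozen backbone
    head_scores: np.ndarray,  # (K,) quality predictions, one per task-specific head
    centroids: np.ndarray,    # (K, d) K-means centroids, one cluster per task
    temperature: float = 1.0,  # assumed: controls sharpness of the gating weights
) -> float:
    """Weighted summation of per-head predictions with distance-based gating."""
    # Distance of the image feature to each task's centroid.
    dists = np.linalg.norm(centroids - features[None, :], axis=1)

    # Closer centroids receive larger weights (softmax over negative distances).
    logits = -dists / temperature
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()

    # Final quality estimate: weighted sum of the task-specific head outputs.
    return float(np.dot(weights, head_scores))
```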
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality or accuracy of this information and is not responsible for any consequences of its use.