Bringing Textual Prompt to AI-Generated Image Quality Assessment
- URL: http://arxiv.org/abs/2403.18714v2
- Date: Tue, 21 May 2024 06:57:24 GMT
- Title: Bringing Textual Prompt to AI-Generated Image Quality Assessment
- Authors: Bowen Qu, Haohui Li, Wei Gao
- Abstract summary: IP-IQA (AGIs Quality Assessment via Image and Prompt) is a multimodal framework that performs AGIQA by jointly incorporating an image and its textual prompt.
An effective and efficient image-prompt fusion module, together with a novel special [QA] token, is also applied.
Experiments demonstrate that our IP-IQA achieves state-of-the-art performance on the AGIQA-1k and AGIQA-3k datasets.
- Score: 4.230780744307392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: AI-Generated Images (AGIs) are inherently multimodal. Unlike traditional image quality assessment (IQA) of natural images, AGI quality assessment (AGIQA) must also consider the correspondence between an image and its textual prompt. Because this correspondence is coupled into the ground-truth score, unimodal IQA methods struggle on AGIQA. To solve this problem, we introduce IP-IQA (AGIs Quality Assessment via Image and Prompt), a multimodal framework that performs AGIQA by jointly incorporating an image and its prompt. Specifically, we propose a novel incremental pretraining task named Image2Prompt for a better understanding of AGIs and their corresponding textual prompts. An effective and efficient image-prompt fusion module, together with a novel special [QA] token, is also applied. Both are plug-and-play and help the image and its corresponding prompt cooperate. Experiments demonstrate that our IP-IQA achieves state-of-the-art performance on the AGIQA-1k and AGIQA-3k datasets. Code will be available at https://github.com/Coobiw/IP-IQA.
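The abstract describes the fusion module and [QA] token only at a high level. As a rough illustration (not the released IP-IQA code), an image-prompt fusion module with a learnable [QA] token could look like the sketch below, assuming CLIP-style image and text features; all module names, dimensions, and the cross-attention design are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation): fuse CLIP-style image and
# prompt features with cross-attention and read the quality score from a
# learnable [QA] token. Dimensions and module names are illustrative.
import torch
import torch.nn as nn

class ImagePromptFusion(nn.Module):
    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.qa_token = nn.Parameter(torch.randn(1, 1, dim))  # special [QA] token
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, 1)  # regress a scalar quality score

    def forward(self, image_tokens: torch.Tensor, prompt_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (B, N_img, dim) patch features from the image encoder
        # prompt_tokens: (B, N_txt, dim) token features from the text encoder
        b = image_tokens.size(0)
        qa = self.qa_token.expand(b, -1, -1)            # (B, 1, dim)
        query = torch.cat([qa, image_tokens], dim=1)    # prepend [QA] to image tokens
        fused, _ = self.cross_attn(query, prompt_tokens, prompt_tokens)
        return self.head(fused[:, 0])                   # score read from the [QA] position

# Usage with random features standing in for CLIP outputs
fusion = ImagePromptFusion()
img_feat = torch.randn(2, 197, 512)   # e.g. ViT-B/16 patch tokens
txt_feat = torch.randn(2, 77, 512)    # e.g. CLIP text tokens
score = fusion(img_feat, txt_feat)    # (2, 1) predicted quality scores
```

In this sketch the [QA] token attends, together with the image tokens, over the prompt tokens, and its output position is used for score regression; the actual IP-IQA fusion design may differ.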
Related papers
- Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment [69.07445098168344]
We introduce a new image quality assessment (IQA) task paradigm, grounding-IQA.
Grounding-IQA comprises two subtasks: grounding-IQA-description (GIQA-DES) and visual question answering (GIQA-VQA).
To realize grounding-IQA, we construct a corresponding dataset, GIQA-160K, through our proposed automated annotation pipeline.
Experiments demonstrate that our proposed task paradigm, dataset, and benchmark facilitate more fine-grained IQA applications.
arXiv Detail & Related papers (2024-11-26T09:03:16Z) - Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment [57.07360640784803]
We propose vision-language consistency guided multi-modal prompt learning for blind AI-generated image quality assessment (AGIQA).
Specifically, we introduce learnable textual and visual prompts in language and vision branches of Contrastive Language-Image Pre-training (CLIP) models.
We design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
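As a rough illustration of learnable textual and visual prompts for CLIP's two branches (not the authors' implementation; names and dimensions are assumed), trainable context vectors can simply be prepended to the frozen token and patch embeddings before they enter the respective transformer branches:

```python
# Illustrative sketch only: learnable textual and visual prompt vectors
# prepended to the token sequences of CLIP's language and vision branches.
import torch
import torch.nn as nn

class MultiModalPrompts(nn.Module):
    def __init__(self, embed_dim: int = 512, n_ctx: int = 4):
        super().__init__()
        # trainable context vectors for the language and vision branches
        self.text_ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)
        self.visual_ctx = nn.Parameter(torch.randn(n_ctx, embed_dim) * 0.02)

    def forward(self, text_emb: torch.Tensor, patch_emb: torch.Tensor):
        # text_emb: (B, L, D) frozen token embeddings of a quality-related prompt
        # patch_emb: (B, N, D) frozen patch embeddings of the image
        b = text_emb.size(0)
        t_ctx = self.text_ctx.unsqueeze(0).expand(b, -1, -1)
        v_ctx = self.visual_ctx.unsqueeze(0).expand(b, -1, -1)
        # prepend the learnable context to each branch's sequence
        return torch.cat([t_ctx, text_emb], dim=1), torch.cat([v_ctx, patch_emb], dim=1)

# Usage with stand-in embeddings
prompts = MultiModalPrompts()
text_emb = torch.randn(2, 16, 512)    # stand-in for CLIP text token embeddings
patch_emb = torch.randn(2, 196, 512)  # stand-in for CLIP patch embeddings
t_seq, v_seq = prompts(text_emb, patch_emb)  # (2, 20, 512), (2, 200, 512)
```

The extended sequences would then pass through the (frozen) CLIP branches, and the resulting image-text similarity could drive an alignment-quality prediction objective of the kind the summary describes.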
arXiv Detail & Related papers (2024-06-24T13:45:31Z) - UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment [23.48816491333345]
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal.
Existing methods typically address these tasks independently due to distinct learning objectives.
We propose Unified vision-language pre-training of Quality and Aesthetics (UniQA) to learn general perceptions of two tasks, thereby benefiting them simultaneously.
arXiv Detail & Related papers (2024-06-03T07:40:10Z) - Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning [58.41087653543607]
We first establish a novel Image Quality Assessment (IQA) database for AIGIs, termed AIGCIQA2023+.
This paper presents the MINT-IQA model to evaluate and explain human preferences for AIGIs from Multi-perspectives with INstruction Tuning.
arXiv Detail & Related papers (2024-05-12T17:45:11Z) - Cross-IQA: Unsupervised Learning for Image Quality Assessment [3.2287957986061038]
We propose a no-reference image quality assessment (NR-IQA) method, termed Cross-IQA, based on the vision transformer (ViT) model.
The proposed Cross-IQA method can learn image quality features from unlabeled image data.
Experimental results show that Cross-IQA can achieve state-of-the-art performance in assessing low-frequency degradation information.
arXiv Detail & Related papers (2024-05-07T13:35:51Z) - Large Multi-modality Model Assisted AI-Generated Image Quality Assessment [53.182136445844904]
We introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model.
It uses carefully designed text prompts as semantically informed guidance to extract semantic feature vectors.
It achieves state-of-the-art performance and demonstrates superior generalization in assessing the quality of AI-generated images.
arXiv Detail & Related papers (2024-04-27T02:40:36Z) - TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment [2.59079758388817]
In AIGCIQA tasks, images are typically generated by generative models using text prompts.
Most existing AIGCIQA methods regress predicted scores directly from individual generated images.
We propose a text-image encoder-based regression (TIER) framework to address this issue.
arXiv Detail & Related papers (2024-01-08T12:35:15Z) - Blind Image Quality Assessment via Vision-Language Correspondence: A Multitask Learning Perspective [93.56647950778357]
Blind image quality assessment (BIQA) predicts the human perception of image quality without any reference information.
We develop a general and automated multitask learning scheme for BIQA to exploit auxiliary knowledge from other tasks.
arXiv Detail & Related papers (2023-03-27T07:58:09Z) - TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation [55.83319599681002]
Text-VQA aims at answering questions that require understanding the textual cues in an image.
We develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the existing rich text available in the scene context of each image.
arXiv Detail & Related papers (2022-08-03T02:18:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.