PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition
- URL: http://arxiv.org/abs/2404.13299v1
- Date: Sat, 20 Apr 2024 07:05:45 GMT
- Title: PCQA: A Strong Baseline for AIGC Quality Assessment Based on Prompt Condition
- Authors: Xi Fang, Weigang Wang, Xiaoxin Lv, Jun Yan,
- Abstract summary: This study proposes an effective AIGC quality assessment (QA) framework.
First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder.
Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features.
- Score: 4.125007507808684
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of Large Language Models (LLM) and Diffusion Models brings the boom of Artificial Intelligence Generated Content (AIGC). It is essential to build an effective quality assessment framework to provide a quantifiable evaluation of different images or videos based on the AIGC technologies. The content generated by AIGC methods is driven by the crafted prompts. Therefore, it is intuitive that the prompts can also serve as the foundation of the AIGC quality assessment. This study proposes an effective AIGC quality assessment (QA) framework. First, we propose a hybrid prompt encoding method based on a dual-source CLIP (Contrastive Language-Image Pre-Training) text encoder to understand and respond to the prompt conditions. Second, we propose an ensemble-based feature mixer module to effectively blend the adapted prompt and vision features. The empirical study practices in two datasets: AIGIQA-20K (AI-Generated Image Quality Assessment database) and T2VQA-DB (Text-to-Video Quality Assessment DataBase), which validates the effectiveness of our proposed method: Prompt Condition Quality Assessment (PCQA). Our proposed simple and feasible framework may promote research development in the multimodal generation field.
Related papers
- Q-Ground: Image Quality Grounding with Large Multi-modality Models [61.72022069880346]
We introduce Q-Ground, the first framework aimed at tackling fine-scale visual quality grounding.
Q-Ground combines large multi-modality models with detailed visual quality analysis.
Central to our contribution is the introduction of the QGround-100K dataset.
arXiv Detail & Related papers (2024-07-24T06:42:46Z) - Vision-Language Consistency Guided Multi-modal Prompt Learning for Blind AI Generated Image Quality Assessment [57.07360640784803]
We propose vision-language consistency guided multi-modal prompt learning for blind image quality assessment (AGIQA)
Specifically, we introduce learnable textual and visual prompts in language and vision branches of Contrastive Language-Image Pre-training (CLIP) models.
We design a text-to-image alignment quality prediction task, whose learned vision-language consistency knowledge is used to guide the optimization of the above multi-modal prompts.
arXiv Detail & Related papers (2024-06-24T13:45:31Z) - UniQA: Unified Vision-Language Pre-training for Image Quality and Aesthetic Assessment [23.48816491333345]
Image Quality Assessment (IQA) and Image Aesthetic Assessment (IAA) aim to simulate human subjective perception of image visual quality and aesthetic appeal.
Existing methods typically address these tasks independently due to distinct learning objectives.
We propose Unified vision-language pre-training of Quality and Aesthetics (UniQA) to learn general perceptions of two tasks, thereby benefiting them simultaneously.
arXiv Detail & Related papers (2024-06-03T07:40:10Z) - PKU-AIGIQA-4K: A Perceptual Quality Assessment Database for Both Text-to-Image and Image-to-Image AI-Generated Images [1.5265677582796984]
We establish a large scale perceptual quality assessment database for both text-to-image and image-to-image AIGIs, named PKU-AIGIQA-4K.
We propose three image quality assessment (IQA) methods based on pre-trained models that include a no-reference method NR-AIGCIQA, a full-reference method FR-AIGCIQA, and a partial-reference method PR-AIGCIQA.
arXiv Detail & Related papers (2024-04-29T03:57:43Z) - Large Multi-modality Model Assisted AI-Generated Image Quality Assessment [53.182136445844904]
We introduce a large Multi-modality model Assisted AI-Generated Image Quality Assessment (MA-AGIQA) model.
It uses semantically informed guidance to sense semantic information and extract semantic vectors through carefully designed text prompts.
It achieves state-of-the-art performance, and demonstrates its superior generalization capabilities on assessing the quality of AI-generated images.
arXiv Detail & Related papers (2024-04-27T02:40:36Z) - Multi-Modal Prompt Learning on Blind Image Quality Assessment [65.0676908930946]
Image Quality Assessment (IQA) models benefit significantly from semantic information, which allows them to treat different types of objects distinctly.
Traditional methods, hindered by a lack of sufficiently annotated data, have employed the CLIP image-text pretraining model as their backbone to gain semantic awareness.
Recent approaches have attempted to address this mismatch using prompt technology, but these solutions have shortcomings.
This paper introduces an innovative multi-modal prompt-based methodology for IQA.
arXiv Detail & Related papers (2024-04-23T11:45:32Z) - PKU-I2IQA: An Image-to-Image Quality Assessment Database for AI
Generated Images [1.6031185986328562]
We establish a human perception-based image-to-image AIGCIQA database, named PKU-I2IQA.
We propose two benchmark models: NR-AIGCIQA based on the no-reference image quality assessment method and FR-AIGCIQA based on the full-reference image quality assessment method.
arXiv Detail & Related papers (2023-11-27T05:53:03Z) - Semantic Communications for Artificial Intelligence Generated Content
(AIGC) Toward Effective Content Creation [75.73229320559996]
This paper develops a conceptual model for the integration of AIGC and SemCom.
A novel framework that employs AIGC technology is proposed as an encoder and decoder for semantic information.
The framework can adapt to different types of content generated, the required quality, and the semantic information utilized.
arXiv Detail & Related papers (2023-08-09T13:17:21Z) - A Perceptual Quality Assessment Exploration for AIGC Images [39.72512063793346]
In this paper, we discuss the major evaluation aspects such as technical issues, AI artifacts, unnaturalness, discrepancy, and aesthetics for AGI quality assessment.
We present the first perceptual AGI quality assessment database, AGIQA-1K, which consists of 1,080 AGIs generated from diffusion models.
arXiv Detail & Related papers (2023-03-22T14:59:49Z) - Generating Diverse and Consistent QA pairs from Contexts with
Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.