A Survey on Quality Metrics for Text-to-Image Generation
- URL: http://arxiv.org/abs/2403.11821v5
- Date: Wed, 29 Jan 2025 08:48:10 GMT
- Title: A Survey on Quality Metrics for Text-to-Image Generation
- Authors: Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam Poonam, Michael Glöckler, Alex Bäuerle, Timo Ropinski
- Abstract summary: AI-based text-to-image models not only excel at generating realistic images, they also give designers increasingly fine-grained control over the image content.
These approaches have gathered increased attention within the computer graphics research community.
We provide a comprehensive overview of such text-to-image quality metrics, and propose a taxonomy to categorize these metrics.
- Score: 9.753473063305503
- Abstract: AI-based text-to-image models not only excel at generating realistic images, they also give designers increasingly fine-grained control over the image content. Consequently, these approaches have gathered increased attention within the computer graphics research community, which has historically been devoted to traditional rendering techniques that offer precise control over scene parameters (e.g., objects, materials, and lighting). While the quality of conventionally rendered images is assessed through well-established image quality metrics, such as SSIM or PSNR, the unique challenges of text-to-image generation require other, dedicated quality metrics. These metrics must not only measure overall image quality, but also how well images reflect the given text prompts, in which the control of scene and rendering parameters is interwoven. Within this survey, we provide a comprehensive overview of such text-to-image quality metrics, and propose a taxonomy to categorize them. Our taxonomy is grounded in the assumption that two main quality criteria, namely compositional quality and general quality, contribute to the overall image quality. Besides the metrics, this survey covers dedicated text-to-image benchmark datasets over which the metrics are frequently computed. Finally, we identify limitations and open challenges in the field of text-to-image generation and derive guidelines for practitioners conducting text-to-image evaluation.
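As a minimal illustration of the distinction drawn in the abstract (not taken from the survey itself), the sketch below computes the reference-based metrics SSIM and PSNR with scikit-image, and a CLIPScore-style text-image alignment measure with a Hugging Face CLIP checkpoint; the checkpoint choice and scoring details are assumptions for illustration only.

```python
# Illustrative sketch (not from the survey): contrasts reference-based image
# quality metrics (SSIM, PSNR) with a CLIP-based text-image alignment score.
# Assumes scikit-image, torch, and transformers are installed.
import numpy as np
import torch
from skimage.metrics import peak_signal_noise_ratio, structural_similarity
from transformers import CLIPModel, CLIPProcessor


def reference_based_quality(rendered: np.ndarray, reference: np.ndarray) -> dict:
    """Classic full-reference metrics: require a ground-truth image."""
    return {
        "ssim": structural_similarity(rendered, reference, channel_axis=-1, data_range=255),
        "psnr": peak_signal_noise_ratio(reference, rendered, data_range=255),
    }


def clip_alignment_score(image: np.ndarray, prompt: str) -> float:
    """CLIPScore-style metric: no reference image needed; measures prompt
    adherence as cosine similarity between CLIP image and text embeddings."""
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    noise = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
    print(reference_based_quality(noise, noise))                # SSIM = 1.0, PSNR = inf
    print(clip_alignment_score(noise, "a photo of a red car"))  # low alignment for noise
```

The essential difference is that SSIM and PSNR need a ground-truth reference image, whereas the alignment score only needs the prompt, which is the gap the dedicated text-to-image metrics surveyed here aim to fill.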
Related papers
- Visual question answering based evaluation metrics for text-to-image generation [7.105786967332924]
This paper proposes new evaluation metrics that assess the alignment between input text and generated images for every individual object.
Experimental results show that the proposed evaluation approach outperforms existing metrics by simultaneously assessing finer-grained text-image alignment and image quality.
arXiv Detail & Related papers (2024-11-15T13:32:23Z)
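A rough sketch of the general idea behind VQA-based alignment metrics (not this paper's exact procedure): ask a visual question answering model one existence question per object mentioned in the prompt and report the fraction of positive answers. The BLIP checkpoint and question phrasing below are illustrative assumptions.

```python
# Illustrative VQA-based alignment check (not the paper's exact metric):
# ask one existence question per prompt object and count "yes" answers.
# Assumes transformers and a PIL image; the BLIP checkpoint is an example choice.
import torch
from PIL import Image
from transformers import BlipForQuestionAnswering, BlipProcessor

processor = BlipProcessor.from_pretrained("Salesforce/blip-vqa-base")
model = BlipForQuestionAnswering.from_pretrained("Salesforce/blip-vqa-base")


def object_alignment_score(image: Image.Image, objects: list[str]) -> float:
    """Fraction of prompt objects the VQA model confirms are visible."""
    hits = 0
    for obj in objects:
        inputs = processor(image, f"Is there a {obj} in the image?", return_tensors="pt")
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=5)
        answer = processor.decode(out[0], skip_special_tokens=True).strip().lower()
        hits += answer.startswith("yes")
    return hits / max(len(objects), 1)


# Example usage (objects would normally be parsed from the prompt):
# score = object_alignment_score(Image.open("generated.png"), ["red car", "dog"])
```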
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- TIER: Text-Image Encoder-based Regression for AIGC Image Quality Assessment [2.59079758388817]
In AI-generated content image quality assessment (AIGCIQA) tasks, images are typically generated by generative models using text prompts.
Most existing AIGCIQA methods regress predicted scores directly from individual generated images, overlooking the text prompts that conditioned them.
We propose a text-image encoder-based regression (TIER) framework to address this issue.
arXiv Detail & Related papers (2024-01-08T12:35:15Z)
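A minimal sketch of the text-image encoder-based regression idea, assuming CLIP encoders and a small MLP head; the actual TIER encoders, fusion strategy, and training targets may differ.

```python
# Illustrative text-image encoder-based regression head (the actual TIER
# configuration may differ). CLIP embeddings of the prompt and the generated
# image are concatenated and regressed to a scalar quality score by a small MLP.
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor


class TextImageQualityRegressor(nn.Module):
    def __init__(self, clip_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        self.processor = CLIPProcessor.from_pretrained(clip_name)
        dim = self.clip.config.projection_dim  # 512 for this checkpoint
        self.head = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, images, prompts):
        inputs = self.processor(text=prompts, images=images, return_tensors="pt", padding=True)
        feats = self.clip(**inputs)
        fused = torch.cat([feats.image_embeds, feats.text_embeds], dim=-1)
        return self.head(fused).squeeze(-1)  # one predicted quality score per pair


# Training would regress these outputs against human opinion scores, e.g. with an MSE loss.
```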
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take a single image of an individual as input and ground the generation process in it, together with text describing the desired visual context.
We introduce a standardized dataset (Stellar) containing personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and provides readily available, rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and sets a new state of the art, both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment [48.835298314274254]
We propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images.
A higher likelihood indicates better perceptual quality and better text-image alignment.
The proposed metric can assess the generation ability of text-to-image models with as few as a hundred samples.
arXiv Detail & Related papers (2023-08-16T17:26:47Z)
- TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation [2.6890293832784566]
We propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images.
A further result obtained with this approach is that image quality can vary drastically depending on the noise used to seed the generation.
arXiv Detail & Related papers (2023-07-11T09:23:05Z)
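A sketch of a TIAM-style success rate (illustrative; the metric's actual prompt templates and detection pipeline differ): generate from the same templated prompt with multiple noise seeds and report the fraction of generations in which every required object is detected. The generator and detector callables are placeholders.

```python
# Illustrative TIAM-style success rate. `generate_image` and `detect_labels`
# are placeholder callables for a text-to-image model and an object detector.
from typing import Callable, Iterable


def tiam_style_score(
    prompt: str,
    required_objects: list[str],
    seeds: Iterable[int],
    generate_image: Callable[[str, int], object],
    detect_labels: Callable[[object], set],
) -> float:
    """Fraction of seeds whose generated image contains every required object."""
    successes, total = 0, 0
    for seed in seeds:
        image = generate_image(prompt, seed)   # same prompt, varying noise seed
        labels = detect_labels(image)          # e.g. open-vocabulary detector output
        successes += all(obj in labels for obj in required_objects)
        total += 1
    return successes / max(total, 1)


# Example (with user-supplied callables):
# score = tiam_style_score("a photo of a dog and a bicycle", ["dog", "bicycle"],
#                          seeds=range(16), generate_image=my_t2i, detect_labels=my_detector)
```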
- Transparent Human Evaluation for Image Captioning [70.03979566548823]
We develop a rubric-based human evaluation protocol for image captioning models.
We show that human-generated captions are of substantially higher quality than machine-generated ones.
We hope that this work will promote a more transparent evaluation protocol for image captioning.
arXiv Detail & Related papers (2021-11-17T07:09:59Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Intrinsic Image Captioning Evaluation [53.51379676690971]
We propose a learning-based metric for image captioning, which we call Intrinsic Image Captioning Evaluation (I2CE).
Experimental results show that the proposed method maintains robust performance and gives more flexible scores to candidate captions when faced with semantically similar expressions or less-aligned semantics.
arXiv Detail & Related papers (2020-12-14T08:36:05Z)