A Survey on Quality Metrics for Text-to-Image Models
- URL: http://arxiv.org/abs/2403.11821v4
- Date: Tue, 23 Jul 2024 07:04:09 GMT
- Title: A Survey on Quality Metrics for Text-to-Image Models
- Authors: Sebastian Hartwig, Dominik Engel, Leon Sick, Hannah Kniesel, Tristan Payer, Poonam Poonam, Michael Glöckler, Alex Bäuerle, Timo Ropinski
- Abstract summary: We provide an overview of existing text-to-image quality metrics addressing their nuances and the need for alignment with human preferences.
We propose a new taxonomy for categorizing these metrics, which is grounded in the assumption that there are two main quality criteria, namely compositionality and generality.
We derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.
- Score: 9.753473063305503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent AI-based text-to-image models not only excel at generating realistic images, they also give designers more and more fine-grained control over the image content. Consequently, these approaches have gathered increased attention within the computer graphics research community, which has been historically devoted towards traditional rendering techniques that offer precise control over scene parameters such as objects, materials, and lighting, when generating realistic images. While the quality of rendered images is traditionally assessed through well-established image quality metrics, such as SSIM or PSNR, the unique challenges presented by text-to-image models, which in contrast to rendering interweave the control of scene and rendering parameters, necessitate the development of novel image quality metrics. Therefore, within this survey, we provide a comprehensive overview of existing text-to-image quality metrics addressing their nuances and the need for alignment with human preferences. Based on our findings, we propose a new taxonomy for categorizing these metrics, which is grounded in the assumption that there are two main quality criteria, namely compositionality and generality, which ideally map to human preferences. Ultimately, we derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.
Related papers
- KITTEN: A Knowledge-Intensive Evaluation of Image Generation on Visual Entities [93.74881034001312]
We conduct a systematic study on the fidelity of entities in text-to-image generation models.
We focus on their ability to generate a wide range of real-world visual entities, such as landmark buildings, aircraft, plants, and animals.
Our findings reveal that even the most advanced text-to-image models often fail to generate entities with accurate visual details.
arXiv Detail & Related papers (2024-10-15T17:50:37Z)
- Rank-based No-reference Quality Assessment for Face Swapping [88.53827937914038]
The metric of measuring the quality in most face swapping methods relies on several distances between the manipulated images and the source image.
We present a novel no-reference image quality assessment (NR-IQA) method specifically designed for face swapping.
arXiv Detail & Related papers (2024-06-04T01:36:29Z)
- Information Theoretic Text-to-Image Alignment [49.396917351264655]
We present a novel method that relies on an information-theoretic alignment measure to steer image generation.
Our method is on-par or superior to the state-of-the-art, yet requires nothing but a pre-trained denoising network to estimate MI.
arXiv Detail & Related papers (2024-05-31T12:20:02Z)
- QUASAR: QUality and Aesthetics Scoring with Advanced Representations [20.194917729936357]
This paper introduces a new data-driven, non-parametric method for image quality and aesthetics assessment.
We eliminate the need for expressive textual embeddings by proposing efficient image anchors in the data.
arXiv Detail & Related papers (2024-03-11T16:21:50Z)
- Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System [1.2289361708127877]
This research addresses a critical challenge in the field of generative models, particularly in the generation and evaluation of synthetic images.
We introduce a pioneering algorithm to objectively assess the realism of synthetic images.
Our algorithm is particularly tailored to address the challenges in generating and evaluating realistic images of Arabic handwritten digits.
arXiv Detail & Related papers (2024-02-27T04:53:53Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that input a single image of an individual and ground the generation process along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) that contains personalized prompts coupled with images of individuals that is an order of magnitude larger than existing relevant datasets and where rich semantic ground-truth annotations are readily available.
We derive a simple yet efficient, personalized text-to-image baseline that does not require test-time fine-tuning for each subject and which sets quantitatively and in human trials a new SoTA.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Image Quality Assessment in the Modern Age [53.19271326110551]
This tutorial provides the audience with the basic theories, methodologies, and current progress of image quality assessment (IQA).
We will first revisit several subjective quality assessment methodologies, with emphasis on how to properly select visual stimuli.
Both hand-engineered and (deep) learning-based methods will be covered.
arXiv Detail & Related papers (2021-10-19T02:38:46Z)
- Cross-Quality LFW: A Database for Analyzing Cross-Resolution Image Face Recognition in Unconstrained Environments [8.368543987898732]
Real-world face recognition applications often deal with suboptimal image quality or resolution due to different capturing conditions.
Recent cross-resolution face recognition approaches used simple, arbitrary, and unrealistic down- and up-scaling techniques to measure distances against real-world edge-cases in image quality.
We propose a new standardized benchmark dataset and evaluation protocol derived from the famous Labeled Faces in the Wild.
arXiv Detail & Related papers (2021-08-23T17:04:32Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform synthesis text-to-image models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)