TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation
- URL: http://arxiv.org/abs/2307.05134v2
- Date: Tue, 2 Jan 2024 21:18:48 GMT
- Title: TIAM -- A Metric for Evaluating Alignment in Text-to-Image Generation
- Authors: Paul Grimal, Hervé Le Borgne, Olivier Ferret, Julien Tourille
- Abstract summary: We propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images.
An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the noise used to seed generation.
- Score: 2.6890293832784566
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, Text-to-Image (T2I) models, which generate images from a prompt, must also be evaluated on additional aspects, such as the extent to which the generated image matches the important content of the prompt. Moreover, although generated images usually result from a random starting point, the influence of this starting point is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize alignment in terms of the type of the specified objects, their number, and their color. We conducted a study of several recent T2I models covering these aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the noise used to seed generation. We also quantify the influence of the number of concepts in the prompt, their order, and their (color) attributes. Finally, our method allows us to identify seeds that produce better images than others, opening novel directions of research on this understudied topic.
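In outline, a template-based alignment metric of this kind instantiates prompts from a template, generates images with fixed seeds, and uses a detector to check whether the requested objects and attributes actually appear; the success rate can then be aggregated overall and per seed. The sketch below is a minimal illustration under those assumptions, not the authors' implementation: `generate_image` and `detect_objects` are hypothetical plug-in points, and the template, attribute set, and success criterion are deliberately simplified.

```python
# Minimal sketch of a template-based alignment score (TIAM-style).
# generate_image and detect_objects are hypothetical placeholders;
# the paper's actual pipeline and detector may differ.
from itertools import product

TEMPLATE = "a photo of a {color} {obj}"   # simple prompt template
OBJECTS = ["cat", "dog", "car"]
COLORS = ["red", "blue", "green"]
SEEDS = range(8)                          # fixed latent-noise seeds

def generate_image(prompt, seed):
    """Hypothetical T2I call: render `prompt` from the latent noise fixed by `seed`."""
    raise NotImplementedError("plug in a diffusion pipeline here")

def detect_objects(image):
    """Hypothetical detector: return a set of (label, color) pairs found in `image`."""
    raise NotImplementedError("plug in an object detector here")

def template_alignment_score():
    per_seed = {seed: [] for seed in SEEDS}
    for color, obj in product(COLORS, OBJECTS):
        prompt = TEMPLATE.format(color=color, obj=obj)
        for seed in SEEDS:
            image = generate_image(prompt, seed)
            # Success: the requested object is detected with the requested attribute.
            per_seed[seed].append((obj, color) in detect_objects(image))
    # Overall alignment rate plus a per-seed breakdown.
    overall = sum(map(sum, per_seed.values())) / sum(map(len, per_seed.values()))
    by_seed = {seed: sum(hits) / len(hits) for seed, hits in per_seed.items()}
    return overall, by_seed
```

The per-seed rates are what would surface the abstract's observation that some seeds consistently produce better-aligned images than others.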
Related papers
- A Survey on Quality Metrics for Text-to-Image Models [9.753473063305503]
We provide an overview of existing text-to-image quality metrics addressing their nuances and the need for alignment with human preferences.
We propose a new taxonomy for categorizing these metrics, which is grounded in the assumption that there are two main quality criteria, namely compositionality and generality.
We derive guidelines for practitioners conducting text-to-image evaluation, discuss open challenges of evaluation mechanisms, and surface limitations of current metrics.
arXiv Detail & Related papers (2024-03-18T14:24:20Z)
- Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take as input a single image of an individual and ground the generation process on it, along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and comes with rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- ITI-GEN: Inclusive Text-to-Image Generation [56.72212367905351]
This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
We show that, for some attributes, images can represent concepts more expressively than text.
We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
arXiv Detail & Related papers (2023-09-11T15:54:30Z)
- Likelihood-Based Text-to-Image Evaluation with Patch-Level Perceptual and Semantic Credit Assignment [48.835298314274254]
We propose to evaluate text-to-image generation performance by directly estimating the likelihood of the generated images.
A higher likelihood indicates better perceptual quality and better text-image alignment.
It can successfully assess the generation ability of these models with as few as a hundred samples.
arXiv Detail & Related papers (2023-08-16T17:26:47Z)
- Test your samples jointly: Pseudo-reference for image quality evaluation [3.2634122554914]
We propose to jointly model different images depicting the same content to improve the precision of quality estimation.
Our experiments show that at test-time, our method successfully combines the features from multiple images depicting the same new content, improving estimation quality.
arXiv Detail & Related papers (2023-04-07T17:59:27Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models that outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Words as Art Materials: Generating Paintings with Sequential GANs [8.249180979158815]
We investigate the generation of artistic images on a dataset with large variance.
This dataset includes images with variations, for example, in shape, color, and content.
We propose a sequential Generative Adversarial Network model.
arXiv Detail & Related papers (2020-07-08T19:17:14Z)