Limitations of Face Image Generation
- URL: http://arxiv.org/abs/2309.07277v2
- Date: Thu, 21 Dec 2023 15:26:22 GMT
- Title: Limitations of Face Image Generation
- Authors: Harrison Rosenberg, Shimaa Ahmed, Guruprasad V Ramesh, Ramya Korlakai Vinayak, Kassem Fawaz
- Abstract summary: We study the efficacy and shortcomings of generative models in the context of face generation.
We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts.
We present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
- Score: 12.11955119100926
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text-to-image diffusion models have achieved widespread popularity due to
their unprecedented image generation capability. In particular, their ability
to synthesize and modify human faces has spurred research into using generated
face images in both training data augmentation and model performance
assessments. In this paper, we study the efficacy and shortcomings of
generative models in the context of face generation. Utilizing a combination of
qualitative and quantitative measures, including embedding-based metrics and
user studies, we present a framework to audit the characteristics of generated
faces conditioned on a set of social attributes. We applied our framework to
faces generated through state-of-the-art text-to-image diffusion models. We
identify several limitations of face image generation that include faithfulness
to the text prompt, demographic disparities, and distributional shifts.
Furthermore, we present an analytical model that provides insights into how
training data selection contributes to the performance of generative models.
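The abstract does not detail the embedding-based metrics, so the following is only a minimal sketch of one plausible audit in this spirit: CLIP prompt-image similarity as a faithfulness proxy, with per-group means as a disparity check. The model checkpoint, helper names, and data layout are assumptions, not the paper's actual pipeline.

```python
from collections import defaultdict
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def faithfulness(prompt: str, image: Image.Image) -> float:
    """Cosine similarity between the prompt and image embeddings."""
    inputs = processor(text=[prompt], images=[image],
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

def group_disparity(samples):
    """samples: iterable of (group, prompt, image) triples.
    Returns mean faithfulness per demographic group; large gaps
    between groups indicate a disparity."""
    scores = defaultdict(list)
    for group, prompt, image in samples:
        scores[group].append(faithfulness(prompt, image))
    return {g: sum(v) / len(v) for g, v in scores.items()}
```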
Related papers
- CemiFace: Center-based Semi-hard Synthetic Face Generation for Face Recognition [33.17771044475894]
We show that face images with a certain degree of similarity to their identity centers are highly effective for training face recognition models.
Inspired by this, we propose a novel diffusion-based approach (namely Center-based Semi-hard Synthetic Face Generation) that produces facial samples with various levels of similarity to the subject center (a sketch of the center-similarity idea follows below).
arXiv Detail & Related papers (2024-09-27T16:11:30Z)
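As a rough illustration of the "semi-hard" idea above, the sketch below keeps synthetic faces whose cosine similarity to an identity-center embedding falls in a mid band. The band limits and the selection-based framing are assumptions; CemiFace itself generates such samples with a diffusion model rather than filtering them.

```python
import numpy as np

def semi_hard_mask(embeddings: np.ndarray, center: np.ndarray,
                   lo: float = 0.4, hi: float = 0.7) -> np.ndarray:
    """Keep samples neither too close to (easy) nor too far from
    (off-identity) the center; (N, D) embeddings, (D,) center."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = center / np.linalg.norm(center)
    sims = e @ c                       # cosine similarity to the center
    return (sims >= lo) & (sims <= hi)
```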
- Analyzing Quality, Bias, and Performance in Text-to-Image Generative Models [0.0]
Despite advances in generative models, most studies overlook the presence of bias.
In this paper, we examine several text-to-image models not only by qualitatively assessing their performance in generating accurate images of human faces, groups, and specified numbers of objects but also by presenting a social bias analysis.
As expected, models with larger capacity generate higher-quality images. However, we also document the inherent gender and social biases these models exhibit, offering a more complete understanding of their impact and limitations.
arXiv Detail & Related papers (2024-06-28T14:10:42Z)
- Improving face generation quality and prompt following with synthetic captions [57.47448046728439]
We introduce a training-free pipeline designed to generate accurate appearance descriptions from images of people.
We then use these synthetic captions to fine-tune a text-to-image diffusion model.
Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces.
arXiv Detail & Related papers (2024-05-17T15:50:53Z)
- Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model, VP-Score, on VisionPrefer to guide the training of text-to-image generative models; its preference prediction accuracy is comparable to that of human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z)
- YaART: Yet Another ART Rendering Technology [119.09155882164573]
This study introduces YaART, a novel production-grade text-to-image cascaded diffusion model aligned to human preferences.
We analyze how choices of model and dataset size affect both the efficiency of the training process and the quality of the generated images.
We demonstrate that models trained on smaller datasets of higher-quality images can successfully compete with those trained on larger datasets.
arXiv Detail & Related papers (2024-04-08T16:51:19Z)
- RenAIssance: A Survey into AI Text-to-Image Generation in the Era of Large Model [93.8067369210696]
Text-to-image generation (TTI) refers to models that process text input and generate high-fidelity images from text descriptions.
Diffusion models are one prominent type of generative model; they generate images by systematically adding noise over repeated steps and learning to reverse that process (a one-step sketch of the forward noising process follows below).
In the era of large models, scaling up model size and integrating large language models have further improved the performance of TTI models.
arXiv Detail & Related papers (2023-09-02T03:27:20Z)
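As a one-step sketch of the forward noising process mentioned above (the schedule values are illustrative and not tied to any specific TTI model):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # linear noise schedule (assumed)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # abar_t = prod_s (1 - beta_s)

def add_noise(x0: torch.Tensor, t: int) -> tuple[torch.Tensor, torch.Tensor]:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I);
    returns (x_t, eps) so a denoiser can be trained to predict eps."""
    eps = torch.randn_like(x0)
    xt = alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * eps
    return xt, eps
```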
- Identity-Preserving Aging of Face Images via Latent Diffusion Models [22.2699253042219]
We propose, train, and validate the use of latent text-to-image diffusion models for synthetically aging and de-aging face images.
Our models succeed with few-shot training, and have the added benefit of being controllable via intuitive textual prompting.
arXiv Detail & Related papers (2023-07-17T15:57:52Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8% (a sketch of such synthetic-data augmentation follows below).
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
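A minimal sketch of the synthetic-data augmentation recipe referenced above, assuming both real and generated images are stored as labeled image folders; the paths and sizes are placeholders, not the paper's setup.

```python
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

tf = transforms.Compose([transforms.Resize((64, 64)), transforms.ToTensor()])
real = datasets.ImageFolder("tiny-imagenet/train", transform=tf)  # real images
synth = datasets.ImageFolder("synthetic/train", transform=tf)     # generated images

# Train on the union of real and synthetic samples.
loader = DataLoader(ConcatDataset([real, synth]), batch_size=128, shuffle=True)
```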
- Membership Inference Attacks Against Text-to-image Generation Models [23.39695974954703]
This paper performs the first privacy analysis of text-to-image generation models through the lens of membership inference.
We propose three key intuitions about membership information and design four attack methodologies accordingly.
All of the proposed attacks achieve strong performance, in some cases approaching an accuracy of 1; the corresponding risk is therefore far more severe than that indicated by existing membership inference attacks (a minimal threshold-attack sketch follows below).
arXiv Detail & Related papers (2022-10-03T14:31:39Z)
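A minimal sketch of one common baseline, a loss-threshold membership inference attack; the paper's four attack methodologies are not reproduced here, and `model_loss` is an assumed stand-in for the target model's per-sample loss (e.g., a diffusion denoising error for a prompt-image pair).

```python
from typing import Callable

def infer_membership(model_loss: Callable[[str, object], float],
                     prompt: str, image: object, threshold: float) -> bool:
    """Flag a sample as a training member when the target model fits it
    unusually well, i.e., its loss falls below a threshold calibrated
    on known non-members."""
    return model_loss(prompt, image) < threshold
```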
- A comprehensive survey on semantic facial attribute editing using generative adversarial networks [0.688204255655161]
A large number of face generation and manipulation models have been proposed.
Semantic facial attribute editing is the process of varying the values of one or more attributes of a face image.
Based on their architectures, the state-of-the-art models are categorized and studied as encoder-decoder, image-to-image, and photo-guided models.
arXiv Detail & Related papers (2022-05-21T13:09:38Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess gender and skin tone biases by measuring the gender/skin tone distribution of generated images (a sketch of such a distribution audit follows below).
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
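A minimal sketch of such a distribution audit: generate images from a neutral prompt, classify a perceived attribute, and compare the label distribution to parity. `classify_attribute` is an assumed stand-in for any perceived-attribute classifier, not DALL-Eval's actual tooling.

```python
from collections import Counter

def audit_distribution(images, classify_attribute) -> dict:
    """Fraction of generated images per predicted attribute label."""
    counts = Counter(classify_attribute(img) for img in images)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# e.g., {"female": 0.31, "male": 0.69} for "a photo of a person" would
# indicate a skew away from a 0.5/0.5 parity baseline.
```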
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.