FERGI: Automatic Annotation of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction
- URL: http://arxiv.org/abs/2312.03187v3
- Date: Wed, 28 Aug 2024 10:00:01 GMT
- Title: FERGI: Automatic Annotation of User Preferences for Text-to-Image Generation from Spontaneous Facial Expression Reaction
- Authors: Shuangquan Feng, Junhua Ma, Virginia R. de Sa
- Abstract summary: We develop and test a method to automatically score user preferences from their spontaneous facial expression reaction to the generated images.
We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images.
We develop an FAU-Net (Facial Action Units Neural Network), which receives inputs from an AU estimation model, to automatically score user preferences for text-to-image generation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Researchers have proposed to use data of human preference feedback to fine-tune text-to-image generative models. However, the scalability of human feedback collection has been limited by its reliance on manual annotation. Therefore, we develop and test a method to automatically score user preferences from their spontaneous facial expression reaction to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. We develop an FAU-Net (Facial Action Units Neural Network), which receives inputs from an AU estimation model, to automatically score user preferences for text-to-image generation based on their facial expression reactions, which is complementary to the pre-trained scoring models based on the input text prompts and generated images. Integrating our FAU-Net valence score with the pre-trained scoring models improves their consistency with human preferences. This method of automatic annotation with facial expression analysis can be potentially generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is also available at the same link for research purposes.
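The abstract does not spell out the FAU-Net architecture or the fusion rule, so the following is only a minimal sketch of the idea: a small network maps facial action unit (AU) activations to a scalar valence score, which is then late-fused with a pre-trained text-image preference score. The AU count, hidden size, fusion weight, and names (`FAUNet`, `fuse_scores`) are illustrative assumptions, not the authors' implementation; see the linked repository for the real code.

```python
# Minimal sketch (not the authors' implementation): an MLP over facial
# action unit (AU) activations producing a scalar valence score, plus a
# simple late fusion with a pre-trained text-image preference score.
import torch
import torch.nn as nn

NUM_AUS = 12  # assumption: the upstream AU estimator reports 12 AU activations


class FAUNet(nn.Module):
    """Maps a vector of AU activations to a scalar valence score."""

    def __init__(self, num_aus: int = NUM_AUS, hidden: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_aus, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, au_activations: torch.Tensor) -> torch.Tensor:
        return self.mlp(au_activations).squeeze(-1)


def fuse_scores(valence: torch.Tensor, pretrained: torch.Tensor,
                weight: float = 0.5) -> torch.Tensor:
    # Late fusion: z-normalize each score, then take a weighted sum.
    # The weight is a free hyperparameter here, not a value from the paper.
    v = (valence - valence.mean()) / (valence.std() + 1e-8)
    p = (pretrained - pretrained.mean()) / (pretrained.std() + 1e-8)
    return weight * v + (1.0 - weight) * p


model = FAUNet()
aus = torch.rand(8, NUM_AUS)   # batch of AU activation vectors
clip_like = torch.rand(8)      # stand-in for a pre-trained image-text score
combined = fuse_scores(model(aus), clip_like)
```

The design choice worth noting is that the facial signal and the image-text signal are complementary: the former reflects the viewer's reaction, the latter the prompt-image match, which is why the paper fuses them rather than replacing one with the other.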
Related papers
- Human Learning by Model Feedback: The Dynamics of Iterative Prompting with Midjourney (arXiv 2023-11-20)
  This paper analyzes how user prompts evolve over such prompting iterations.
  We show that prompts predictably converge toward specific traits along these iterations.
  The possibility that users adapt to the model's preferences raises concerns about reusing user data for further training.
- SelfEval: Leveraging the discriminative nature of generative models for evaluation (arXiv 2023-11-17)
  We show that text-to-image generative models can be 'inverted' to assess their own text-image understanding capabilities.
  Our method, called SelfEval, uses the generative model to compute the likelihood of real images given text prompts; a rough sketch of this idea follows below.
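The summary describes the mechanism only at a high level; below is a rough, self-contained illustration of how such an 'inversion' can be approximated for a diffusion model: the negative denoising error under a given prompt serves as a Monte Carlo proxy for the likelihood of the image given that prompt. The `denoiser` signature and the toy noise schedule are our assumptions, not the SelfEval codebase's API.

```python
# Sketch of the 'inversion' idea (assumptions marked): score each prompt by a
# Monte Carlo estimate of the diffusion variational bound, i.e. the average
# denoising error of the model when conditioned on that prompt.
import torch


def prompt_likelihood(denoiser, image: torch.Tensor, prompt_emb: torch.Tensor,
                      num_samples: int = 32) -> torch.Tensor:
    """Higher return value ~= higher p(image | prompt) under the model."""
    losses = []
    for _ in range(num_samples):
        t = torch.randint(0, 1000, (1,))                   # random timestep
        noise = torch.randn_like(image)
        alpha = (1.0 - t.float() / 1000).clamp(min=1e-3)   # toy noise schedule
        noisy = alpha.sqrt() * image + (1 - alpha).sqrt() * noise
        pred = denoiser(noisy, t, prompt_emb)              # predicts the noise
        losses.append(torch.mean((pred - noise) ** 2))
    return -torch.stack(losses).mean()  # negative loss as a likelihood proxy


# Stand-in denoiser so the sketch runs; a real model would replace this.
dummy = lambda noisy, t, emb: torch.zeros_like(noisy)
score = prompt_likelihood(dummy, torch.randn(3, 64, 64), torch.randn(77, 768))
```

In an evaluation, the prompt with the highest likelihood proxy is the model's "answer", so text-image understanding can be scored like a classification task.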
- ITI-GEN: Inclusive Text-to-Image Generation (arXiv 2023-09-11)
  This study investigates inclusive text-to-image generative models that generate images based on human-written prompts.
  We show that, for some attributes, images can represent concepts more expressively than text.
  We propose a novel approach, ITI-GEN, that leverages readily available reference images for Inclusive Text-to-Image GENeration.
- Face0: Instantaneously Conditioning a Text-to-Image Model on a Face (arXiv 2023-06-11)
  We present Face0, a novel way to instantaneously condition a text-to-image generation model on a face.
  We augment a dataset of annotated images with embeddings of the included faces and train an image generation model on the augmented dataset.
  Our method achieves pleasing results, is remarkably simple and extremely fast, and equips the underlying model with new capabilities; a sketch of one plausible conditioning scheme follows below.
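The summary leaves the conditioning mechanism implicit; one plausible reading (our assumption, not the released implementation) is to project a face-recognition embedding into the text encoder's token space and prepend the resulting tokens to the prompt embedding that conditions the generator.

```python
# Rough sketch of the conditioning idea (our reading of the summary, not the
# released code): project a face-recognition embedding into the text encoder's
# token space and prepend it to the prompt tokens conditioning the model.
import torch
import torch.nn as nn


class FaceProjector(nn.Module):
    def __init__(self, face_dim: int = 512, token_dim: int = 768,
                 num_tokens: int = 3):
        super().__init__()
        self.num_tokens = num_tokens
        self.proj = nn.Linear(face_dim, token_dim * num_tokens)

    def forward(self, face_emb: torch.Tensor) -> torch.Tensor:
        b = face_emb.shape[0]
        return self.proj(face_emb).view(b, self.num_tokens, -1)


face_tokens = FaceProjector()(torch.randn(1, 512))         # (1, 3, 768)
prompt_tokens = torch.randn(1, 77, 768)                    # stand-in text embeddings
conditioning = torch.cat([face_tokens, prompt_tokens], 1)  # fed to the generator
```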
- Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models (arXiv 2023-04-05)
  This paper proposes a method for generating images of customized objects specified by users.
  The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
  We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
- Re-Imagen: Retrieval-Augmented Text-to-Image Generator (arXiv 2022-09-29)
- Semantic-Aware Generation for Self-Supervised Visual Representation Learning (arXiv 2021-11-25)
  We advocate Semantic-aware Generation (SaGe), which encourages richer semantics, rather than low-level details, to be preserved in the generated image.
  SaGe complements the target network with view-specific features and thus alleviates the semantic degradation brought by intensive data augmentations.
  We train SaGe on ImageNet-1K and evaluate the pre-trained models on five downstream tasks, including nearest neighbor test, linear classification, and fine-grained image recognition.
- Scene Graph Generation for Better Image Captioning? (arXiv 2021-09-23)
  We propose a model that leverages detected objects and auto-generated visual relationships to describe images in natural language.
  We generate a scene graph from raw image pixels by identifying individual objects and the visual relationships between them.
  This scene graph then serves as input to our graph-to-text model, which generates the final caption; a sketch of such an interface follows below.
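To make the pipeline concrete, here is an illustrative sketch of the scene-graph interface (the names and the linearization format are ours, not the paper's): a graph of detected objects plus subject-predicate-object triples, flattened into a token sequence that any seq2seq graph-to-text model can consume.

```python
# Illustrative data structure for a scene graph and a simple linearization
# into a token sequence for a graph-to-text captioning model.
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    objects: List[str]
    relations: List[Tuple[str, str, str]] = field(default_factory=list)

    def linearize(self) -> str:
        # Common trick: flatten the graph so a seq2seq model can consume it.
        parts = [f"<obj> {o}" for o in self.objects]
        parts += [f"<rel> {s} {p} {o}" for s, p, o in self.relations]
        return " ".join(parts)


graph = SceneGraph(
    objects=["man", "horse", "field"],
    relations=[("man", "riding", "horse"), ("horse", "standing in", "field")],
)
print(graph.linearize())
# <obj> man <obj> horse <obj> field <rel> man riding horse ...
```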
- Deep Image Synthesis from Intuitive User Input: A Review and Perspectives (arXiv 2021-07-09)
  It is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically generate photo-realistic images that adhere to the input content.
  Recent advances in deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), and flow-based methods have enabled more powerful and versatile image generation tasks.
  This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics.
- Ensembling with Deep Generative Views (arXiv 2021-04-29)
  Generative models can synthesize "views" of artificial images that mimic real-world variations, such as changes in color or pose.
  Here, we investigate whether such views can be applied to real images to benefit downstream analysis tasks such as image classification.
  We use StyleGAN2 as the source of generative augmentations and investigate this setup on classification tasks involving facial attributes, cat faces, and cars; a sketch of the ensembling step follows below.
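The ensembling step itself is simple; a minimal sketch is given below, assuming a classifier and a set of GAN-synthesized "views" of the same test image are already available. The paper's actual pipeline obtains views via StyleGAN2 inversion and latent perturbations; everything else here is a generic stand-in.

```python
# Minimal sketch of test-time ensembling over generative views: average the
# classifier's softmax outputs across all views of one image.
import torch


def ensemble_predict(classifier, views: torch.Tensor) -> torch.Tensor:
    """views: (num_views, C, H, W) variations of one image.
    Returns class probabilities averaged across views."""
    with torch.no_grad():
        logits = classifier(views)            # (num_views, num_classes)
        probs = torch.softmax(logits, dim=-1)
        return probs.mean(dim=0)              # ensemble over views


# Stand-in classifier and views so the sketch runs end to end.
dummy = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
views = torch.randn(5, 3, 32, 32)  # original image + 4 synthesized views
print(ensemble_predict(dummy, views))
```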
- MorphGAN: One-Shot Face Synthesis GAN for Detecting Recognition Bias (arXiv 2020-12-09)
  We describe a simulator that applies specific head pose and facial expression adjustments to images of previously unseen people.
  We show that augmenting small face datasets with new poses and expressions improves recognition performance by up to 9%, depending on the augmentation and data scarcity.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.