ChildDiffusion: Unlocking the Potential of Generative AI and Controllable Augmentations for Child Facial Data using Stable Diffusion and Large Language Models
- URL: http://arxiv.org/abs/2406.11592v1
- Date: Mon, 17 Jun 2024 14:37:14 GMT
- Title: ChildDiffusion: Unlocking the Potential of Generative AI and Controllable Augmentations for Child Facial Data using Stable Diffusion and Large Language Models
- Authors: Muhammad Ali Farooq, Wang Yao, Peter Corcoran,
- Abstract summary: The framework is validated by rendering high-quality child faces representing ethnicity data, micro expressions, face pose variations, eye blinking effects, different hair colours and styles, aging, multiple and different child gender subjects in a single frame.
The proposed method circumvents common issues encountered in generative AI tools, such as temporal inconsistency and limited control over the rendered outputs.
- Score: 1.1470070927586018
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this research work we have proposed high-level ChildDiffusion framework capable of generating photorealistic child facial samples and further embedding several intelligent augmentations on child facial data using short text prompts, detailed textual guidance from LLMs, and further image to image transformation using text guidance control conditioning thus providing an opportunity to curate fully synthetic large scale child datasets. The framework is validated by rendering high-quality child faces representing ethnicity data, micro expressions, face pose variations, eye blinking effects, facial accessories, different hair colours and styles, aging, multiple and different child gender subjects in a single frame. Addressing privacy concerns regarding child data acquisition requires a comprehensive approach that involves legal, ethical, and technological considerations. Keeping this in view this framework can be adapted to synthesise child facial data which can be effectively used for numerous downstream machine learning tasks. The proposed method circumvents common issues encountered in generative AI tools, such as temporal inconsistency and limited control over the rendered outputs. As an exemplary use case we have open-sourced child ethnicity data consisting of 2.5k child facial samples of five different classes which includes African, Asian, White, South Asian/ Indian, and Hispanic races by deploying the model in production inference phase. The rendered data undergoes rigorous qualitative as well as quantitative tests to cross validate its efficacy and further fine-tuning Yolo architecture for detecting and classifying child ethnicity as an exemplary downstream machine learning task.
Related papers
- Self-Explainable Affordance Learning with Embodied Caption [63.88435741872204]
We introduce Self-Explainable Affordance learning (SEA) with embodied caption.
SEA enables robots to articulate their intentions and bridge the gap between explainable vision-language caption and visual affordance learning.
We propose a novel model to effectively combine affordance grounding with self-explanation in a simple but efficient manner.
arXiv Detail & Related papers (2024-04-08T15:22:38Z) - Synthetic Speaking Children -- Why We Need Them and How to Make Them [3.1367597377725502]
We show how StyleGAN2 can be finetuned to create a gender balanced dataset of children's faces.
By combining generative text to speech models for child voice synthesis and a 3D landmark based talking heads pipeline, we can generate highly realistic, entirely synthetic, talking child video clips.
arXiv Detail & Related papers (2023-11-08T22:58:22Z) - Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images
with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z) - A Comparative Study of Image-to-Image Translation Using GANs for
Synthetic Child Race Data [1.6536018920603175]
This work proposes the utilization of image-to-image transformation to synthesize data of different races and adjust the ethnicity of children's face data.
We consider ethnicity as a style and compare three different Image-to-Image neural network based methods to implement Caucasian child data and Asian child data conversion.
arXiv Detail & Related papers (2023-08-08T12:54:05Z) - ChildGAN: Large Scale Synthetic Child Facial Data Using Domain
Adaptation in StyleGAN [1.6536018920603175]
ChildGAN is built by performing smooth domain transfer using transfer learning.
The dataset comprises more than 300k distinct data samples.
The results demonstrate that synthetic child facial data of high quality offers an alternative to the cost and complexity of collecting a large-scale dataset from real children.
arXiv Detail & Related papers (2023-07-25T18:04:52Z) - UniDiff: Advancing Vision-Language Models with Generative and
Discriminative Learning [86.91893533388628]
This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC)
UniDiff demonstrates versatility in both multi-modal understanding and generative tasks.
arXiv Detail & Related papers (2023-06-01T15:39:38Z) - Child Face Recognition at Scale: Synthetic Data Generation and
Performance Benchmark [3.4110993541168853]
HDA-SynChildFaces consists of 1,652 subjects and a total of 188,832 images, each subject being present at various ages and with many different intra-subject variations.
We evaluate the performance of various facial recognition systems on the generated database and compare the results of adults and children at different ages.
arXiv Detail & Related papers (2023-04-23T15:29:26Z) - Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces
Recognition [0.0]
We present a benchmark dataset for children's face recognition, which is compiled similarly to the famous face recognition benchmarks LFW, CALFW, CPLFW, XQLFW and AgeDB.
We also present a development dataset for adapting face recognition models for face images of children.
arXiv Detail & Related papers (2023-01-13T22:19:44Z) - CIAO! A Contrastive Adaptation Mechanism for Non-Universal Facial
Expression Recognition [80.07590100872548]
We propose Contrastive Inhibitory Adaptati On (CIAO), a mechanism that adapts the last layer of facial encoders to depict specific affective characteristics on different datasets.
CIAO presents an improvement in facial expression recognition performance over six different datasets with very unique affective representations.
arXiv Detail & Related papers (2022-08-10T15:46:05Z) - FP-Age: Leveraging Face Parsing Attention for Facial Age Estimation in
the Wild [50.8865921538953]
We propose a method to explicitly incorporate facial semantics into age estimation.
We design a face parsing-based network to learn semantic information at different scales.
We show that our method consistently outperforms all existing age estimation methods.
arXiv Detail & Related papers (2021-06-21T14:31:32Z) - Learning to Augment Expressions for Few-shot Fine-grained Facial
Expression Recognition [98.83578105374535]
We present a novel Fine-grained Facial Expression Database - F2ED.
It includes more than 200k images with 54 facial expressions from 119 persons.
Considering the phenomenon of uneven data distribution and lack of samples is common in real-world scenarios, we evaluate several tasks of few-shot expression learning.
We propose a unified task-driven framework - Compositional Generative Adversarial Network (Comp-GAN) learning to synthesize facial images.
arXiv Detail & Related papers (2020-01-17T03:26:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.