Zero-shot racially balanced dataset generation using an existing biased StyleGAN2
- URL: http://arxiv.org/abs/2305.07710v2
- Date: Mon, 18 Sep 2023 17:48:17 GMT
- Title: Zero-shot racially balanced dataset generation using an existing biased StyleGAN2
- Authors: Anubhav Jain, Nasir Memon, Julian Togelius
- Abstract summary: We propose a methodology that leverages the biased generative model StyleGAN2 to create demographically diverse images of synthetic individuals.
By training face recognition models with the resulting balanced dataset containing 50,000 identities per race, we can improve their performance and minimize biases that might have been present in a model trained on a real dataset.
- Score: 5.463417677777276
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Facial recognition systems have made significant strides thanks to data-heavy
deep learning models, but these models rely on large privacy-sensitive
datasets. Further, many of these datasets lack diversity in terms of ethnicity
and demographics, which can lead to biased models that can have serious
societal and security implications. To address these issues, we propose a
methodology that leverages the biased generative model StyleGAN2 to create
demographically diverse images of synthetic individuals. The synthetic dataset
is created using a novel evolutionary search algorithm that targets specific
demographic groups. By training face recognition models with the resulting
balanced dataset containing 50,000 identities per race (13.5 million images in
total), we can improve their performance and minimize biases that might have
been present in a model trained on a real dataset.
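The abstract does not spell out the evolutionary search, so the following is only a minimal sketch of the general idea, under stated assumptions: a population of latent vectors is repeatedly scored for a target group, the best are kept, and mutated copies refill the population. The names `toy_group_score` and `evolve_latents`, the 512-dimensional latent size, and all hyperparameters are hypothetical, and the scoring function is a toy stand-in for "generate an image with StyleGAN2, then classify it"; this is not the authors' algorithm.

```python
import numpy as np

LATENT_DIM = 512  # assumed latent size; not stated in the abstract


def toy_group_score(latents, direction):
    """Toy stand-in for 'synthesize a face, then classify its demographic group'.

    Returns a pseudo-probability in (0, 1) from a linear projection. A real
    pipeline would call a generator and a demographic classifier here.
    """
    return 1.0 / (1.0 + np.exp(-(latents @ direction)))


def evolve_latents(score_fn, pop_size=64, n_elite=16, generations=30,
                   sigma=0.3, seed=0):
    """Elitist evolutionary search over latent vectors (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    population = rng.standard_normal((pop_size, LATENT_DIM))
    history = []
    for _ in range(generations):
        scores = score_fn(population)
        history.append(float(scores.mean()))
        # Selection: keep the latents that score highest for the target group.
        elite = population[np.argsort(scores)[-n_elite:]]
        # Variation: children are Gaussian perturbations of randomly chosen elites.
        parents = elite[rng.integers(0, n_elite, size=pop_size - n_elite)]
        children = parents + sigma * rng.standard_normal(parents.shape)
        population = np.vstack([elite, children])
    final_scores = score_fn(population)
    history.append(float(final_scores.mean()))
    return population, final_scores, history


# Usage: a random unit vector plays the role of the "target group" direction.
rng = np.random.default_rng(1)
direction = rng.standard_normal(LATENT_DIM)
direction /= np.linalg.norm(direction)
population, scores, history = evolve_latents(
    lambda z: toy_group_score(z, direction)
)
print(f"mean target-group score: {history[0]:.3f} -> {history[-1]:.3f}")
```

In this toy setting nothing stops the latents from drifting far from the generator's prior; a real pipeline would presumably need some constraint or image-quality check to keep the generated faces realistic.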
Related papers
- Enriching Datasets with Demographics through Large Language Models: What's in a Name? [5.871504332441324]
Large Language Models (LLMs) can perform as well as, if not better than, bespoke models trained on specialized data.
We apply these LLMs to a variety of datasets, including a real-life, unlabelled dataset of licensed financial professionals in Hong Kong.
arXiv Detail & Related papers (2024-09-17T18:40:49Z)
- Synthetic Data for the Mitigation of Demographic Biases in Face Recognition [10.16490522214987]
This study investigates the possibility of mitigating the demographic biases that affect face recognition technologies through the use of synthetic data.
We use synthetic datasets generated with GANDiffFace, a novel framework able to synthesize datasets for face recognition with controllable demographic distribution and realistic intra-class variations.
Our results support the proposed approach and the use of synthetic data to mitigate demographic biases in face recognition.
arXiv Detail & Related papers (2024-02-02T14:57:42Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups [0.0]
Recently, it has been shown that some modern facial recognition systems can discriminate against specific demographic groups.
We propose to use a simple method for modeling and sampling a disentangled projection of a StyleGAN latent space to generate any combination of demographic groups.
Our experiments show that we can synthesize any combination of demographic groups effectively, and that the identities differ from those in the original training dataset.
arXiv Detail & Related papers (2023-09-15T14:42:04Z)
- TIDE: Textual Identity Detection for Evaluating and Augmenting Classification and Language Models [0.0]
Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets.
We present a dataset coupled with an approach to improve text fairness in classifiers and language models.
We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context.
arXiv Detail & Related papers (2023-09-07T21:44:42Z)
- Balanced Face Dataset: Guiding StyleGAN to Generate Labeled Synthetic Face Image Dataset for Underrepresented Group [0.0]
Real-world datasets frequently have overrepresented and underrepresented groups.
One solution to mitigate bias in machine learning is to leverage a diverse and representative dataset.
The focus of this study was to generate a robust face image dataset using the StyleGAN model.
arXiv Detail & Related papers (2023-08-07T11:42:50Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
We are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train them.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
- SF-PATE: Scalable, Fair, and Private Aggregation of Teacher Ensembles [50.90773979394264]
This paper studies a model that protects the privacy of individuals' sensitive information while also allowing it to learn non-discriminatory predictors.
A key characteristic of the proposed model is that it enables the adoption of off-the-shelf, non-private fair models to create a privacy-preserving and fair model.
arXiv Detail & Related papers (2022-04-11T14:42:54Z)
- Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
In particular, face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
- Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date.
The data are manually annotated in terms of identity, exact age, gender and kinship.
A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.