On the use of automatically generated synthetic image datasets for
benchmarking face recognition
- URL: http://arxiv.org/abs/2106.04215v1
- Date: Tue, 8 Jun 2021 09:54:02 GMT
- Title: On the use of automatically generated synthetic image datasets for
benchmarking face recognition
- Authors: Laurent Colbois, Tiago de Freitas Pereira and Sébastien Marcel
- Abstract summary: Recent advances in Generative Adversarial Networks (GANs) for synthesizing realistic face images provide a pathway to replace real datasets with synthetic ones. Benchmarking results on the synthetic dataset are a good substitute, often yielding error rates and system rankings similar to those obtained on the real dataset.
- Score: 2.0196229393131726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The availability of large-scale face datasets has been key in the progress of
face recognition. However, due to licensing issues or copyright infringement,
some datasets are no longer available (e.g., MS-Celeb-1M). Recent advances in
Generative Adversarial Networks (GANs) for synthesizing realistic face images
provide a pathway to replace real datasets with synthetic ones, both to train
and benchmark face recognition (FR) systems. The work presented in this paper
provides a study on benchmarking FR systems using a synthetic dataset. First,
we introduce the proposed methodology to generate a synthetic dataset, without
the need for human intervention, by exploiting the latent structure of a
StyleGAN2 model with multiple controlled factors of variation. Then, we confirm
that (i) the generated synthetic identities are not data subjects from the
GAN's training dataset, which is verified on a synthetic dataset with 10K+
identities; (ii) benchmarking results on the synthetic dataset are a good
substitute, often yielding error rates and system rankings similar to those
obtained on the real dataset.
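The abstract describes generating variations of each synthetic identity by exploiting the latent structure of StyleGAN2. The following is a minimal sketch of that idea, assuming (as in InterFaceGAN-style editing, which the abstract itself does not confirm) that a factor of variation such as head pose corresponds to a linear direction in StyleGAN2's 512-dimensional W space, so that an edited latent is w' = w + αn with n the unit-normalized direction. The helper names and the random stand-in vectors are hypothetical, and the generator call that would render each latent into an image is omitted.

```python
"""Sketch: vary one factor of a synthetic identity by moving its latent along a direction.

Assumes a linear (InterFaceGAN-style) model of StyleGAN2's W space; all names are
hypothetical and the StyleGAN2 synthesis step that would render each latent is omitted.
"""
import math
import random


def unit(vector):
    """Return the vector scaled to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [v / norm for v in vector]


def edit_latent(w, direction, alpha):
    """Move latent w by alpha along the unit-normalized direction: w + alpha * n."""
    n = unit(direction)
    return [wi + alpha * ni for wi, ni in zip(w, n)]


def variations(w, direction, alphas):
    """One edited latent per step size, e.g. several head poses for the same identity."""
    return [edit_latent(w, direction, a) for a in alphas]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 512  # dimensionality of StyleGAN2's W space
    w_identity = [rng.gauss(0.0, 1.0) for _ in range(dim)]      # stand-in for a mapped latent
    pose_direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # stand-in for a learned direction
    for latent in variations(w_identity, pose_direction, [-2.0, 0.0, 2.0]):
        # Distance to the original latent equals |alpha| because the direction is unit-norm.
        print(round(math.dist(latent, w_identity), 6))
```

Normalizing the direction makes α directly comparable across factors; in practice the usable range of α would have to be tuned per factor, since large steps can also drift the identity.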
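Claim (i) requires checking that the generated identities are not data subjects from the GAN's training set. One simple way to screen for this, sketched here under our own assumptions rather than as the paper's protocol, is to compare each synthetic identity's face embedding with embeddings of the real training identities and flag any identity whose closest real match exceeds a similarity threshold. The embedding extractor, the helper names, and the threshold value are hypothetical; a real threshold would be calibrated on genuine and impostor scores.

```python
"""Sketch: flag synthetic identities that look too close to real training identities.

Hypothetical helper names; embeddings would come from some face recognition network.
"""
import math


def cosine(a, b):
    """Cosine similarity between two non-zero embedding vectors."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def closest_real_identity(synthetic_embedding, real_embeddings):
    """Return (index, similarity) of the most similar real identity template."""
    sims = [cosine(synthetic_embedding, r) for r in real_embeddings]
    best = max(range(len(sims)), key=sims.__getitem__)
    return best, sims[best]


def flag_possible_leaks(synthetic, real, threshold):
    """Indices of synthetic identities whose closest real identity exceeds the threshold."""
    flagged = []
    for idx, emb in enumerate(synthetic):
        _, sim = closest_real_identity(emb, real)
        if sim > threshold:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    real_templates = [[1.0, 0.0], [0.0, 1.0]]        # toy 2-D "embeddings" of real identities
    synthetic_templates = [[1.0, 0.01], [1.0, 1.0]]   # toy synthetic identities
    print(flag_possible_leaks(synthetic_templates, real_templates, threshold=0.99))  # prints [0]
```

An empty flagged list over all synthetic identities is the outcome claim (i) describes; the exhaustive comparison here is quadratic, so a large reference set would call for an approximate nearest-neighbor index instead.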
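Claim (ii) compares two things: error rates and system ranking. A minimal sketch of such a comparison, assuming similarity scores where higher means more alike, computes each system's false non-match rate (FNMR) at a fixed false match rate (FMR) and then measures how well the synthetic benchmark preserves the real ranking with Kendall's tau. The FNMR-at-FMR operating point, the tau-a variant (ties counted as neither concordant nor discordant), the helper names, and the toy numbers are illustrative assumptions, not the paper's procedure or results.

```python
"""Sketch: compare FR systems benchmarked on a real and a synthetic dataset."""
from itertools import combinations


def fnmr_at_fmr(genuine, impostor, target_fmr):
    """False non-match rate at a threshold whose false match rate is at most target_fmr.

    Scores are similarities; a comparison counts as a match only if its score is
    strictly greater than the threshold.
    """
    if not 0.0 <= target_fmr <= 1.0:
        raise ValueError("target_fmr must lie in [0, 1]")
    ranked = sorted(impostor, reverse=True)
    # Number of impostor comparisons we are allowed to accept.
    allowed = int(target_fmr * len(ranked))
    # Accepting only scores above ranked[allowed] lets through at most `allowed` impostors.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    misses = sum(1 for score in genuine if score <= threshold)
    return misses / len(genuine)


def kendall_tau(xs, ys):
    """Kendall's tau-a between two equally long lists of per-system error rates."""
    pairs = list(combinations(range(len(xs)), 2))
    balance = 0
    for i, j in pairs:
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            balance += 1  # concordant: both benchmarks order systems i and j the same way
        elif product < 0:
            balance -= 1  # discordant: the two benchmarks disagree on this pair
    return balance / len(pairs)


if __name__ == "__main__":
    # Toy FNMR values for three hypothetical systems; not results from the paper.
    real_fnmr = [0.05, 0.12, 0.30]
    synthetic_fnmr = [0.07, 0.10, 0.35]
    print(kendall_tau(real_fnmr, synthetic_fnmr))  # prints 1.0
```

A tau of 1.0 means the synthetic benchmark orders the systems exactly as the real one does, while -1.0 means the order is fully reversed; the per-system FNMR values themselves show whether the absolute error rates also agree.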
Related papers
- Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities [22.8742248559748]
We show that in 6 state-of-the-art synthetic face recognition datasets, several samples from the original real dataset are leaked.
This paper is the first work to show leakage of the generator models' training data into the generated synthetic face recognition datasets.
(2024-10-31T15:17:14Z)
- SDFR: Synthetic Data for Face Recognition Competition [51.9134406629509]
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns.
Recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets.
This paper presents a summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024).
The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones.
(2024-04-06T10:30:31Z)
- Trading Off Scalability, Privacy, and Performance in Data Synthesis [11.698554876505446]
We introduce (a) the Howso engine, and (b) our proposed random projection based synthetic data generation framework.
We show that the synthetic data generated by the Howso engine has good privacy and accuracy, resulting in the best overall score.
Our proposed random projection based framework generates synthetic data with the highest accuracy score and is the most scalable.
(2023-12-09T02:04:25Z)
- TarGEN: Targeted Data Generation with Large Language Models [51.87504111286201]
TarGEN is a multi-step prompting strategy for generating high-quality synthetic datasets.
We augment TarGEN with a method known as self-correction, which empowers LLMs to rectify inaccurately labeled instances.
A comprehensive analysis of the synthetic dataset compared to the original dataset reveals similar or higher levels of dataset complexity and diversity.
(2023-10-27T03:32:17Z)
- Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative for training machine learning models.
Ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
(2023-10-25T20:32:02Z)
- GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations [2.7467281625529134]
This study introduces GANDiffFace, a novel framework for the generation of synthetic datasets for face recognition.
GANDiffFace combines the power of Generative Adversarial Networks (GANs) and Diffusion models to overcome the limitations of existing synthetic datasets.
(2023-05-31T15:49:12Z)
- Bridging the Gap: Enhancing the Utility of Synthetic Data via Post-Processing Techniques [7.967995669387532]
Generative models have emerged as a promising solution for generating synthetic datasets that can replace or augment real-world data.
We propose three novel post-processing techniques to improve the quality and diversity of the synthetic dataset.
Experiments show that Gap Filler (GaFi) effectively reduces the gap with real-accuracy scores to an error of 2.03%, 1.78%, and 3.99% on the Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets, respectively.
(2023-05-17T10:50:38Z)
- Synthetic data, real errors: how (not) to publish and use synthetic data [86.65594304109567]
We show how the generative process affects the downstream ML task.
We introduce Deep Generative Ensemble (DGE) to approximate the posterior distribution over the generative process model parameters.
(2023-05-16T07:30:29Z)
- Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the strengths and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data to recognition tasks.
(2022-10-14T06:54:24Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
(2022-08-16T20:46:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.