Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities
- URL: http://arxiv.org/abs/2410.24015v1
- Date: Thu, 31 Oct 2024 15:17:14 GMT
- Title: Unveiling Synthetic Faces: How Synthetic Datasets Can Expose Real Identities
- Authors: Hatef Otroshi Shahreza, Sébastien Marcel,
- Abstract summary: We show that in 6 state-of-the-art synthetic face recognition datasets, several samples from the original real dataset are leaked.
This paper is the first work which shows the leakage from training data of generator models into the generated synthetic face recognition datasets.
- Score: 22.8742248559748
- License:
- Abstract: Synthetic data generation is gaining increasing popularity in different computer vision applications. Existing state-of-the-art face recognition models are trained using large-scale face datasets, which are crawled from the Internet and raise privacy and ethical concerns. To address such concerns, several works have proposed generating synthetic face datasets to train face recognition models. However, these methods depend on generative models, which are trained on real face images. In this work, we design a simple yet effective membership inference attack to systematically study if any of the existing synthetic face recognition datasets leak any information from the real data used to train the generator model. We provide an extensive study on 6 state-of-the-art synthetic face recognition datasets, and show that in all these synthetic datasets, several samples from the original real dataset are leaked. To our knowledge, this paper is the first work which shows the leakage from training data of generator models into the generated synthetic face recognition datasets. Our study demonstrates privacy pitfalls in synthetic face recognition datasets and paves the way for future studies on generating responsible synthetic face datasets.
Related papers
- HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere [22.8742248559748]
Face recognition datasets are often collected by crawling Internet and without individuals' consents, raising ethical and privacy concerns.
Generating synthetic datasets for training face recognition models has emerged as a promising alternative.
We propose a new synthetic dataset generation approach, called HyperFace.
arXiv Detail & Related papers (2024-11-13T09:42:12Z) - Digi2Real: Bridging the Realism Gap in Synthetic Data Face Recognition via Foundation Models [4.910937238451485]
We introduce a novel framework for realism transfer aimed at enhancing the realism of synthetically generated face images.
By integrating the controllable aspects of the graphics pipeline with our realism enhancement technique, we generate a large amount of realistic variations.
arXiv Detail & Related papers (2024-11-04T15:42:22Z) - Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data [104.45155847778584]
This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn)
FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations.
arXiv Detail & Related papers (2024-04-16T08:15:10Z) - SDFR: Synthetic Data for Face Recognition Competition [51.9134406629509]
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns.
Recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets.
This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones.
arXiv Detail & Related papers (2024-04-06T10:30:31Z) - Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
arXiv Detail & Related papers (2023-11-10T18:38:14Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A
Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative in training machine learning models.
ensuring that synthetic data mirrors the complex nuances of real-world data is a challenging task.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - SynthDistill: Face Recognition with Knowledge Distillation from
Synthetic Data [8.026313049094146]
State-of-the-art face recognition networks are often computationally expensive and cannot be used for mobile applications.
We propose a new framework to train lightweight face recognition models by distilling the knowledge of a pretrained teacher face recognition model using synthetic data.
We use synthetic face images without identity labels, mitigating the problems in the intra-class variation generation of synthetic datasets.
arXiv Detail & Related papers (2023-08-28T19:15:27Z) - Is synthetic data from generative models ready for image recognition? [69.42645602062024]
We study whether and how synthetic images generated from state-of-the-art text-to-image generation models can be used for image recognition tasks.
We showcase the powerfulness and shortcomings of synthetic data from existing generative models, and propose strategies for better applying synthetic data for recognition tasks.
arXiv Detail & Related papers (2022-10-14T06:54:24Z) - SynFace: Face Recognition with Synthetic Data [83.15838126703719]
We devise the SynFace with identity mixup (IM) and domain mixup (DM) to mitigate the performance gap.
We also perform a systematically empirical analysis on synthetic face images to provide some insights on how to effectively utilize synthetic data for face recognition.
arXiv Detail & Related papers (2021-08-18T03:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.