Beyond Real Faces: Synthetic Datasets Can Achieve Reliable Recognition Performance without Privacy Compromise
- URL: http://arxiv.org/abs/2510.17372v1
- Date: Mon, 20 Oct 2025 10:08:53 GMT
- Title: Beyond Real Faces: Synthetic Datasets Can Achieve Reliable Recognition Performance without Privacy Compromise
- Authors: Paweł Borsukiewicz, Fadi Boutros, Iyiola E. Olatunji, Charles Beumier, Wendkûuni C. Ouedraogo, Jacques Klein, Tegawendé F. Bissyandé,
- Abstract summary: We present a systematic literature review identifying 25 synthetic facial recognition datasets. Our methodology examines seven key requirements for privacy-preserving synthetic data. Best-performing synthetic datasets (VariFace, VIGFace) achieve recognition accuracies of 95.67% and 94.91% respectively.
- Score: 14.844999047343464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The deployment of facial recognition systems has created an ethical dilemma: achieving high accuracy requires massive datasets of real faces collected without consent, leading to dataset retractions and potential legal liabilities under regulations like GDPR. While synthetic facial data presents a promising privacy-preserving alternative, the field lacks comprehensive empirical evidence of its viability. This study addresses this critical gap through extensive evaluation of synthetic facial recognition datasets. We present a systematic literature review identifying 25 synthetic facial recognition datasets (2018-2025), combined with rigorous experimental validation. Our methodology examines seven key requirements for privacy-preserving synthetic data: identity leakage prevention, intra-class variability, identity separability, dataset scale, ethical data sourcing, bias mitigation, and benchmark reliability. Through experiments involving over 10 million synthetic samples, extended by a comparison of results reported on five standard benchmarks, we provide the first comprehensive empirical assessment of synthetic data's capability to replace real datasets. Best-performing synthetic datasets (VariFace, VIGFace) achieve recognition accuracies of 95.67% and 94.91% respectively, surpassing established real datasets including CASIA-WebFace (94.70%). While those images remain private, publicly available alternatives Vec2Face (93.52%) and CemiFace (93.22%) come close behind. Our findings reveal that they ensure proper intra-class variability while maintaining identity separability. Demographic bias analysis shows that, even though synthetic data inherits limited biases, it offers unprecedented control for bias mitigation through generation parameters. These results establish synthetic facial data as a scientifically viable and ethically imperative alternative for facial recognition research.
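Two of the dataset properties named in the abstract, intra-class variability and identity separability, are usually read off face embeddings. The following is a minimal sketch, not the paper's evaluation protocol. It assumes a hypothetical `embeddings` dict that maps each identity label to embedding vectors already produced by some face recognition model, and it uses plain cosine similarity. It compares the mean similarity of genuine pairs (same identity) with the mean similarity of impostor pairs (different identities):

```python
import itertools
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_stats(embeddings):
    """Mean genuine and mean impostor cosine similarity.

    embeddings: hypothetical dict mapping an identity label to a list of
    embedding vectors for that identity. Genuine pairs share a label
    (intra-class); impostor pairs have different labels (inter-class).
    Assumes at least one pair of each kind exists.
    """
    items = [(label, vec) for label, vecs in embeddings.items() for vec in vecs]
    genuine, impostor = [], []
    for (label_a, vec_a), (label_b, vec_b) in itertools.combinations(items, 2):
        bucket = genuine if label_a == label_b else impostor
        bucket.append(cosine(vec_a, vec_b))
    return sum(genuine) / len(genuine), sum(impostor) / len(impostor)


if __name__ == "__main__":
    # Toy 2-D embeddings for two hypothetical identities.
    toy = {"id_a": [[1.0, 0.0], [0.8, 0.6]], "id_b": [[0.0, 1.0], [-0.6, 0.8]]}
    genuine_mean, impostor_mean = similarity_stats(toy)
    print(f"genuine={genuine_mean:.2f} impostor={impostor_mean:.2f} "
          f"gap={genuine_mean - impostor_mean:.2f}")
```

A lower genuine mean indicates more intra-class variability. A larger gap between the two means indicates better identity separability. Published evaluations typically go further and report full score distributions or verification accuracy on benchmark pairs, not just these two averages.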
Related papers
- SCHIGAND: A Synthetic Facial Generation Mode Pipeline [0.0]
This paper presents SCHIGAND, a novel synthetic face generation pipeline for producing highly realistic and controllable facial datasets.
SCHIGAND enhances identity preservation while generating realistic intra-class variations and maintaining inter-class distinctiveness.
The generated datasets were evaluated using ArcFace, a leading facial verification model, to assess their effectiveness in comparison to real-world facial datasets.
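SCHIGAND is evaluated with ArcFace. ArcFace trains its embedding network with an additive angular margin. The target-class logit becomes s * cos(theta_y + m), and every other class keeps s * cos(theta_j), where theta is the angle between the normalized embedding and the normalized class weight. The sketch below computes these logits for a single sample in plain Python. It is illustrative only and is not SCHIGAND's evaluation code. The defaults s = 64 and m = 0.5 are the values commonly used with ArcFace. The sketch also omits the fallback that production implementations apply when theta_y + m exceeds pi.

```python
import math


def _normalize(v):
    """Scale a non-zero vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def arcface_logits(embedding, class_weights, target, s=64.0, m=0.5):
    """ArcFace-style logits for one sample.

    embedding: feature vector; class_weights: one weight vector per class;
    target: index of the true class. s is the scale and m the additive
    angular margin (radians), applied only to the target class.
    """
    x = _normalize(embedding)
    logits = []
    for j, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(x, _normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain
        if j == target:
            theta = math.acos(cos_theta)
            logits.append(s * math.cos(theta + m))  # penalized target logit
        else:
            logits.append(s * cos_theta)
    return logits


if __name__ == "__main__":
    # One 2-D embedding and two hypothetical class weight vectors; class 0 is the target.
    print(arcface_logits([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]], target=0))
```

During training these logits feed a softmax cross-entropy loss. The margin forces same-identity embeddings to cluster more tightly than a plain softmax would require.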
(2026-01-23T10:30:58Z)
- A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition [1.5515194949246]
This study compares the effectiveness of synthetic facial datasets generated using different techniques in facial recognition tasks.
Results demonstrate the ability of synthetic data to capture realistic variations while emphasizing the need for further research to close the performance gap with real data.
(2025-12-05T18:11:29Z)
- Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
The 2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024.
We focus on exploring the use of synthetic data, both individually and in combination with real data, to solve current challenges in face recognition.
(2024-12-02T11:12:01Z)
- SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data [78.70620682374624]
We introduce SynFER, a novel framework for synthesizing facial expression image data based on high-level textual descriptions.
To ensure the quality and reliability of the synthetic data, we propose a semantic guidance technique and a pseudo-label generator.
Results validate the efficacy of our approach and the synthetic data.
(2024-10-13T14:58:21Z)
- SIG: A Synthetic Identity Generation Pipeline for Generating Evaluation Datasets for Face Recognition [0.0]
We introduce the Synthetic Identity Generation pipeline, or SIG, that allows for the targeted creation of ethical, balanced datasets for face recognition evaluation.
Our pipeline generates high-quality images of synthetic identities with controllable pose, facial features, and demographic attributes, such as race, gender, and age.
We also release an open-source evaluation dataset named ControlFace10k, consisting of 10,008 face images of 3,336 unique synthetic identities balanced across race, gender, and age.
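The ControlFace10k figures imply a fixed number of images per identity, since 10,008 / 3,336 = 3. The following is a minimal sketch of how such a balanced generation plan could be laid out. The category lists are hypothetical, and the SIG paper's actual race, gender, and age groupings are not given here. Each identity is cycled through every attribute combination in turn:

```python
import itertools
from collections import Counter


def balanced_identity_plan(n_identities, images_per_identity, **attribute_values):
    """Assign every synthetic identity one attribute combination.

    Identities cycle through the Cartesian product of all attribute values,
    so each combination receives the same number of identities (or differs
    by at most one when the division is not exact).
    """
    names = list(attribute_values)
    combos = list(itertools.product(*(attribute_values[n] for n in names)))
    if not combos:
        raise ValueError("every attribute needs at least one value")
    plan = []
    for i in range(n_identities):
        combo = dict(zip(names, combos[i % len(combos)]))
        plan.append({"identity": i, "images": images_per_identity, **combo})
    return plan


if __name__ == "__main__":
    # Hypothetical category labels; not the groupings used by SIG.
    plan = balanced_identity_plan(
        3336, 3,
        race=["group_1", "group_2", "group_3", "group_4"],
        gender=["female", "male"],
        age=["young", "middle", "older"],
    )
    total_images = sum(p["images"] for p in plan)
    per_combo = Counter((p["race"], p["gender"], p["age"]) for p in plan)
    print(total_images, len(per_combo), set(per_combo.values()))
```

With 4 × 2 × 3 = 24 hypothetical combinations, the 3,336 identities split evenly into 139 per combination, for 10,008 images in total. Other category sets would usually leave a remainder. Round-robin assignment spreads that remainder one identity at a time, so no combination differs from another by more than one identity.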
(2024-09-12T18:18:02Z)
- Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data [104.45155847778584]
This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn).
FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations.
(2024-04-16T08:15:10Z)
- SDFR: Synthetic Data for Face Recognition Competition [51.9134406629509]
Large-scale face recognition datasets are collected by crawling the Internet without individuals' consent, raising legal, ethical, and privacy concerns.
Recently, several works have proposed generating synthetic face recognition datasets to mitigate the concerns raised by web-crawled face recognition datasets.
This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024).
The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones.
(2024-04-06T10:30:31Z)
- If It's Not Enough, Make It So: Reducing Authentic Data Demand in Face Recognition through Synthetic Faces [16.977459035497162]
Large face datasets are primarily sourced from web-based images, lacking explicit user consent.
In this paper, we examine whether and how synthetic face data can be used to train effective face recognition models.
(2024-04-04T15:45:25Z)
- IDiff-Face: Synthetic-based Face Recognition through Fizzy Identity-Conditioned Diffusion Models [15.217324893166579]
Synthetic datasets have emerged as a promising alternative to privacy-sensitive authentic data for face recognition development.
IDiff-Face is a novel approach based on conditional latent diffusion models for synthetic identity generation with realistic identity variations for face recognition training.
(2023-08-09T14:48:31Z)
- Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
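The summary does not describe the two generation techniques. The sketch below is a generic illustration of a simple cut-and-paste augmentation and is not necessarily either of the paper's methods. It alpha-blends an occluder patch onto a grayscale face crop and clears the face-segmentation label wherever the occluder is mostly opaque. The list-of-lists image format and the 0.5 opacity threshold are assumptions.

```python
def paste_occluder(face, face_mask, occluder, occ_alpha, top, left):
    """Alpha-blend an occluder onto a face crop and update its face mask.

    face, occluder: 2-D lists of grayscale values (0-255).
    face_mask: 2-D list of 0/1 values, 1 where the face is visible.
    occ_alpha: 2-D list of floats in [0, 1], the occluder's per-pixel opacity.
    (top, left): position of the occluder's top-left corner on the face.
    Returns a new (image, mask); the inputs are left unmodified.
    """
    image = [row[:] for row in face]
    mask = [row[:] for row in face_mask]
    for r, (occ_row, alpha_row) in enumerate(zip(occluder, occ_alpha)):
        for c, (value, alpha) in enumerate(zip(occ_row, alpha_row)):
            y, x = top + r, left + c
            if not (0 <= y < len(image) and 0 <= x < len(image[0])):
                continue  # occluder pixel falls outside the face crop
            image[y][x] = round(alpha * value + (1 - alpha) * image[y][x])
            if alpha >= 0.5:  # assumed threshold: mostly opaque pixels hide the face
                mask[y][x] = 0
    return image, mask


if __name__ == "__main__":
    face = [[100] * 4 for _ in range(4)]
    mask = [[1] * 4 for _ in range(4)]
    # A hypothetical 2x2 occluder, half transparent along its left column.
    img, new_mask = paste_occluder(face, mask, [[200, 200], [200, 200]],
                                   [[0.3, 1.0], [0.3, 1.0]], top=1, left=1)
    print(img)
    print(new_mask)
```

Updating the mask together with the image keeps each synthetic occlusion paired with a pixel-accurate segmentation label. That pairing is what makes synthetic occlusions usable as training data for segmentation.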
(2022-05-12T17:03:57Z)
- SynFace: Face Recognition with Synthetic Data [83.15838126703719]
We devise SynFace with identity mixup (IM) and domain mixup (DM) to mitigate the performance gap between face recognition models trained on synthetic and on real face images.
We also perform a systematic empirical analysis of synthetic face images to provide insights into how to effectively utilize synthetic data for face recognition.
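Identity mixup interpolates between identities. The following is a minimal sketch that assumes identities are represented by coefficient vectors fed to a controllable face generator (not shown) and that labels are integer class indices. It blends two identity codes and their one-hot labels with the same weight `lam`. SynFace's exact weight sampling and its domain mixup component are not reproduced here, and the function name is hypothetical.

```python
import random


def identity_mixup(id_code_a, id_code_b, label_a, label_b, num_classes, lam=None):
    """Blend two identity codes and their labels with one shared weight.

    Returns (mixed_code, soft_label), where mixed_code = lam * a + (1 - lam) * b
    and soft_label places lam on label_a and 1 - lam on label_b.
    If lam is None, it is drawn uniformly from [0, 1).
    """
    if lam is None:
        lam = random.random()
    mixed_code = [lam * a + (1 - lam) * b for a, b in zip(id_code_a, id_code_b)]
    soft_label = [0.0] * num_classes
    soft_label[label_a] += lam
    soft_label[label_b] += 1 - lam
    return mixed_code, soft_label


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical 3-D identity codes for classes 0 and 1.
    code, label = identity_mixup([0.2, -0.4, 1.0], [0.6, 0.0, -1.0], 0, 1, num_classes=2)
    print(code, label)
```

Because the code and the label share the same weight, the recognition model is trained to produce an intermediate prediction for an intermediate face. This smooths the decision boundaries learned from synthetic identities.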
(2021-08-18T03:41:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.