VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
- URL: http://arxiv.org/abs/2412.06235v1
- Date: Mon, 09 Dec 2024 06:21:11 GMT
- Title: VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
- Authors: Michael Yeung, Toya Teramoto, Songtao Wu, Tatsuo Fujiwara, Kenji Suzuki, Tamaki Kojima,
- Abstract summary: VariFace is a two-stage diffusion-based pipeline to create fair and diverse synthetic face datasets to train face recognition models.
When constrained to the same dataset size, VariFace considerably outperforms previous synthetic datasets.
For the first time, VariFace outperforms the real dataset (CASIA-WebFace) across six evaluation datasets.
- Score: 4.409387706050884
- License:
- Abstract: The use of large-scale, web-scraped datasets to train face recognition models has raised significant privacy and bias concerns. Synthetic methods mitigate these concerns and provide scalable and controllable face generation to enable fair and accurate face recognition. However, existing synthetic datasets display limited intraclass and interclass diversity and do not match the face recognition performance obtained using real datasets. Here, we propose VariFace, a two-stage diffusion-based pipeline to create fair and diverse synthetic face datasets to train face recognition models. Specifically, we introduce three methods: Face Recognition Consistency to refine demographic labels, Face Vendi Score Guidance to improve interclass diversity, and Divergence Score Conditioning to balance the identity preservation-intraclass diversity trade-off. When constrained to the same dataset size, VariFace considerably outperforms previous synthetic datasets (0.9200 $\rightarrow$ 0.9405) and achieves comparable performance to face recognition models trained with real data (Real Gap = -0.0065). In an unconstrained setting, VariFace not only consistently achieves better performance compared to previous synthetic methods across dataset sizes but also, for the first time, outperforms the real dataset (CASIA-WebFace) across six evaluation datasets. This sets a new state-of-the-art performance with an average face verification accuracy of 0.9567 (Real Gap = +0.0097) across LFW, CFP-FP, CPLFW, AgeDB, and CALFW datasets and 0.9366 (Real Gap = +0.0380) on the RFW dataset.
Related papers
- HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere [22.8742248559748]
Face recognition datasets are often collected by crawling Internet and without individuals' consents, raising ethical and privacy concerns.
Generating synthetic datasets for training face recognition models has emerged as a promising alternative.
We propose a new synthetic dataset generation approach, called HyperFace.
arXiv Detail & Related papers (2024-11-13T09:42:12Z) - ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data.
We introduce a diffusion-fueled SFR model termed $textID3$.
$textID3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
arXiv Detail & Related papers (2024-09-26T06:46:40Z) - TCDiff: Triple Condition Diffusion Model with 3D Constraints for Stylizing Synthetic Faces [1.7535229154829601]
Face recognition experiments using 1k, 2k, and 5k classes of our new dataset for training outperform state-of-the-art synthetic datasets in real face benchmarks.
arXiv Detail & Related papers (2024-09-05T14:59:41Z) - SDFR: Synthetic Data for Face Recognition Competition [51.9134406629509]
Large-scale face recognition datasets are collected by crawling the Internet and without individuals' consent, raising legal, ethical, and privacy concerns.
Recently several works proposed generating synthetic face recognition datasets to mitigate concerns in web-crawled face recognition datasets.
This paper presents the summary of the Synthetic Data for Face Recognition (SDFR) Competition held in conjunction with the 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2024)
The SDFR competition was split into two tasks, allowing participants to train face recognition systems using new synthetic datasets and/or existing ones.
arXiv Detail & Related papers (2024-04-06T10:30:31Z) - SwinFace: A Multi-task Transformer for Face Recognition, Expression
Recognition, Age Estimation and Attribute Estimation [60.94239810407917]
This paper presents a multi-purpose algorithm for simultaneous face recognition, facial expression recognition, age estimation, and face attribute estimation based on a single Swin Transformer.
To address the conflicts among multiple tasks, a Multi-Level Channel Attention (MLCA) module is integrated into each task-specific analysis.
Experiments show that the proposed model has a better understanding of the face and achieves excellent performance for all tasks.
arXiv Detail & Related papers (2023-08-22T15:38:39Z) - GANDiffFace: Controllable Generation of Synthetic Datasets for Face
Recognition with Realistic Variations [2.7467281625529134]
This study introduces GANDiffFace, a novel framework for the generation of synthetic datasets for face recognition.
GANDiffFace combines the power of Generative Adversarial Networks (GANs) and Diffusion models to overcome the limitations of existing synthetic datasets.
arXiv Detail & Related papers (2023-05-31T15:49:12Z) - Face Recognition Using Synthetic Face Data [0.0]
We highlight the promising application of synthetic data, generated through rendering digital faces via our computer graphics pipeline, in achieving competitive results.
By finetuning the model,we obtain results that rival those achieved when training with hundreds of thousands of real images.
We also investigate the contribution of adding intra-class variance factors (e.g., makeup, accessories, haircuts) on model performance.
arXiv Detail & Related papers (2023-05-17T09:26:10Z) - SFace: Privacy-friendly and Accurate Face Recognition using Synthetic
Data [9.249824128880707]
We propose and investigate the feasibility of using a privacy-friendly synthetically generated face dataset to train face recognition models.
To address the privacy aspect of using such data to train a face recognition model, we provide extensive evaluation experiments on the identity relation between the synthetic dataset and the original authentic dataset used to train the generative model.
We also propose to train face recognition on our privacy-friendly dataset, SFace, using three different learning strategies, multi-class classification, label-free knowledge transfer, and combined learning of multi-class classification and knowledge transfer.
arXiv Detail & Related papers (2022-06-21T16:42:04Z) - Blind Face Restoration: Benchmark Datasets and a Baseline Model [63.053331687284064]
Blind Face Restoration (BFR) aims to construct a high-quality (HQ) face image from its corresponding low-quality (LQ) input.
We first synthesize two blind face restoration benchmark datasets called EDFace-Celeb-1M (BFR128) and EDFace-Celeb-150K (BFR512)
State-of-the-art methods are benchmarked on them under five settings including blur, noise, low resolution, JPEG compression artifacts, and the combination of them (full degradation)
arXiv Detail & Related papers (2022-06-08T06:34:24Z) - SynFace: Face Recognition with Synthetic Data [83.15838126703719]
We devise the SynFace with identity mixup (IM) and domain mixup (DM) to mitigate the performance gap.
We also perform a systematically empirical analysis on synthetic face images to provide some insights on how to effectively utilize synthetic data for face recognition.
arXiv Detail & Related papers (2021-08-18T03:41:54Z) - Boosting Unconstrained Face Recognition with Auxiliary Unlabeled Data [59.85605718477639]
We present an approach to use unlabeled faces to learn generalizable face representations.
Experimental results on unconstrained datasets show that a small amount of unlabeled data with sufficient diversity can lead to an appreciable gain in recognition performance.
arXiv Detail & Related papers (2020-03-17T20:58:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.