SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
- URL: http://arxiv.org/abs/2410.09865v2
- Date: Wed, 20 Nov 2024 07:38:20 GMT
- Title: SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
- Authors: Xilin He, Cheng Luo, Xiaole Xian, Bing Li, Siyang Song, Muhammad Haris Khan, Weicheng Xie, Linlin Shen, Zongyuan Ge
- Abstract summary: We introduce SynFER, a novel framework for synthesizing facial expression image data based on high-level textual descriptions.
We propose a semantic guidance technique to steer the generation process and a pseudo-label generator to help rectify the facial expression labels.
Our approach achieves a 67.23% classification accuracy on AffectNet when training solely with synthetic data equivalent to the AffectNet training set size.
- Score: 44.304022773272415
- License:
- Abstract: Facial expression datasets remain limited in scale due to privacy concerns, the subjectivity of annotations, and the labor-intensive nature of data collection. This limitation poses a significant challenge for developing modern deep learning-based facial expression analysis models, particularly foundation models, that rely on large-scale data for optimal performance. To tackle the overarching and complex challenge, we introduce SynFER (Synthesis of Facial Expressions with Refined Control), a novel framework for synthesizing facial expression image data based on high-level textual descriptions as well as more fine-grained and precise control through facial action units. To ensure the quality and reliability of the synthetic data, we propose a semantic guidance technique to steer the generation process and a pseudo-label generator to help rectify the facial expression labels for the synthetic images. To demonstrate the generation fidelity and the effectiveness of the synthetic data from SynFER, we conduct extensive experiments on representation learning using both synthetic data and real-world data. Experiment results validate the efficacy of the proposed approach and the synthetic data. Notably, our approach achieves a 67.23% classification accuracy on AffectNet when training solely with synthetic data equivalent to the AffectNet training set size, which increases to 69.84% when scaling up to five times the original size. Our code will be made publicly available.
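The abstract describes two quality controls for the synthetic images: semantic guidance during generation and a pseudo-label generator that rectifies expression labels. As a rough illustration of the label-rectification idea only, the sketch below pseudo-labels synthetic images with a stand-in classifier and keeps only confident samples before they enter training; this is not SynFER's actual pseudo-label generator, and the backbone, class count, and confidence threshold are placeholders.

```python
# Hedged sketch: pseudo-label synthetic facial-expression images with a
# pretrained classifier and keep only confident samples. Illustrative only;
# the backbone, class count, and threshold are assumptions, not SynFER's.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

NUM_EXPRESSIONS = 8          # e.g. the 8 AffectNet categories (assumption)
CONFIDENCE_THRESHOLD = 0.7   # arbitrary cut-off for keeping a synthetic image

# Stand-in backbone; replace with a classifier actually trained for FER.
classifier = models.resnet18(weights=None)
classifier.fc = torch.nn.Linear(classifier.fc.in_features, NUM_EXPRESSIONS)
classifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

@torch.no_grad()
def rectify_labels(image_paths):
    """Return (path, pseudo_label) pairs for confidently classified images."""
    kept = []
    for path in image_paths:
        x = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        probs = F.softmax(classifier(x), dim=1).squeeze(0)
        conf, label = probs.max(dim=0)
        if conf.item() >= CONFIDENCE_THRESHOLD:
            kept.append((path, label.item()))
    return kept
```

The filtered pairs would then be mixed with real labeled data for representation learning; the confidence filter is one simple way to keep mislabeled synthetic images from dominating training.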
Related papers
- Leveraging Programmatically Generated Synthetic Data for Differentially Private Diffusion Training [4.815212947276105]
Programmatically generated synthetic data has been used in differentially private training for classification to avoid privacy leakage.
However, models trained on such synthetic data generate unrealistic random images, which makes it challenging to adapt this data to generative models.
We propose DPSynGen, which leverages generated synthetic data in diffusion models to address this challenge.
arXiv Detail & Related papers (2024-12-13T04:22:23Z) - Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data [104.30479583607918]
The 2nd FRCSyn-onGoing challenge is based on the 2nd Face Recognition Challenge in the Era of Synthetic Data (FRCSyn), originally launched at CVPR 2024.
We focus on exploring the use of synthetic data both individually and in combination with real data to solve current challenges in face recognition.
arXiv Detail & Related papers (2024-12-02T11:12:01Z) - Synthetic Image Learning: Preserving Performance and Preventing Membership Inference Attacks [5.0243930429558885]
This paper introduces Knowledge Recycling (KR), a pipeline designed to optimise the generation and use of synthetic data for training downstream classifiers.
At the heart of this pipeline is Generative Knowledge Distillation (GKD), the proposed technique that significantly improves the quality and usefulness of the information.
The results show a significant reduction in the performance gap between models trained on real and synthetic data, with models based on synthetic data outperforming those trained on real data in some cases.
arXiv Detail & Related papers (2024-07-22T10:31:07Z) - Best Practices and Lessons Learned on Synthetic Data [83.63271573197026]
The success of AI models relies on the availability of large, diverse, and high-quality datasets.
Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
arXiv Detail & Related papers (2024-04-11T06:34:17Z) - If It's Not Enough, Make It So: Reducing Authentic Data Demand in Face Recognition through Synthetic Faces [16.977459035497162]
Large face datasets are primarily sourced from web-based images, lacking explicit user consent.
In this paper, we examine whether and how synthetic face data can be used to train effective face recognition models.
arXiv Detail & Related papers (2024-04-04T15:45:25Z) - Learning from Synthetic Data for Visual Grounding [55.21937116752679]
We show that SynGround can improve the localization capabilities of off-the-shelf vision-and-language models.
Data generated with SynGround improves the pointing game accuracy of pretrained ALBEF and BLIP models by 4.81 and 17.11 absolute percentage points, respectively.
arXiv Detail & Related papers (2024-03-20T17:59:43Z) - Training Robust Deep Physiological Measurement Models with Synthetic Video-based Data [11.31971398273479]
We propose measures to add real-world noise to synthetic physiological signals and corresponding facial videos.
Our results show that we were able to reduce the average MAE from 6.9 to 2.0.
arXiv Detail & Related papers (2023-11-09T13:55:45Z) - Reimagining Synthetic Tabular Data Generation through Data-Centric AI: A Comprehensive Benchmark [56.8042116967334]
Synthetic data serves as an alternative for training machine learning models.
However, ensuring that synthetic data mirrors the complex nuances of real-world data is challenging.
This paper explores the potential of integrating data-centric AI techniques to guide the synthetic data generation process.
arXiv Detail & Related papers (2023-10-25T20:32:02Z) - Delving into High-Quality Synthetic Face Occlusion Segmentation Datasets [83.749895930242]
We propose two techniques for producing high-quality naturalistic synthetic occluded faces.
We empirically show the effectiveness and robustness of both methods, even for unseen occlusions.
We present two high-resolution real-world occluded face datasets with fine-grained annotations, RealOcc and RealOcc-Wild.
arXiv Detail & Related papers (2022-05-12T17:03:57Z) - On the use of automatically generated synthetic image datasets for benchmarking face recognition [2.0196229393131726]
Recent advances in Generative Adversarial Networks (GANs) for synthesizing realistic face images provide a pathway to replace real datasets with synthetic datasets.
Benchmarking results on the synthetic dataset are a good substitute, often yielding error rates and system rankings similar to benchmarking on the real dataset.
arXiv Detail & Related papers (2021-06-08T09:54:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.