Using Synthetic Images To Uncover Population Biases In Facial Landmarks Detection
- URL: http://arxiv.org/abs/2111.01683v1
- Date: Mon, 1 Nov 2021 15:42:15 GMT
- Title: Using Synthetic Images To Uncover Population Biases In Facial Landmarks Detection
- Authors: Ran Shadmi, Jonathan Laserson, Gil Elbaz
- Abstract summary: We show that synthetic test sets can efficiently detect a model's weak spots and overcome the limitations of real test sets in terms of quantity and/or diversity.
- Score: 0.8594140167290096
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In order to analyze a trained model's performance and identify its weak
spots, one has to set aside a portion of the data for testing. The test set has
to be large enough to detect statistically significant biases with respect to
all the relevant sub-groups in the target population. This requirement may be
difficult to satisfy, especially in data-hungry applications. We propose to
overcome this difficulty by generating a synthetic test set. We use the face
landmarks detection task to validate our proposal by showing that all the
biases observed on real datasets are also seen on a carefully designed
synthetic dataset. This shows that synthetic test sets can efficiently detect a
model's weak spots and overcome the limitations of real test sets in terms of
quantity and/or diversity.
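The bias check the abstract describes can be sketched as a per-sub-group error comparison on a synthetic test set. The sub-group labels, error values, and gap measure below are illustrative assumptions for exposition, not the paper's actual protocol:

```python
# Hypothetical sketch: compare a landmark model's mean error across
# population sub-groups of a synthetic test set. Group labels ("A", "B")
# and error values are made up for illustration.
from collections import defaultdict

def per_group_error(errors, groups):
    """Mean landmark error per sub-group (e.g. an age band or skin tone)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for err, grp in zip(errors, groups):
        sums[grp] += err
        counts[grp] += 1
    return {g: sums[g] / counts[g] for g in sums}

def bias_gaps(group_means):
    """Gap of each group's mean error from the best-performing group."""
    best = min(group_means.values())
    return {g: m - best for g, m in group_means.items()}

errors = [1.2, 1.4, 2.8, 3.0, 1.3, 2.9]   # normalized landmark errors
groups = ["A", "A", "B", "B", "A", "B"]   # synthetic-subject sub-group labels
means = per_group_error(errors, groups)
gaps = bias_gaps(means)
```

Because the synthetic set can be generated with balanced, controlled sub-groups, each group's mean is estimated from enough samples for the gaps to be statistically meaningful.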
Related papers
- Bias Analysis for Synthetic Face Detection: A Case Study of the Impact of Facial Attributes [14.594459540658429]
We introduce an evaluation framework to contribute to the analysis of bias of synthetic face detectors with respect to several facial attributes.
We build on the proposed framework to provide an extensive case study of the bias level of five state-of-the-art detectors in synthetic datasets with 25 controlled facial attributes.
arXiv Detail & Related papers (2025-07-25T22:49:06Z)
- Improving Predictions on Highly Unbalanced Data Using Open Source Synthetic Data Upsampling [0.0]
We show that synthetic data can improve predictive accuracy for minority groups by generating diverse data points that fill gaps in sparse regions of the feature space.
We evaluate the effectiveness of an open-source solution, the Synthetic Data SDK by MOSTLY AI, which provides a flexible and user-friendly approach to synthetic upsampling for mixed-type data.
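The upsampling idea can be illustrated with a deliberately naive stand-in: jittering existing minority rows with Gaussian noise. A real generative tool such as the SDK mentioned above models the joint feature distribution instead; the function and data below are hypothetical:

```python
# Naive illustration of synthetic minority upsampling: perturb existing
# minority rows with small Gaussian noise until the class reaches a
# target size. Not the MOSTLY AI SDK API; names are illustrative.
import random

def upsample_minority(rows, labels, minority, target, noise=0.05, seed=0):
    """Return extra synthetic rows for class `minority` up to `target` total."""
    rng = random.Random(seed)
    pool = [r for r, y in zip(rows, labels) if y == minority]
    synthetic = []
    while len(pool) + len(synthetic) < target:
        base = rng.choice(pool)
        synthetic.append([x + rng.gauss(0, noise) for x in base])
    return synthetic

rows = [[0.1, 0.2], [0.9, 1.1], [0.8, 1.0], [1.0, 0.9]]
labels = [1, 0, 0, 0]          # class 1 is the minority
extra = upsample_minority(rows, labels, minority=1, target=3)
```

The noise keeps the synthetic points near, but not identical to, the sparse minority region they are meant to fill.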
arXiv Detail & Related papers (2025-07-22T10:11:32Z)
- A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective [33.78421391776591]
In this paper, we propose a novel perspective of mislabeled sample detection.
We show that our new perspective can boost the precision of detection and rectify biased models effectively.
Our approach is complementary to existing methods, showing performance improvement even when applied to models that have already undergone recent debiasing techniques.
arXiv Detail & Related papers (2024-11-01T04:54:32Z)
- Investigating the Impact of Hard Samples on Accuracy Reveals In-class Data Imbalance [4.291589126905706]
In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy.
However, the reliability of test accuracy as the primary performance metric has been called into question.
The distribution of hard samples between training and test sets affects the difficulty levels of those sets.
We propose a benchmarking procedure for comparing hard sample identification methods.
arXiv Detail & Related papers (2024-09-22T11:38:14Z)
- Cross-Database Liveness Detection: Insights from Comparative Biometric Analysis [20.821562115822182]
Liveness detection is the capability to differentiate between genuine and spoofed biometric samples.
This research presents a comprehensive evaluation of liveness detection models.
Our work offers a blueprint for navigating the evolving rhythms of biometric security.
arXiv Detail & Related papers (2024-01-29T15:32:18Z)
- Image change detection with only a few samples [7.5780621370948635]
A major impediment of image change detection task is the lack of large annotated datasets covering a wide variety of scenes.
We propose using simple image processing methods for generating synthetic but informative datasets.
We then design an early-fusion network based on object detection that can outperform the siamese neural network.
arXiv Detail & Related papers (2023-11-07T07:01:35Z)
- Can You Rely on Your Model Evaluation? Improving Model Evaluation with Synthetic Test Data [75.20035991513564]
We introduce 3S Testing, a deep generative modeling framework to facilitate model evaluation.
Our experiments demonstrate that 3S Testing outperforms traditional baselines.
These results raise the question of whether we need a paradigm shift away from limited real test data towards synthetic test data.
arXiv Detail & Related papers (2023-10-25T10:18:44Z)
- Wild Face Anti-Spoofing Challenge 2023: Benchmark and Results [73.98594459933008]
Face anti-spoofing (FAS) is an essential mechanism for safeguarding the integrity of automated face recognition systems.
This limitation can be attributed to the scarcity and lack of diversity in publicly available FAS datasets.
We introduce the Wild Face Anti-Spoofing dataset, a large-scale, diverse FAS dataset collected in unconstrained settings.
arXiv Detail & Related papers (2023-04-12T10:29:42Z)
- BeCAPTCHA-Type: Biometric Keystroke Data Generation for Improved Bot Detection [63.447493500066045]
This work proposes a data-driven learning model for the synthesis of keystroke biometric data.
The proposed method is compared with two statistical approaches based on Universal and User-dependent models.
Our experimental framework considers a dataset with 136 million keystroke events from 168 thousand subjects.
arXiv Detail & Related papers (2022-07-27T09:26:15Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
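A Fisher regularizer of the kind mentioned above is typically an EWC-style penalty that anchors important parameters to their pre-adaptation values. The minimal sketch below is a generic version of that idea, with illustrative names and values rather than the paper's actual implementation:

```python
# Hedged sketch of a Fisher-style anti-forgetting penalty: parameters
# with high estimated Fisher importance are penalized for drifting from
# their source-model values. Values are illustrative only.
def fisher_penalty(theta, theta_src, fisher, lam=1.0):
    """Sum of lam * F_i * (theta_i - theta_src_i)^2 over all parameters."""
    return lam * sum(f * (t - s) ** 2
                     for f, t, s in zip(fisher, theta, theta_src))

theta     = [1.0, 2.5, 0.0]   # parameters after test-time adaptation
theta_src = [1.0, 2.0, 1.0]   # source (pre-adaptation) parameters
fisher    = [10.0, 0.1, 0.0]  # per-parameter importance estimates
penalty = fisher_penalty(theta, theta_src, fisher)
```

Adding this penalty to the adaptation loss lets unimportant parameters move freely while constraining the ones the source task depends on, which is how forgetting is limited.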
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
- Semi-supervised Salient Object Detection with Effective Confidence Estimation [35.0990691497574]
We study semi-supervised salient object detection with access to a small number of labeled samples and a large number of unlabeled samples.
We model the nature of human saliency labels using the latent variable of the Conditional Energy-based Model.
With only 1/16 labeled samples, our model achieves competitive performance compared with state-of-the-art fully-supervised models.
arXiv Detail & Related papers (2021-12-28T07:14:48Z)
- Hidden Biases in Unreliable News Detection Datasets [60.71991809782698]
We show that selection bias during data collection leads to undesired artifacts in the datasets.
We observed a significant drop (>10%) in accuracy for all models tested in a clean split with no train/test source overlap.
We suggest future dataset creation include a simple model as a difficulty/bias probe and future model development use a clean non-overlapping site and date split.
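The suggested clean split can be sketched as a group-aware partition: whole news sources are held out for testing, combined with a date cutoff, so no site appears on both sides. The function and toy records below are hypothetical:

```python
# Hedged sketch of a clean, non-overlapping site-and-date split:
# test data comes only from held-out sites after a cutoff date, so no
# source (and no time period) leaks between train and test.
def clean_split(articles, test_sites, cutoff):
    """articles: list of (site, date, text) tuples; dates as ISO strings."""
    train, test = [], []
    for site, date, text in articles:
        if site in test_sites and date >= cutoff:
            test.append((site, date, text))
        elif site not in test_sites and date < cutoff:
            train.append((site, date, text))
        # articles mixing a held-out site with a pre-cutoff date (or
        # vice versa) are dropped to keep the split strictly clean
    return train, test

arts = [("a.com", "2020-01", "x"), ("b.com", "2020-06", "y"),
        ("a.com", "2020-07", "z"), ("b.com", "2019-12", "w")]
train, test = clean_split(arts, test_sites={"b.com"}, cutoff="2020-03")
```

Because the split is by source rather than by individual article, a model can no longer score well simply by memorizing site-level artifacts, which is the selection bias the paper warns about.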
arXiv Detail & Related papers (2021-04-20T17:16:41Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.