Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification
- URL: http://arxiv.org/abs/2510.10191v1
- Date: Sat, 11 Oct 2025 12:08:40 GMT
- Title: Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification
- Authors: Haohua Dong, Ana Manzano Rodríguez, Camille Guinaudeau, Shin'ichi Satoh
- Abstract summary: Face gender classification models often reflect and amplify demographic biases present in their training data. We introduce pseudo-balancing, a simple and effective strategy for mitigating such biases in semi-supervised learning. Our method enforces demographic balance during pseudo-label selection, using only unlabeled images from a race-balanced dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Face gender classification models often reflect and amplify demographic biases present in their training data, leading to uneven performance across gender and racial subgroups. We introduce pseudo-balancing, a simple and effective strategy for mitigating such biases in semi-supervised learning. Our method enforces demographic balance during pseudo-label selection, using only unlabeled images from a race-balanced dataset without requiring access to ground-truth annotations. We evaluate pseudo-balancing under two conditions: (1) fine-tuning a biased gender classifier using unlabeled images from the FairFace dataset, and (2) stress-testing the method with intentionally imbalanced training data to simulate controlled bias scenarios. In both cases, models are evaluated on the All-Age-Faces (AAF) benchmark, which contains a predominantly East Asian population. Our results show that pseudo-balancing consistently improves fairness while preserving or enhancing accuracy. The method achieves 79.81% overall accuracy - a 6.53% improvement over the baseline - and reduces the gender accuracy gap by 44.17%. In the East Asian subgroup, where baseline disparities exceeded 49%, the gap is narrowed to just 5.01%. These findings suggest that even in the absence of label supervision, access to a demographically balanced or moderately skewed unlabeled dataset can serve as a powerful resource for debiasing existing computer vision models.
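The pseudo-label selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes a standard confidence-thresholded self-training setup, where `probs` holds the biased classifier's softmax outputs on the unlabeled, race-balanced images, and balance is enforced by keeping equal numbers of confident samples per predicted class:

```python
import numpy as np

def pseudo_balance(probs, threshold=0.9):
    """Select confident pseudo-labels while enforcing equal counts per class.

    probs: (N, C) array of softmax outputs on unlabeled images.
    Returns the indices of the selected samples and their pseudo-labels.
    The same routine could balance over any group attribute by swapping
    the predicted class for a group index.
    """
    labels = probs.argmax(axis=1)          # hard pseudo-labels
    conf = probs.max(axis=1)               # prediction confidence
    per_class = []
    for c in range(probs.shape[1]):
        # Confident samples of class c, sorted by descending confidence.
        idx = np.where((labels == c) & (conf >= threshold))[0]
        per_class.append(idx[np.argsort(-conf[idx])])
    # Keep only as many samples per class as the rarest class provides,
    # so every class contributes equally to the fine-tuning set.
    k = min(len(idx) for idx in per_class)
    selected = np.concatenate([idx[:k] for idx in per_class])
    return selected, labels[selected]
```

The selected subset can then be used as labeled data for fine-tuning; truncating to the rarest class's count is one simple way to realize the balance constraint, at the cost of discarding some confident samples from the majority class.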
Related papers
- Auditing and Mitigating Bias in Gender Classification Algorithms: A Data-Centric Approach [0.0]
We audit five widely used gender classification datasets, revealing significant intersectional underrepresentation. We train identical MobileNetV2 classifiers on the two most balanced of these datasets, UTKFace and FairFace. Our fairness evaluation shows that even these models exhibit significant bias, misclassifying female faces at a higher rate than male faces.
arXiv Detail & Related papers (2025-10-17T02:09:17Z) - LLM-Guided Synthetic Augmentation (LGSA) for Mitigating Bias in AI Systems [0.24699742392288992]
Underrepresentation of certain groups often leads to uneven performance across demographics. To address these challenges, we propose LLM-Guided Synthetic Augmentation (LGSA). LGSA uses large language models to generate counterfactual examples for underrepresented groups while preserving label integrity.
arXiv Detail & Related papers (2025-10-15T06:42:35Z) - Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models [81.45743826739054]
A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We create person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content.
arXiv Detail & Related papers (2025-10-04T07:51:59Z) - Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation [116.86965910589775]
We show that even minimal perturbations, such as masking just 10% of objects or weakly blurring backgrounds, can dramatically alter bias scores. This suggests that current bias evaluations reflect model responses to spurious features rather than gender bias.
arXiv Detail & Related papers (2025-09-09T11:14:11Z) - Towards the Mitigation of Confirmation Bias in Semi-supervised Learning: a Debiased Training Perspective [6.164100243945264]
Semi-supervised learning (SSL) commonly exhibits confirmation bias, where models disproportionately favor certain classes.
We introduce TaMatch, a unified framework for debiased training in SSL.
We show that TaMatch significantly outperforms existing state-of-the-art methods across a range of challenging image classification tasks.
arXiv Detail & Related papers (2024-09-26T21:50:30Z) - On Comparing Fair Classifiers under Data Bias [42.43344286660331]
We study the effect of varying data biases on the accuracy and fairness of fair classifiers.
Our experiments show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
arXiv Detail & Related papers (2023-02-12T13:04:46Z) - Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group.
arXiv Detail & Related papers (2023-02-11T12:11:03Z) - Debiased Learning from Naturally Imbalanced Pseudo-Labels for Zero-Shot and Semi-Supervised Learning [27.770473405635585]
This work studies the bias issue of pseudo-labeling, a natural phenomenon that occurs widely but is often overlooked by prior research.
We observe heavily long-tailed pseudo-labels when the semi-supervised learning model FixMatch predicts labels on the unlabeled set, even though the unlabeled data is curated to be balanced.
Without intervention, the training model inherits the bias from the pseudo-labels and ends up being sub-optimal.
arXiv Detail & Related papers (2022-01-05T07:40:24Z) - Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
arXiv Detail & Related papers (2021-09-16T23:40:28Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z) - Mitigating Face Recognition Bias via Group Adaptive Classifier [53.15616844833305]
This work aims to learn a fair face representation, where faces of every group could be more equally represented.
Our work is able to mitigate face recognition bias across demographic groups while maintaining the competitive accuracy.
arXiv Detail & Related papers (2020-06-13T06:43:37Z) - Post-Comparison Mitigation of Demographic Bias in Face Recognition Using Fair Score Normalization [15.431761867166]
We propose a novel unsupervised fair score normalization approach to reduce the effect of bias in face recognition.
Our solution reduces demographic biases by up to 82.7% in the case when gender is considered.
In contrast to previous works, our fair normalization approach enhances the overall performance by up to 53.2% at false match rate of 0.001 and up to 82.9% at a false match rate of 0.00001.
arXiv Detail & Related papers (2020-02-10T08:17:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.