Auditing and Mitigating Bias in Gender Classification Algorithms: A Data-Centric Approach
- URL: http://arxiv.org/abs/2510.17873v1
- Date: Fri, 17 Oct 2025 02:09:17 GMT
- Title: Auditing and Mitigating Bias in Gender Classification Algorithms: A Data-Centric Approach
- Authors: Tadesse K Bahiru, Natnael Tilahun Sinshaw, Teshager Hailemariam Moges, Dheeraj Kumar Singh
- Abstract summary: We audit five widely used gender classification datasets, revealing significant intersectional underrepresentation. We train identical MobileNetV2 classifiers on the two most balanced of these datasets, UTKFace and FairFace. Our fairness evaluation shows that even these models exhibit significant bias, misclassifying female faces at a higher rate than male faces.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gender classification systems often inherit and amplify demographic imbalances in their training data. We first audit five widely used gender classification datasets, revealing that all suffer from significant intersectional underrepresentation. To measure the downstream impact of these flaws, we train identical MobileNetV2 classifiers on the two most balanced of these datasets, UTKFace and FairFace. Our fairness evaluation shows that even these models exhibit significant bias, misclassifying female faces at a higher rate than male faces and amplifying existing racial skew. To counter these data-induced biases, we construct BalancedFace, a new public dataset created by blending images from FairFace and UTKFace, supplemented with images from other collections to fill missing demographic gaps. It is engineered to equalize subgroup shares across 189 intersections of age, race, and gender using only real, unedited images. When a standard classifier is trained on BalancedFace, it reduces the maximum True Positive Rate gap across racial subgroups by over 50% and brings the average Disparate Impact score 63% closer to the ideal of 1.0 compared to the next-best dataset, all with a minimal loss of overall accuracy. These results underline the profound value of data-centric interventions and provide an openly available resource for fair gender classification research.
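The abstract evaluates fairness through the maximum True Positive Rate gap across racial subgroups and the Disparate Impact score (ideal value 1.0). Below is a minimal sketch of how these two metrics are commonly computed; the DataFrame schema and the choice of "female" as the positive class are illustrative assumptions, not the paper's released evaluation code.

```python
# Sketch only: computes the max TPR gap across subgroups and Disparate Impact
# for a binary gender classifier. Column names are assumed for illustration.
import pandas as pd

def fairness_metrics(df: pd.DataFrame, positive: str = "female"):
    tprs, sel_rates = {}, {}
    for race, g in df.groupby("race"):
        pos = g[g["y_true"] == positive]
        # True Positive Rate: fraction of actual positives recovered per subgroup.
        tprs[race] = (pos["y_pred"] == positive).mean()
        # Selection rate: how often this subgroup receives the positive label.
        sel_rates[race] = (g["y_pred"] == positive).mean()
    max_tpr_gap = max(tprs.values()) - min(tprs.values())
    # Disparate Impact: lowest selection rate over highest; 1.0 is ideal parity.
    disparate_impact = min(sel_rates.values()) / max(sel_rates.values())
    return max_tpr_gap, disparate_impact
```

On a labeled evaluation set with per-image race annotations, a halving of `max_tpr_gap` and a `disparate_impact` moving toward 1.0 would correspond to the improvements the abstract reports for BalancedFace.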
Related papers
- Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification [10.66892435479991]
Face gender classification models often reflect and amplify demographic biases present in their training data. We introduce pseudo-balancing, a simple and effective strategy for mitigating such biases in semi-supervised learning. Our method enforces demographic balance during pseudo-label selection, using only unlabeled images from a race-balanced dataset; a sketch of this selection rule follows the entry.
arXiv Detail & Related papers (2025-10-11T12:08:40Z)
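The pseudo-balancing entry above enforces demographic balance when selecting pseudo-labeled samples. Below is a hedged sketch of one plausible selection rule; the `Sample` schema and the equal-quota policy are assumptions, not a reproduction of the paper's exact procedure.

```python
# Sketch of demographically balanced pseudo-label selection: keep an equal
# quota of high-confidence predictions per (race, pseudo-label) cell.
from dataclasses import dataclass

@dataclass
class Sample:
    image_id: str
    race: str           # known: images come from a race-balanced pool
    pseudo_gender: str  # classifier's predicted label
    confidence: float   # classifier's confidence in that prediction

def select_balanced(samples: list[Sample], per_cell: int) -> list[Sample]:
    cells: dict[tuple[str, str], list[Sample]] = {}
    for s in samples:
        cells.setdefault((s.race, s.pseudo_gender), []).append(s)
    selected: list[Sample] = []
    for members in cells.values():
        # Most confident first, truncated to the same quota per cell so the
        # resulting pseudo-labeled training set stays demographically balanced.
        members.sort(key=lambda s: s.confidence, reverse=True)
        selected.extend(members[:per_cell])
    return selected
```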
- Person-Centric Annotations of LAION-400M: Auditing Bias and Its Transfer to Models [81.45743826739054]
A major barrier has been the lack of demographic annotations in web-scale datasets such as LAION-400M. We create person-centric annotations for the full dataset, including over 276 million bounding boxes, perceived gender and race/ethnicity labels, and automatically generated captions. Using them, we uncover demographic imbalances and harmful associations, such as the disproportionate linking of men and individuals perceived as Black or Middle Eastern with crime-related and negative content.
arXiv Detail & Related papers (2025-10-04T07:51:59Z)
- On the "Illusion" of Gender Bias in Face Recognition: Explaining the Fairness Issue Through Non-demographic Attributes [7.602456562464879]
Face recognition systems exhibit significant accuracy differences based on the user's gender. We propose a toolchain to effectively decorrelate and aggregate facial attributes to enable a less-biased gender analysis; a matched-comparison sketch follows this entry. Experiments show that the gender gap vanishes when images of male and female subjects share specific attributes.
arXiv Detail & Related papers (2025-01-21T10:21:19Z)
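The entry above argues that the gender gap vanishes once non-demographic attributes are controlled for. A minimal sketch of an attribute-matched comparison follows; the record schema is an assumption, and the paper's actual decorrelation toolchain is more involved.

```python
# Sketch: compare accuracy between gender groups only within cells that share
# the same non-demographic attribute profile (pose, glasses, lighting, ...).
from collections import defaultdict

def attribute_matched_gap(records) -> float:
    """records: dicts with 'gender' ('male'/'female'), 'attrs' (a hashable
    attribute profile), and 'correct' (bool). Schema assumed for illustration."""
    cells = defaultdict(lambda: {"male": [], "female": []})
    for r in records:
        cells[r["attrs"]][r["gender"]].append(r["correct"])
    gaps = []
    for cell in cells.values():
        if cell["male"] and cell["female"]:  # only cells containing both genders
            gaps.append(abs(sum(cell["male"]) / len(cell["male"])
                            - sum(cell["female"]) / len(cell["female"])))
    # Mean absolute accuracy gap after matching on shared attributes.
    return sum(gaps) / len(gaps) if gaps else 0.0
```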
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models (LVLMs).
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- The Impact of Debiasing on the Performance of Language Models in Downstream Tasks is Underestimated [70.23064111640132]
We compare the impact of debiasing on performance across multiple downstream tasks using a wide range of benchmark datasets.
Experiments show that the effects of debiasing are consistently underestimated across all tasks.
arXiv Detail & Related papers (2023-09-16T20:25:34Z)
- Counter-GAP: Counterfactual Bias Evaluation through Gendered Ambiguous Pronouns [53.62845317039185]
Bias-measuring datasets play a critical role in detecting biased behavior of language models.
We propose a novel method to collect diverse, natural, and minimally distant text pairs via counterfactual generation.
We show that four pre-trained language models are significantly more inconsistent across different gender groups than within each group; a minimal consistency check is sketched after this entry.
arXiv Detail & Related papers (2023-02-11T12:11:03Z)
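Counter-GAP's core measurement is how often a model's decision flips between two minimally different, gender-swapped versions of the same text. A model-agnostic sketch under assumed inputs:

```python
# Sketch: fraction of counterfactual pairs on which the model's output
# disagrees. `predict` is a stand-in for any classifier or coreference
# resolver; it is not the paper's models or data.
def inconsistency_rate(pairs, predict) -> float:
    """pairs: iterable of (text_a, text_b) gender-swapped sentence pairs."""
    pairs = list(pairs)
    flips = sum(predict(a) != predict(b) for a, b in pairs)
    return flips / len(pairs)
```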
- Gender Stereotyping Impact in Facial Expression Recognition [1.5340540198612824]
In recent years, machine learning-based models have become the most popular approach to Facial Expression Recognition (FER).
In publicly available FER datasets, apparent gender representation is usually balanced overall, but representation within individual labels is not.
We generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels; a subsampling sketch follows this entry.
We observe a discrepancy in the recognition of certain emotions between genders of up to 29% under the worst bias conditions.
arXiv Detail & Related papers (2022-10-11T10:52:23Z)
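The FER entry above derives biased training sets by shifting the gender ratio of selected emotion labels. Below is a hedged subsampling sketch; the record schema and the 0 < female_share < 1 restriction are assumptions for illustration, not the paper's code.

```python
# Sketch: induce stereotypical bias for one emotion label by subsampling it
# to a target gender ratio, leaving every other label untouched.
import random

def bias_label(records, label, female_share, seed=0):
    """records: dicts with 'label' and 'gender' keys (assumed schema);
    requires 0 < female_share < 1."""
    rng = random.Random(seed)
    target = [r for r in records if r["label"] == label]
    rest = [r for r in records if r["label"] != label]
    females = [r for r in target if r["gender"] == "female"]
    males = [r for r in target if r["gender"] == "male"]
    # Largest total size n at which the requested ratio is achievable.
    n = min(int(len(females) / female_share), int(len(males) / (1 - female_share)))
    n_f = round(n * female_share)
    return rest + rng.sample(females, n_f) + rng.sample(males, n - n_f)
```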
- Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
Demographic biases are present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and impostor sample pairs leads to suboptimal results; a per-subgroup thresholding sketch follows this entry.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z)
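Both this entry and the "Face Recognition: Too Bias, or Not Too Bias?" entry further down find a single global score threshold suboptimal across subgroups. A NumPy sketch of per-subgroup threshold calibration at a fixed false match rate follows; the array interface is an assumption, not either paper's code.

```python
# Sketch: calibrate one verification threshold per subgroup so each subgroup
# sees (roughly) the same false match rate, instead of one global threshold.
import numpy as np

def subgroup_thresholds(scores, is_genuine, subgroup, target_fmr=1e-3):
    """scores: similarity per pair; is_genuine: bool array; subgroup: label
    per pair. Assumes every subgroup contains impostor pairs."""
    thresholds = {}
    for g in np.unique(subgroup):
        impostor = scores[(subgroup == g) & ~is_genuine]
        # (1 - target_fmr) quantile: about target_fmr of this subgroup's
        # impostor pairs score above the chosen threshold.
        thresholds[g] = np.quantile(impostor, 1.0 - target_fmr)
    return thresholds
```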
- Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
Face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z)
- Face Recognition: Too Bias, or Not Too Bias? [45.404162391012726]
We reveal critical insights into problems of bias in state-of-the-art facial recognition systems.
We show variations in the optimal scoring threshold for face-pairs across different subgroups.
We also conduct a human evaluation, which supports the hypothesis that such bias exists in human perception.
arXiv Detail & Related papers (2020-02-16T01:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.