Fair for a few: Improving Fairness in Doubly Imbalanced Datasets
- URL: http://arxiv.org/abs/2506.14306v1
- Date: Tue, 17 Jun 2025 08:34:56 GMT
- Title: Fair for a few: Improving Fairness in Doubly Imbalanced Datasets
- Authors: Ata Yalcin, Asli Umay Ozturk, Yigit Sever, Viktoria Pauw, Stephan Hachinger, Ismail Hakki Toroslu, Pinar Karagoz
- Abstract summary: We focus on fairness in doubly imbalanced datasets, such that the data collection is imbalanced both for the label and the groups in the sensitive attribute. Firstly, we present an exploratory analysis to illustrate limitations in debiasing on a doubly imbalanced dataset. Then, a multi-criteria-based solution is proposed for finding the most suitable sampling and distribution for label and sensitive attribute.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fairness has been identified as an important aspect of Machine Learning and Artificial Intelligence solutions for decision making. Recent literature offers a variety of approaches for debiasing; however, many of them fall short when the data collection is imbalanced. In this paper, we focus on a particular case, fairness in doubly imbalanced datasets, such that the data collection is imbalanced both for the label and the groups in the sensitive attribute. Firstly, we present an exploratory analysis to illustrate limitations in debiasing on a doubly imbalanced dataset. Then, a multi-criteria-based solution is proposed for finding the most suitable sampling and distribution for label and sensitive attribute, in terms of fairness and classification accuracy.
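The core idea behind sampling for a doubly imbalanced dataset, balancing over the joint (label, sensitive attribute) cells rather than over the label alone, can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual multi-criteria procedure: the function name, the uniform-target heuristic, and the toy data are all assumptions made for the example.

```python
import random
from collections import Counter

def resample_to_joint_target(rows, target_ratio=1.0, seed=0):
    """Oversample each (label, group) cell up to target_ratio times
    the size of the largest cell. Each row is a (features, label, group)
    triple. A hypothetical helper for illustration only."""
    rng = random.Random(seed)
    cells = {}
    for row in rows:
        cells.setdefault((row[1], row[2]), []).append(row)
    max_n = max(len(cell) for cell in cells.values())
    goal = int(target_ratio * max_n)
    out = []
    for cell_rows in cells.values():
        out.extend(cell_rows)
        deficit = goal - len(cell_rows)
        if deficit > 0:
            # Sample with replacement from the minority cell.
            out.extend(rng.choices(cell_rows, k=deficit))
    return out

# Toy doubly imbalanced data: label 1 and group "b" are both rare.
data = ([("x", 0, "a")] * 80 + [("x", 0, "b")] * 10 +
        [("x", 1, "a")] * 8 + [("x", 1, "b")] * 2)
balanced = resample_to_joint_target(data)
print(Counter((y, g) for _, y, g in balanced))
```

Note that balancing only on the label would leave the (1, "b") cell nearly empty; joint resampling is what addresses the double imbalance, at the cost of heavy duplication in the rarest cell.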
Related papers
- Class-Conditional Distribution Balancing for Group Robust Classification [11.525201208566925]
Spurious correlations that lead models to correct predictions for the wrong reasons pose a critical challenge for robust real-world generalization. We offer a novel perspective by reframing spurious correlations as imbalances or mismatches in class-conditional distributions. We propose a simple yet effective robust learning method that eliminates the need for both bias annotations and predictions.
arXiv Detail & Related papers (2025-04-24T07:15:53Z) - A Survey on Small Sample Imbalance Problem: Metrics, Feature Analysis, and Solutions [41.77642958758829]
The small sample imbalance (S&I) problem is a major challenge in machine learning and data analysis. Existing methods often rely on algorithms without sufficiently analyzing the underlying data characteristics. We argue that a detailed analysis from the data perspective is essential before developing an appropriate solution.
arXiv Detail & Related papers (2025-04-21T01:58:29Z) - Pedestrian Attribute Recognition as Label-balanced Multi-label Learning [12.605514698358165]
We propose a novel framework that successfully decouples label-balanced data re-sampling from the curse of attributes co-occurrence.
Our work achieves best accuracy on various popular benchmarks, and importantly, with minimal computational budget.
arXiv Detail & Related papers (2024-05-08T07:27:15Z) - Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR).
arXiv Detail & Related papers (2023-03-06T17:19:23Z) - On Comparing Fair Classifiers under Data Bias [42.43344286660331]
We study the effect of varying data biases on the accuracy and fairness of fair classifiers.
Our experiments show how to integrate a measure of data bias risk in the existing fairness dashboards for real-world deployments.
arXiv Detail & Related papers (2023-02-12T13:04:46Z) - Social Bias Meets Data Bias: The Impacts of Labeling and Measurement Errors on Fairness Criteria [4.048444203617942]
We consider two forms of dataset bias: errors by prior decision makers in the labeling process, and errors in measurement of the features of disadvantaged individuals.
We analytically show that some constraints can remain robust when facing certain statistical biases, while others (such as Equalized Odds) are significantly violated if trained on biased data.
Our findings present an additional guideline for choosing among existing fairness criteria, or for proposing new criteria, when available datasets may be biased.
arXiv Detail & Related papers (2022-05-31T22:43:09Z) - Semi-FairVAE: Semi-supervised Fair Representation Learning with Adversarial Variational Autoencoder [92.67156911466397]
We propose a semi-supervised fair representation learning approach based on adversarial variational autoencoder.
We use a bias-aware model to capture inherent bias information on the sensitive attribute.
We also use a bias-free model to learn debiased fair representations by using adversarial learning to remove bias information from them.
arXiv Detail & Related papers (2022-04-01T15:57:47Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing concern in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z) - Data Augmentation Imbalance For Imbalanced Attribute Classification [60.71438625139922]
We propose a new re-sampling algorithm called data augmentation imbalance (DAI) to explicitly enhance the ability to discriminate the under-represented attributes.
Our DAI algorithm achieves state-of-the-art results, based on pedestrian attribute datasets.
arXiv Detail & Related papers (2020-04-19T20:43:29Z) - Long-Tailed Recognition Using Class-Balanced Experts [128.73438243408393]
We propose an ensemble of class-balanced experts that combines the strength of diverse classifiers.
Our ensemble of class-balanced experts reaches results close to state-of-the-art and an extended ensemble establishes a new state-of-the-art on two benchmarks for long-tailed recognition.
arXiv Detail & Related papers (2020-04-07T20:57:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.