Fair-OBNC: Correcting Label Noise for Fairer Datasets
- URL: http://arxiv.org/abs/2410.06214v2
- Date: Mon, 14 Oct 2024 15:45:12 GMT
- Title: Fair-OBNC: Correcting Label Noise for Fairer Datasets
- Authors: Inês Oliveira e Silva, Sérgio Jesus, Hugo Ferreira, Pedro Saleiro, Inês Sousa, Pedro Bizarro, Carlos Soares,
- Abstract summary: biases in the training data are sometimes related to label noise.
Models trained on such biased data may perpetuate or even aggravate the biases with respect to sensitive information.
We propose Fair-OBNC, a label noise correction method with fairness considerations.
- Score: 9.427445881721814
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data used by automated decision-making systems, such as Machine Learning models, often reflects discriminatory behavior that occurred in the past. These biases in the training data are sometimes related to label noise, such as in COMPAS, where more African-American offenders are wrongly labeled as having a higher risk of recidivism when compared to their White counterparts. Models trained on such biased data may perpetuate or even aggravate the biases with respect to sensitive information, such as gender, race, or age. However, while multiple label noise correction approaches are available in the literature, these focus on model performance exclusively. In this work, we propose Fair-OBNC, a label noise correction method with fairness considerations, to produce training datasets with measurable demographic parity. The presented method adapts Ordering-Based Noise Correction, with an adjusted criterion of ordering, based both on the margin of error of an ensemble, and the potential increase in the observed demographic parity of the dataset. We evaluate Fair-OBNC against other different pre-processing techniques, under different scenarios of controlled label noise. Our results show that the proposed method is the overall better alternative within the pool of label correction methods, being capable of attaining better reconstructions of the original labels. Models trained in the corrected data have an increase, on average, of 150% in demographic parity, when compared to models trained in data with noisy labels, across the considered levels of label noise.
Related papers
- Mitigating Matching Biases Through Score Calibration [1.5530839016602822]
Biased outcomes in record matching can result in unequal error rates across demographic groups, raising ethical and legal concerns.
In this paper, we adapt fairness metrics traditionally applied in regression models to evaluate cumulative bias across all thresholds in record matching.
We propose a novel post-processing calibration method, leveraging optimal transport theory and Wasserstein barycenters, to balance matching scores across demographic groups.
arXiv Detail & Related papers (2024-11-03T21:01:40Z) - Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually-specific probability measure, we can reduce the side-effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z) - Learning with Imbalanced Noisy Data by Preventing Bias in Sample
Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z) - Soft Curriculum for Learning Conditional GANs with Noisy-Labeled and
Uncurated Unlabeled Data [70.25049762295193]
We introduce a novel conditional image generation framework that accepts noisy-labeled and uncurated data during training.
We propose soft curriculum learning, which assigns instance-wise weights for adversarial training while assigning new labels for unlabeled data.
Our experiments show that our approach outperforms existing semi-supervised and label-noise robust methods in terms of both quantitative and qualitative performance.
arXiv Detail & Related papers (2023-07-17T08:31:59Z) - Systematic analysis of the impact of label noise correction on ML
Fairness [0.0]
We develop an empirical methodology to evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets.
Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness.
arXiv Detail & Related papers (2023-06-28T08:08:14Z) - On the Impact of Data Quality on Image Classification Fairness [11.329873246415797]
We measure key fairness metrics across a range of algorithms over multiple image classification datasets.
We describe noise in the labels as inaccuracies in the labelling of the data in the training set and noise in the data as distortions in the data.
By adding noise to the original datasets, we can explore the relationship between the quality of the training data and the fairness of the output of the models trained on that data.
arXiv Detail & Related papers (2023-05-02T16:54:23Z) - Neighborhood Collective Estimation for Noisy Label Identification and
Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors.
arXiv Detail & Related papers (2022-08-05T14:47:22Z) - Bias-Tolerant Fair Classification [20.973916494320246]
label bias and selection bias are two reasons in data that will hinder the fairness of machine-learning outcomes.
We propose a Bias-TolerantFAirRegularizedLoss (B-FARL) which tries to regain the benefits using data affected by label bias and selection bias.
B-FARL takes the biased data as input, calls a model that approximates the one trained with fair but latent data, and thus prevents discrimination without constraints required.
arXiv Detail & Related papers (2021-07-07T13:31:38Z) - Tackling Instance-Dependent Label Noise via a Universal Probabilistic
Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z) - Fair Classification with Group-Dependent Label Noise [6.324366770332667]
This work examines how to train fair classifiers in settings where training labels are corrupted with random noise.
We show that naively imposing parity constraints on demographic disparity measures, without accounting for heterogeneous and group-dependent error rates, can decrease both the accuracy and the fairness of the resulting classifier.
arXiv Detail & Related papers (2020-10-31T22:35:01Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.