Systematic analysis of the impact of label noise correction on ML Fairness
- URL: http://arxiv.org/abs/2306.15994v1
- Date: Wed, 28 Jun 2023 08:08:14 GMT
- Title: Systematic analysis of the impact of label noise correction on ML Fairness
- Authors: I. Oliveira e Silva, C. Soares, I. Sousa, R. Ghani
- Abstract summary: We develop an empirical methodology to evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets.
Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Arbitrary, inconsistent, or faulty decision-making raises serious concerns,
and preventing unfair models is an increasingly important challenge in Machine
Learning. Data often reflect past discriminatory behavior, and models trained
on such data may reflect bias on sensitive attributes, such as gender, race, or
age. One approach to developing fair models is to preprocess the training data
to remove the underlying biases while preserving the relevant information, for
example, by correcting biased labels. While multiple label noise correction
methods are available, the information about their behavior in identifying
discrimination is very limited. In this work, we develop an empirical
methodology to systematically evaluate the effectiveness of label noise
correction techniques in ensuring the fairness of models trained on biased
datasets. Our methodology involves manipulating the amount of label noise and
can be used not only with fairness benchmarks but also with standard ML datasets. We
apply the methodology to analyze six label noise correction methods according
to several fairness metrics on standard OpenML datasets. Our results suggest
that the Hybrid Label Noise Correction method achieves the best trade-off
between predictive performance and fairness. Clustering-Based Correction can
reduce discrimination the most, albeit at the cost of lower predictive
performance.
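As a rough illustration of the evaluation protocol described above, the sketch below injects label noise at increasing rates, trains a classifier, and reports predictive performance alongside a fairness measure. It assumes scikit-learn, a synthetic dataset with a binary sensitive attribute, and demographic parity difference as the fairness metric; none of these specifics are taken from the paper, and a label noise correction method would be applied to the noisy labels before the fit step.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def inject_label_noise(y, rate, rng):
    # Flip a fraction `rate` of the binary labels uniformly at random.
    y_noisy = y.copy()
    flip = rng.random(len(y)) < rate
    y_noisy[flip] = 1 - y_noisy[flip]
    return y_noisy

def demographic_parity_difference(y_pred, sensitive):
    # Absolute gap in positive prediction rates between the two groups.
    return abs(y_pred[sensitive == 1].mean() - y_pred[sensitive == 0].mean())

rng = np.random.default_rng(0)
n = 2000
s = rng.integers(0, 2, n)                       # binary sensitive attribute
X = rng.normal(size=(n, 5)) + 0.5 * s[:, None]  # features correlated with s
y = (X[:, 0] + 0.3 * s + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te, s_tr, s_te = train_test_split(X, y, s, random_state=0)
for rate in (0.0, 0.1, 0.2, 0.3):
    y_noisy = inject_label_noise(y_tr, rate, rng)
    # A label noise correction method would be applied to y_noisy here.
    clf = LogisticRegression().fit(X_tr, y_noisy)
    pred = clf.predict(X_te)
    acc = (pred == y_te).mean()
    dpd = demographic_parity_difference(pred, s_te)
    print(f"noise={rate:.1f}  accuracy={acc:.3f}  dem_parity_diff={dpd:.3f}")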
Related papers
- Fair-OBNC: Correcting Label Noise for Fairer Datasets [9.427445881721814]
Biases in the training data are sometimes related to label noise.
Models trained on such biased data may perpetuate or even aggravate the biases with respect to sensitive information.
We propose Fair-OBNC, a label noise correction method with fairness considerations.
arXiv Detail & Related papers (2024-10-08T17:18:18Z)
- Extracting Clean and Balanced Subset for Noisy Long-tailed Classification [66.47809135771698]
We develop a novel pseudo labeling method using class prototypes from the perspective of distribution matching.
By setting a manually specified probability measure, we can reduce the side effects of noisy and long-tailed data simultaneously.
Our method can extract this class-balanced subset with clean labels, which brings effective performance gains for long-tailed classification with label noise.
arXiv Detail & Related papers (2024-04-10T07:34:37Z)
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Combating Label Noise With A General Surrogate Model For Sample Selection [84.61367781175984]
We propose to leverage the vision-language surrogate model CLIP to filter noisy samples automatically.
We validate the effectiveness of our proposed method on both real-world and synthetic noisy datasets.
arXiv Detail & Related papers (2023-10-16T14:43:27Z)
- Quantifying and mitigating the impact of label errors on model disparity metrics [14.225423850241675]
We study the effect of label error on a model's disparity metrics.
We find that group calibration and other metrics are sensitive to train-time and test-time label error.
We present an approach to estimate the influence of a training input's label on a model's group disparity metric.
arXiv Detail & Related papers (2023-10-04T02:18:45Z)
- Simultaneous Improvement of ML Model Fairness and Performance by Identifying Bias in Data [1.76179873429447]
We propose a data preprocessing technique that can detect instances carrying a specific kind of bias that should be removed from the dataset before training.
In particular, we claim that in problem settings where instances exist with similar features but different labels caused by variation in protected attributes, an inherent bias is induced in the dataset.
arXiv Detail & Related papers (2022-10-24T13:04:07Z)
- Neighborhood Collective Estimation for Noisy Label Identification and Correction [92.20697827784426]
Learning with noisy labels (LNL) aims at designing strategies to improve model performance and generalization by mitigating the effects of model overfitting to noisy labels.
Recent advances employ the predicted label distributions of individual samples to perform noise verification and noisy label correction, easily giving rise to confirmation bias.
We propose Neighborhood Collective Estimation, in which the predictive reliability of a candidate sample is re-estimated by contrasting it against its feature-space nearest neighbors (a sketch of this idea appears after this list).
arXiv Detail & Related papers (2022-08-05T14:47:22Z)
- Dash: Semi-Supervised Learning with Dynamic Thresholding [72.74339790209531]
We propose a semi-supervised learning (SSL) approach that uses unlabeled examples to train models.
Our proposed approach, Dash, is adaptive in its selection of unlabeled data.
arXiv Detail & Related papers (2021-09-01T23:52:29Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Fair Classification with Group-Dependent Label Noise [6.324366770332667]
This work examines how to train fair classifiers in settings where training labels are corrupted with random noise.
We show that naively imposing parity constraints on demographic disparity measures, without accounting for heterogeneous and group-dependent error rates, can decrease both the accuracy and the fairness of the resulting classifier.
arXiv Detail & Related papers (2020-10-31T22:35:01Z)
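A minimal sketch of the group-dependent noise setting examined in the last entry above, assuming binary labels and a binary sensitive attribute (the rates and data are illustrative, not taken from the paper): labels are flipped at different rates per group, so a disparity measure computed on the noisy labels no longer matches the disparity in the true labels.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
s = rng.integers(0, 2, n)                              # sensitive attribute
y_true = rng.binomial(1, np.where(s == 1, 0.6, 0.5))   # true labels

# Group-dependent noise: positives flipped to negatives more often when s == 1.
flip_rate = np.where(s == 1, 0.25, 0.05)
flip = (rng.random(n) < flip_rate) & (y_true == 1)
y_noisy = np.where(flip, 0, y_true)

def positive_rate_gap(y, s):
    # Difference in observed positive rates between the two groups.
    return y[s == 1].mean() - y[s == 0].mean()

print("gap in true labels: ", round(positive_rate_gap(y_true, s), 3))
print("gap in noisy labels:", round(positive_rate_gap(y_noisy, s), 3))
# A parity constraint fitted against the noisy gap targets the wrong quantity,
# which is the failure mode analyzed in the paper above.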
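For the Neighborhood Collective Estimation entry above, the following is a loose sketch of the general neighborhood-agreement idea rather than the paper's actual algorithm: a sample is flagged as potentially mislabeled when most of its feature-space nearest neighbors carry a different label (the value of k, the 0.5 threshold, and the scikit-learn implementation are assumptions).

import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighborhood_agreement(features, labels, k=10):
    # Fraction of each sample's k nearest neighbors that share its label.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features)
    neighbor_labels = labels[idx[:, 1:]]       # drop the sample itself
    return (neighbor_labels == labels[:, None]).mean(axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
y = (X[:, 0] > 0).astype(int)
y[rng.choice(500, 50, replace=False)] ^= 1     # inject 10% label noise

agreement = neighborhood_agreement(X, y)
suspects = agreement < 0.5                     # candidates for correction
print(f"{int(suspects.sum())} samples flagged as potentially mislabeled")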
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.