UnbiasedNets: A Dataset Diversification Framework for Robustness Bias
Alleviation in Neural Networks
- URL: http://arxiv.org/abs/2302.12538v1
- Date: Fri, 24 Feb 2023 09:49:43 GMT
- Authors: Mahum Naseer, Bharath Srinivas Prabakaran, Osman Hasan, Muhammad
Shafique
- Abstract summary: Even the most accurate NNs can be biased toward a specific output classification due to the inherent bias in the available training datasets.
This paper deals with the robustness bias, i.e., the bias exhibited by a trained NN that has significantly greater robustness to noise for a certain output class.
We propose the UnbiasedNets framework, which leverages K-means clustering and the NN's noise tolerance to diversify the given training dataset.
- Score: 11.98126285848966
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Performance of trained neural network (NN) models, in terms of testing
accuracy, has improved remarkably over the past several years, especially with
the advent of deep learning. However, even the most accurate NNs can be biased
toward a specific output classification due to the inherent bias in the
available training datasets, which may propagate to the real-world
implementations. This paper deals with the robustness bias, i.e., the bias
exhibited by a trained NN that has significantly greater robustness to noise
for a certain output class than for the remaining output classes. The
bias is shown to result from imbalanced datasets, i.e., datasets in which not
all output classes are equally represented. To address this, we propose the
UnbiasedNets framework, which leverages K-means clustering and the NN's noise
tolerance to diversify the given training dataset, even from relatively smaller
datasets. This generates balanced datasets and reduces the bias within the
datasets themselves. To the best of our knowledge, this is the first framework
catering to the robustness bias problem in NNs. We use real-world datasets to
demonstrate the efficacy of UnbiasedNets for data diversification, in the case
of both binary and multi-label classifiers. The results are compared to
well-known tools aimed at generating balanced datasets, and illustrate how
existing works have limited success in addressing the robustness bias. In
contrast, UnbiasedNets provides a notable improvement over existing works,
while even reducing the robustness bias significantly in some cases, as
observed by comparing the NNs trained on the diversified and original datasets.
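The abstract names only the two ingredients of the framework (K-means clustering and the NN's noise tolerance) and not the procedure itself. The following is a minimal sketch of how those ingredients could be combined to balance a dataset, not the authors' algorithm. The function names `kmeans` and `diversify`, the `noise_bound` parameter (a hypothetical stand-in for a noise tolerance that would be measured on a trained NN), and the toy data are all assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means on lists of floats; returns k centroids."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def diversify(data, noise_bound, k=2, seed=0):
    """Top up every class to the size of the largest class.

    data: dict mapping class label -> list of feature vectors.
    noise_bound: hypothetical per-feature perturbation limit, standing in
        for a noise tolerance measured on a trained NN (assumption).
    """
    rng = random.Random(seed)
    target = max(len(samples) for samples in data.values())
    balanced = {}
    for label, samples in data.items():
        out = [list(s) for s in samples]  # original samples are kept as-is
        centroids = kmeans(samples, min(k, len(samples)), rng=rng)
        i = 0
        while len(out) < target:
            # cycle through centroids so new points cover every cluster
            c = centroids[i % len(centroids)]
            out.append([x + rng.uniform(-noise_bound, noise_bound) for x in c])
            i += 1
        balanced[label] = out
    return balanced


# Toy, made-up data: six majority samples and two minority samples.
toy = {
    "majority": [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                 [0.3, 0.3], [0.1, 0.0], [0.2, 0.2]],
    "minority": [[1.0, 1.0], [1.1, 0.9]],
}
balanced = diversify(toy, noise_bound=0.05)
print({label: len(samples) for label, samples in balanced.items()})
# prints {'majority': 6, 'minority': 6}
```

Sampling around cluster centroids rather than around arbitrary points is one way to keep synthetic samples inside regions the class already occupies; bounding the perturbation keeps them within the assumed noise tolerance.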
Related papers
- Skew Probabilistic Neural Networks for Learning from Imbalanced Data [3.7892198600060945]
This paper introduces an imbalanced data-oriented approach using probabilistic neural networks (PNNs) with a skew normal probability kernel.
We show that SkewPNNs substantially outperform state-of-the-art machine learning methods for both balanced and imbalanced datasets in most experimental settings.
arXiv Detail & Related papers (2023-12-10T13:12:55Z)
- REST: Enhancing Group Robustness in DNNs through Reweighted Sparse Training [49.581884130880944]
Deep neural networks (DNNs) have been proven effective in various domains.
However, they often struggle to perform well on certain minority groups during inference.
arXiv Detail & Related papers (2023-12-05T16:27:54Z)
- Interpreting Bias in the Neural Networks: A Peek Into Representational Similarity [0.0]
We investigate the performance and internal representational structure of convolution-based neural networks trained on biased data.
We specifically study similarities in representations, using Centered Kernel Alignment (CKA) for different objective functions.
We note that without progressive representational similarities among the layers of a neural network, the performance is less likely to be robust.
arXiv Detail & Related papers (2022-11-14T22:17:14Z)
- Unbiased Supervised Contrastive Learning [10.728852691100338]
In this work, we tackle the problem of learning representations that are robust to biases.
We first present a margin-based theoretical framework that allows us to clarify why recent contrastive losses can fail when dealing with biased data.
We derive a novel formulation of the supervised contrastive loss (epsilon-SupInfoNCE), providing more accurate control of the minimal distance between positive and negative samples.
Thanks to our theoretical framework, we also propose FairKL, a new debiasing regularization loss, that works well even with extremely biased data.
arXiv Detail & Related papers (2022-11-10T13:44:57Z)
- Effective Class-Imbalance learning based on SMOTE and Convolutional Neural Networks [0.1074267520911262]
Imbalanced Data (ID) is a problem that deters Machine Learning (ML) models from achieving satisfactory results.
In this paper, we investigate the effectiveness of methods based on Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs).
In order to achieve reliable results, we conducted our experiments 100 times with randomly shuffled data distributions.
arXiv Detail & Related papers (2022-09-01T07:42:16Z)
- Does Data Repair Lead to Fair Models? Curating Contextually Fair Data To Reduce Model Bias [10.639605996067534]
Contextual information is a valuable cue for Deep Neural Networks (DNNs) to learn better representations and improve accuracy.
In COCO, many object categories have a much higher co-occurrence with men compared to women, which can bias a DNN's prediction in favor of men.
We introduce a data repair algorithm using the coefficient of variation, which can curate fair and contextually balanced data for a protected class.
arXiv Detail & Related papers (2021-10-20T06:00:03Z)
- Shift-Robust GNNs: Overcoming the Limitations of Localized Graph Training data [52.771780951404565]
Shift-Robust GNN (SR-GNN) is designed to account for distributional differences between biased training data and the graph's true inference distribution.
We show that SR-GNN outperforms other GNN baselines in accuracy, eliminating at least 40% of the negative effects introduced by biased training data.
arXiv Detail & Related papers (2021-08-02T18:00:38Z)
- Class Balancing GAN with a Classifier in the Loop [58.29090045399214]
We introduce a novel theoretically motivated Class Balancing regularizer for training GANs.
Our regularizer makes use of the knowledge from a pre-trained classifier to ensure balanced learning of all the classes in the dataset.
We demonstrate the utility of our regularizer in learning representations for long-tailed distributions via achieving better performance than existing approaches over multiple datasets.
arXiv Detail & Related papers (2021-06-17T11:41:30Z)
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves the training of the network against various types of biases in both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale, well-annotated datasets.
However, labeling large-scale data can be very costly and error-prone, making it difficult to guarantee the annotation quality.
We propose a Temporal Calibrated Regularization (TCR) in which we utilize the original labels and the predictions in the previous epoch together.
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)