Are fairness metric scores enough to assess discrimination biases in
machine learning?
- URL: http://arxiv.org/abs/2306.05307v1
- Date: Thu, 8 Jun 2023 15:56:57 GMT
- Title: Are fairness metric scores enough to assess discrimination biases in
machine learning?
- Authors: Fanny Jourdan, Laurent Risser, Jean-Michel Loubes, Nicholas Asher
- Abstract summary: We focus on the Bios dataset, and our learning task is to predict the occupation of individuals, based on their biography.
We address an important limitation of theoretical discussions dealing with group-wise fairness metrics: they focus on large datasets.
We then question how reliable different popular measures of bias are when the size of the training set is just sufficient to learn reasonably accurate predictions.
- Score: 4.073786857780967
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents novel experiments shedding light on the shortcomings of
current metrics for assessing biases of gender discrimination made by machine
learning algorithms on textual data. We focus on the Bios dataset, and our
learning task is to predict the occupation of individuals, based on their
biography. Such prediction tasks are common in commercial Natural Language
Processing (NLP) applications such as automatic job recommendations. We address
an important limitation of theoretical discussions dealing with group-wise
fairness metrics: they focus on large datasets, although the norm in many
industrial NLP applications is to use small to reasonably large linguistic
datasets for which the main practical constraint is to get a good prediction
accuracy. We then question how reliable different popular measures of bias are
when the size of the training set is just sufficient to learn reasonably
accurate predictions. Our experiments sample the Bios dataset and learn more
than 200 models on different sample sizes. This allows us to statistically
study our results and to confirm that common gender bias indices provide
diverging and sometimes unreliable results when applied to relatively small
training and test samples. This highlights the crucial importance of variance
calculations for providing sound results in this field.
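The paper's core point, that group-wise bias indices become noisy on small samples, can be illustrated with a minimal simulation. The demographic-parity gap used here is one common gender bias index; the group labels, rates, and sample sizes below are illustrative, not the paper's actual protocol:

```python
import random
import statistics

def demographic_parity_gap(preds, groups, positive=1):
    """Largest difference in positive-prediction rates across groups."""
    rates = []
    for g in set(groups):
        member_preds = [p for p, gg in zip(preds, groups) if gg == g]
        rates.append(sum(1 for p in member_preds if p == positive) / len(member_preds))
    return max(rates) - min(rates)

random.seed(0)
# Synthetic "classifier outputs": group M receives positives slightly more often.
population = [("M", int(random.random() < 0.60)) for _ in range(5000)] + \
             [("F", int(random.random() < 0.55)) for _ in range(5000)]

# Re-estimate the gap on 200 bootstrap samples per size, echoing the paper's
# strategy of training 200+ models on different sample sizes.
spread = {}
for n in (100, 1000, 10000):
    gaps = []
    for _ in range(200):
        sample = random.choices(population, k=n)
        groups = [g for g, _ in sample]
        preds = [p for _, p in sample]
        if len(set(groups)) < 2:
            continue  # degenerate sample: only one group drawn
        gaps.append(demographic_parity_gap(preds, groups))
    spread[n] = statistics.stdev(gaps)
    print(f"n={n:5d}  mean gap={statistics.mean(gaps):.3f}  stdev={spread[n]:.3f}")
```

The standard deviation of the estimated gap shrinks as the sample grows, which is exactly why variance calculations matter when comparing bias scores obtained on small training and test sets.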
Related papers
- Using Large Language Models for Expert Prior Elicitation in Predictive Modelling [53.54623137152208]
This study proposes using large language models (LLMs) to elicit expert prior distributions for predictive models.
We compare LLM-elicited and uninformative priors, evaluate whether LLMs truthfully generate parameter distributions, and propose a model selection strategy for in-context learning and prior elicitation.
Our findings show that LLM-elicited prior parameter distributions significantly reduce predictive error compared to uninformative priors in low-data settings.
arXiv Detail & Related papers (2024-11-26T10:13:39Z)
- ROBBIE: Robust Bias Evaluation of Large Generative Language Models [27.864027322486375]
Different prompt-based datasets can be used to measure social bias across multiple text domains and demographic axes.
We compare 6 different prompt-based bias and toxicity metrics across 12 demographic axes and 5 families of generative LLMs.
We conduct a comprehensive study of how well 3 bias/toxicity mitigation techniques perform across our suite of measurements.
arXiv Detail & Related papers (2023-11-29T23:03:04Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Metrics for Dataset Demographic Bias: A Case Study on Facial Expression Recognition [4.336779198334903]
One of the most prominent types of demographic bias are statistical imbalances in the representation of demographic groups in the datasets.
We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics.
The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models.
arXiv Detail & Related papers (2023-03-28T11:04:18Z)
- Deep Learning on a Healthy Data Diet: Finding Important Examples for Fairness [15.210232622716129]
Data-driven predictive solutions predominant in commercial applications tend to suffer from biases and stereotypes.
Data augmentation reduces gender bias by adding counterfactual examples to the training dataset.
We show that some of the examples in the augmented dataset can be unimportant or even harmful for fairness.
arXiv Detail & Related papers (2022-11-20T22:42:30Z)
- Automatically Identifying Semantic Bias in Crowdsourced Natural Language Inference Datasets [78.6856732729301]
We introduce a model-driven, unsupervised technique to find "bias clusters" in a learned embedding space of hypotheses in NLI datasets.
Interventions and additional rounds of labeling can then be performed to ameliorate the semantic bias of a dataset's hypothesis distribution.
arXiv Detail & Related papers (2021-12-16T22:49:01Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- Impact of Gender Debiased Word Embeddings in Language Modeling [0.0]
Gender, race and social biases have been detected as evident examples of unfairness in applications of Natural Language Processing.
Recent studies have shown that the human-generated data used in training is an apparent source of these biases.
Current algorithms have also been proven to amplify biases from data.
arXiv Detail & Related papers (2021-05-03T14:45:10Z)
- ALT-MAS: A Data-Efficient Framework for Active Testing of Machine Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model under test using a Bayesian neural network (BNN).
arXiv Detail & Related papers (2021-04-11T12:14:04Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- A survey of bias in Machine Learning through the prism of Statistical Parity for the Adult Data Set [5.277804553312449]
We show the importance of understanding how a bias can be introduced into automatic decisions.
We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting.
We then propose to quantify the presence of bias by using the standard Disparate Impact index on the real and well-known Adult income data set.
arXiv Detail & Related papers (2020-03-31T14:48:36Z)
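The Disparate Impact index mentioned in the survey above has a standard definition: the ratio of positive-outcome rates between the protected group and the rest of the population. A minimal sketch on illustrative toy data (not the Adult data set itself):

```python
def disparate_impact(preds, groups, protected, positive=1):
    """DI = P(pred = positive | protected group) / P(pred = positive | others)."""
    prot = [p for p, g in zip(preds, groups) if g == protected]
    rest = [p for p, g in zip(preds, groups) if g != protected]
    rate_prot = sum(1 for p in prot if p == positive) / len(prot)
    rate_rest = sum(1 for p in rest if p == positive) / len(rest)
    return rate_prot / rate_rest  # assumes the unprotected rate is nonzero

# Toy predictions: group F receives 3/5 positives, group M receives 2/5.
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["F", "F", "F", "F", "F", "M", "M", "M", "M", "M"]
print(disparate_impact(preds, groups, protected="M"))  # 0.4 / 0.6 ≈ 0.667
```

Under the common "four-fifths rule", a DI below 0.8, as here, is often read as evidence of adverse impact against the protected group.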
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.