A survey on bias in machine learning research
- URL: http://arxiv.org/abs/2308.11254v1
- Date: Tue, 22 Aug 2023 07:56:57 GMT
- Title: A survey on bias in machine learning research
- Authors: Agnieszka Mikołajczyk-Bareła, Michał Grochowski
- Abstract summary: Current research on bias in machine learning often focuses on fairness, while overlooking the roots or causes of bias.
This article aims to bridge the gap with past literature on bias in research by providing a taxonomy of potential sources of bias and errors in data and models.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current research on bias in machine learning often focuses on
fairness, while overlooking the roots or causes of bias. However, bias was
originally defined as a "systematic error," often caused by humans at
different stages of the research process. This article aims to bridge the gap
with past literature on bias in research by providing a taxonomy of potential
sources of bias and errors in data and models. The paper focuses on bias in
machine learning pipelines. The survey analyzes over forty potential sources
of bias in the machine learning (ML) pipeline, providing clear examples for
each. By understanding the sources and consequences of bias in machine
learning, better methods can be developed for detecting and mitigating it,
leading to fairer, more transparent, and more accurate ML models.
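As a concrete illustration of one source such a taxonomy covers, the sketch below simulates sampling (selection) bias; the population and the selection rule are invented for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: a binary label driven by a single feature.
feature = rng.normal(size=100_000)
label = (feature + rng.normal(scale=0.5, size=feature.size) > 0).astype(int)
print("population positive rate:", label.mean())

# Biased collection: high-feature units are more likely to enter the sample,
# so the sampled positive rate systematically overestimates the true rate.
p_select = 1 / (1 + np.exp(-2 * feature))    # selection depends on the feature
sampled = rng.random(feature.size) < p_select
print("biased sample positive rate:", label[sampled].mean())
```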
Related papers
- REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning [18.064064773660174]
We introduce REFINE-LM, a debiasing method that uses reinforcement learning to handle different types of biases without any fine-tuning.
By training a simple model on top of the word probability distribution of an LM, our bias reinforcement learning method enables model debiasing without human annotations.
Experiments conducted on a wide range of models, including several LMs, show that our method significantly reduces stereotypical biases while preserving LM performance.
arXiv Detail & Related papers (2024-08-18T14:08:31Z)
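A minimal sketch of that recipe, assuming a REINFORCE-style update and a simple per-token offset head; the head, learning rate, and reward function are placeholders rather than the authors' actual REFINE-LM design:

```python
import torch
import torch.nn as nn

vocab_size = 50_257  # e.g. GPT-2's vocabulary size (assumed for illustration)

class ReweightHead(nn.Module):
    """Learns a per-token logit offset applied to a frozen LM's distribution."""
    def __init__(self, vocab_size):
        super().__init__()
        self.offset = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, lm_log_probs):                 # (batch, vocab)
        return torch.log_softmax(lm_log_probs + self.offset, dim=-1)

head = ReweightHead(vocab_size)
opt = torch.optim.Adam(head.parameters(), lr=1e-3)

def reinforce_step(lm_log_probs, reward_fn):
    """One policy-gradient update; reward_fn is a hypothetical scorer that
    returns higher rewards for less stereotyped continuations."""
    log_probs = head(lm_log_probs.detach())          # the base LM stays frozen
    dist = torch.distributions.Categorical(logits=log_probs)
    tokens = dist.sample()
    reward = reward_fn(tokens)                       # shape: (batch,)
    loss = -(dist.log_prob(tokens) * reward).mean()  # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```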
- Current Topological and Machine Learning Applications for Bias Detection in Text [4.799066966918178]
This study utilizes the RedditBias database to analyze textual biases.
Four transformer models, including BERT and RoBERTa variants, were explored.
Findings suggest BERT, particularly mini BERT, excels in bias classification, while multilingual models lag.
arXiv Detail & Related papers (2023-11-22T16:12:42Z)
- Dissecting Causal Biases [0.0]
This paper focuses on a class of bias originating in the way training data is generated and/or collected.
Four sources of bias are considered, namely, confounding, selection, measurement, and interaction.
arXiv Detail & Related papers (2023-10-20T09:12:10Z)
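As an illustration of the first of these, the sketch below simulates confounding on invented synthetic data (the data-generating process is not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

z = rng.normal(size=n)                          # confounder: drives both t and y
t = (z + rng.normal(size=n) > 0).astype(float)  # "treatment" influenced by z
y = 0.0 * t + z + rng.normal(size=n)            # true causal effect of t is zero

# The naive difference in means reports a spurious effect created by z.
naive = y[t == 1].mean() - y[t == 0].mean()
print(f"naive effect estimate: {naive:.2f} (true effect: 0)")

# Adjusting for z with a linear regression removes the confounding bias.
beta = np.linalg.lstsq(np.column_stack([t, z, np.ones(n)]), y, rcond=None)[0]
print(f"adjusted effect estimate: {beta[0]:.2f}")
```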
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Mitigating Bias for Question Answering Models by Tracking Bias Influence [84.66462028537475]
We propose BMBI, an approach to mitigate the bias of multiple-choice QA models.
Based on the intuition that a model tends to become more biased if it learns from a biased example, we measure the bias level of a query instance.
We show that our method can be applied to multiple QA formulations across multiple bias categories.
arXiv Detail & Related papers (2023-10-13T00:49:09Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
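A minimal sketch of the core projection step, with placeholder embeddings and a single bias direction; the paper's calibrated projection matrix involves more than this plain orthogonal projection:

```python
import numpy as np

def debias(embeddings, bias_direction):
    """Project embeddings onto the subspace orthogonal to bias_direction."""
    v = bias_direction / np.linalg.norm(bias_direction)
    projection = np.eye(v.size) - np.outer(v, v)   # I - v v^T
    return embeddings @ projection

# Stand-in data: in practice the bias direction would be estimated from
# embeddings of biased prompt pairs (e.g. gendered versions of a caption).
emb = np.random.randn(4, 512)                      # placeholder text embeddings
v = np.random.randn(512)                           # placeholder bias direction
clean = debias(emb, v)
print(np.abs(clean @ (v / np.linalg.norm(v))).max())  # ~0: component removed
```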
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without exact knowledge of the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We present a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
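One common realization of this idea is a product-of-experts objective, where a weak model absorbs the easy, biased patterns so the main model is pushed toward the intended task; a hedged sketch with placeholder models and data:

```python
import torch
import torch.nn.functional as F

def poe_loss(main_logits, weak_logits, labels):
    """Cross-entropy on the product of experts (sum of log-probabilities)."""
    combined = F.log_softmax(main_logits, dim=-1) \
             + F.log_softmax(weak_logits, dim=-1).detach()  # weak model frozen
    return F.cross_entropy(combined, labels)

# Stand-in batch: 8 examples, 3 classes. At inference time only main_logits
# are used, so the bias absorbed by the weak model stays behind.
main_logits = torch.randn(8, 3, requires_grad=True)
weak_logits = torch.randn(8, 3)
labels = torch.randint(0, 3, (8,))
poe_loss(main_logits, weak_logits, labels).backward()
```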
- Learning from Failure: Training Debiased Classifier from Biased Classifier [76.52804102765931]
We show that neural networks learn to rely on spurious correlation only when it is "easier" to learn than the desired knowledge.
We propose a failure-based debiasing scheme by training a pair of neural networks simultaneously.
Our method significantly improves network training against various types of bias on both synthetic and real-world datasets.
arXiv Detail & Related papers (2020-07-06T07:20:29Z)
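A hedged sketch of the two-network scheme, assuming a generalized cross-entropy loss for the biased network and relative-difficulty reweighting for its partner (the hyperparameter q is a placeholder):

```python
import torch
import torch.nn.functional as F

def gce_loss(logits, labels, q=0.7):
    """Generalized cross-entropy: amplifies samples the model finds easy,
    encouraging the 'biased' network to latch onto spurious shortcuts."""
    p = F.softmax(logits, dim=-1).gather(1, labels[:, None]).squeeze(1)
    return ((1 - p.clamp_min(1e-8) ** q) / q).mean()

def debias_weights(biased_logits, debiased_logits, labels):
    """Relative difficulty: upweight samples the biased network fails on."""
    ce_b = F.cross_entropy(biased_logits, labels, reduction="none")
    ce_d = F.cross_entropy(debiased_logits, labels, reduction="none")
    return (ce_b / (ce_b + ce_d + 1e-8)).detach()

# Per step: update the biased net with gce_loss, then update the debiased
# net with (debias_weights(...) * per_sample_cross_entropy).mean().
```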
- Underestimation Bias and Underfitting in Machine Learning [2.639737913330821]
What is termed algorithmic bias in machine learning is often due to historic bias in the training data.
Sometimes the bias may be introduced (or at least exacerbated) by the algorithm itself.
In this paper we report on initial research to understand the factors that contribute to bias in classification algorithms.
arXiv Detail & Related papers (2020-05-18T20:01:56Z)
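An illustrative sketch of the algorithm-side effect, on invented data: as regularization strengthens and the model underfits, the predicted positive rate for the minority class collapses below the true rate:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 10_000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=n) > 1.3).astype(int)  # ~10% positives

for C in (1.0, 1e-4):                  # smaller C = stronger regularization
    clf = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    print(f"C={C}: actual positive rate {y.mean():.3f}, "
          f"predicted positive rate {clf.predict(X).mean():.3f}")
```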