Data Bias Management
- URL: http://arxiv.org/abs/2305.09686v1
- Date: Mon, 15 May 2023 10:07:27 GMT
- Title: Data Bias Management
- Authors: Gianluca Demartini and Kevin Roitero and Stefano Mizzaro
- Abstract summary: We show how bias in data affects end users, where bias originates, and provide a viewpoint about what we should do about it.
We argue that data bias is not something that should necessarily be removed in all cases, and that research attention should instead shift from bias removal to bias management.
- Score: 17.067962372238135
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Due to the widespread use of data-powered systems in our everyday lives,
concepts like bias and fairness have gained significant attention among researchers
and practitioners, in both industry and academia. Such issues typically emerge
from the data used to train supervised machine learning systems, which comes
with varying levels of quality. With the commercialization and deployment of
such systems, which are sometimes delegated to make life-changing decisions,
significant efforts are being made towards identifying and removing possible
sources of data bias that may resurface to the final end user or in the
decisions being made. In this paper, we present research results that show
how bias in data affects end users and where bias originates, and we provide a
viewpoint on what we should do about it. We argue that data bias is not
something that should necessarily be removed in all cases, and that research
attention should instead shift from bias removal towards the identification,
measurement, indexing, surfacing of, and adapting for bias, which we name bias
management.
Related papers
- Dissecting Causal Biases [0.0]
This paper focuses on a class of bias originating in the way training data is generated and/or collected.
Four sources of bias are considered, namely, confounding, selection, measurement, and interaction.
arXiv Detail & Related papers (2023-10-20T09:12:10Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Targeted Data Augmentation for bias mitigation [0.0]
We introduce a novel and efficient approach for addressing biases called Targeted Data Augmentation (TDA)
Rather than undertaking the laborious task of removing biases, our method inserts biases instead, resulting in improved performance.
To identify biases, we annotated two diverse datasets: a dataset of clinical skin lesions and a dataset of male and female faces.
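The idea behind Targeted Data Augmentation can be sketched in a few lines: instead of scrubbing a known bias artifact (e.g., a frame visible around some clinical skin-lesion photos) from the data, the same artifact is randomly inserted into training images of all classes, so the model learns that it carries no label information. This is a minimal illustrative sketch, not the paper's exact implementation; the function names, the black-frame artifact, and the insertion probability are assumptions.

```python
import numpy as np

def add_frame_bias(image, frame_width=2, value=0.0):
    """Insert a hypothetical bias artifact (a dark frame) into an image.

    The frame stands in for whatever spurious feature was found to
    correlate with the labels in the original data.
    """
    out = image.copy()
    w = frame_width
    out[:w, :] = value   # top edge
    out[-w:, :] = value  # bottom edge
    out[:, :w] = value   # left edge
    out[:, -w:] = value  # right edge
    return out

def tda_batch(images, p=0.5, rng=None):
    """Apply the artifact to a random fraction p of a training batch,
    independently of each image's class label, so the artifact becomes
    uninformative about the label."""
    rng = rng or np.random.default_rng()
    return np.stack([add_frame_bias(im) if rng.random() < p else im
                     for im in images])
```

Because the artifact is inserted independently of the label, any correlation between artifact and class in the original data is diluted during training.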
arXiv Detail & Related papers (2023-08-22T12:25:49Z)
- A survey on bias in machine learning research [0.0]
Current research on bias in machine learning often focuses on fairness, while overlooking the roots or causes of bias.
This article aims to bridge the gap in past literature on bias in research by providing a taxonomy of potential sources of bias and errors in data and models.
arXiv Detail & Related papers (2023-08-22T07:56:57Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Representation Bias in Data: A Survey on Identification and Resolution Techniques [26.142021257838564]
Data-driven algorithms are only as good as the data they work with, yet data sets, especially social data, often fail to represent minorities adequately.
Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods.
This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how the data are consumed later.
arXiv Detail & Related papers (2022-03-22T16:30:22Z)
- Pseudo Bias-Balanced Learning for Debiased Chest X-ray Classification [57.53567756716656]
We study the problem of developing debiased chest X-ray diagnosis models without knowing exactly the bias labels.
We propose a novel algorithm, pseudo bias-balanced learning, which first captures and predicts per-sample bias labels.
Our proposed method achieved consistent improvements over other state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-18T11:02:18Z)
- Statistical discrimination in learning agents [64.78141757063142]
Statistical discrimination emerges in agent policies as a function of both the bias in the training population and of agent architecture.
We show that less discrimination emerges with agents that use recurrent neural networks, and when their training environment has less bias.
arXiv Detail & Related papers (2021-10-21T18:28:57Z)
- Balancing out Bias: Achieving Fairness Through Training Reweighting [58.201275105195485]
Bias in natural language processing arises from models learning characteristics of the author such as gender and race.
Existing methods for mitigating and measuring bias do not directly account for correlations between author demographics and linguistic variables.
This paper introduces a very simple but highly effective method for countering bias using instance reweighting.
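Instance reweighting of the general kind described above can be sketched briefly: each training instance is weighted inversely to the frequency of its (label, demographic) combination, so that every combination contributes equally to the training loss. This is a minimal sketch of the general technique, not the paper's exact method; the function name and normalization scheme are assumptions.

```python
from collections import Counter

def balanced_instance_weights(labels, demographics):
    """Weight each instance by n / (k * count), where count is the
    frequency of its (label, demographic) combination and k is the
    number of distinct combinations. Rare combinations get weights
    above 1, common ones below 1, and the weights sum to n."""
    combos = list(zip(labels, demographics))
    counts = Counter(combos)
    n, k = len(combos), len(counts)
    return [n / (k * counts[c]) for c in combos]

labels = ["pos", "pos", "pos", "neg"]
demos  = ["A",   "A",   "B",   "B"]
weights = balanced_instance_weights(labels, demos)
# ("pos", "A") appears twice, the other two combinations once each,
# so the first two instances are down-weighted (2/3) and the
# remaining two are up-weighted (4/3).
```

Passing such weights as per-sample loss weights removes the correlation between demographic frequency and gradient contribution without modifying the data itself.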
arXiv Detail & Related papers (2021-09-16T23:40:28Z)
- Towards Measuring Bias in Image Classification [61.802949761385]
Convolutional Neural Networks (CNN) have become state-of-the-art for the main computer vision tasks.
However, due to their complex structure, their decisions are hard to understand, which limits their use in some industrial contexts.
We present a systematic approach to uncover data bias by means of attribution maps.
arXiv Detail & Related papers (2021-07-01T10:50:39Z)
- A survey of bias in Machine Learning through the prism of Statistical Parity for the Adult Data Set [5.277804553312449]
We show the importance of understanding how a bias can be introduced into automatic decisions.
We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting.
We then propose to quantify the presence of bias by using the standard Disparate Impact index on the real and well-known Adult income data set.
arXiv Detail & Related papers (2020-03-31T14:48:36Z)
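The Disparate Impact index used in the last paper above has a simple definition: the ratio of positive-outcome rates between the unprivileged and privileged groups. A minimal sketch in Python (the function name and the group encoding are illustrative, not from the paper):

```python
import numpy as np

def disparate_impact(y_pred, group):
    """Disparate Impact: ratio of the positive-prediction rate of the
    unprivileged group (group == 0) to that of the privileged group
    (group == 1). Values below 0.8 are commonly flagged as potential
    discrimination under the "four-fifths rule"."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_unpriv = y_pred[group == 0].mean()
    rate_priv = y_pred[group == 1].mean()
    return rate_unpriv / rate_priv

# Toy example: 30% positive rate for the unprivileged group
# versus 60% for the privileged one.
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
group = [0] * 10 + [1] * 10
print(disparate_impact(y_pred, group))  # 0.3 / 0.6 → 0.5
```

On the Adult income data set the same computation is applied to the model's predicted income labels, with the protected attribute (e.g., sex) playing the role of `group`.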
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.