A Survey on Bias in Visual Datasets
- URL: http://arxiv.org/abs/2107.07919v1
- Date: Fri, 16 Jul 2021 14:16:52 GMT
- Title: A Survey on Bias in Visual Datasets
- Authors: Simone Fabbrizzi, Symeon Papadopoulos, Eirini Ntoutsi, Ioannis
Kompatsiaris
- Abstract summary: Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks.
CV systems highly depend on the data they are fed with and can learn and amplify biases within such data.
Yet, to date there is no comprehensive survey on bias in visual datasets.
- Score: 17.79365832663837
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Computer Vision (CV) has achieved remarkable results, outperforming humans in
several tasks. Nonetheless, it may result in major discrimination if not handled
with proper care. CV systems highly depend on the data they are fed with and
can learn and amplify biases within such data. Thus, both the problems of
understanding and discovering biases are of utmost importance. Yet, to date
there is no comprehensive survey on bias in visual datasets. To this end, this
work aims to: i) describe the biases that can affect visual datasets; ii)
review the literature on methods for bias discovery and quantification in
visual datasets; iii) discuss existing attempts to collect bias-aware visual
datasets. A key conclusion of our study is that the problem of bias discovery
and quantification in visual datasets is still open and there is room for
improvement in terms of both methods and the range of biases that can be
addressed; moreover, there is no such thing as a bias-free dataset, so
scientists and practitioners must become aware of the biases in their datasets
and make them explicit. To this end, we propose a checklist that can be used to
spot different types of bias during visual dataset collection.
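A first, checklist-style step toward making dataset biases explicit is simply to quantify how annotated groups are represented. The sketch below is a minimal illustration of such an audit, assuming the dataset ships per-sample annotations for a protected attribute (the function names and the skew measure are our own, not from the paper):

```python
from collections import Counter

def attribute_shares(attribute_values):
    """Fraction of samples carrying each value of an annotated attribute
    (e.g. a hypothetical 'skin_tone' or 'gender' field in the metadata)."""
    counts = Counter(attribute_values)
    total = len(attribute_values)
    return {value: count / total for value, count in counts.items()}

def max_skew(shares):
    """Ratio between the most and least represented groups; 1.0 means balanced."""
    return max(shares.values()) / min(shares.values())
```

For example, `attribute_shares(["female", "male", "male", "male"])` yields a 0.25/0.75 split with a skew of 3.0, a number that can be reported alongside the dataset rather than left implicit.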
Related papers
- Is There a One-Model-Fits-All Approach to Information Extraction? Revisiting Task Definition Biases [62.806300074459116]
Definition bias is a negative phenomenon that can mislead models.
We identify two types of definition bias in IE: bias among information extraction datasets and bias between information extraction datasets and instruction tuning datasets.
We propose a multi-stage framework consisting of definition bias measurement, bias-aware fine-tuning, and task-specific bias mitigation.
arXiv Detail & Related papers (2024-03-25T03:19:20Z) - Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and
Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, via focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z) - Targeted Data Augmentation for bias mitigation [0.0]
We introduce a novel and efficient approach for addressing biases called Targeted Data Augmentation (TDA)
Unlike the laborious task of removing biases, our method proposes to insert biases instead, resulting in improved performance.
To identify biases, we annotated two diverse datasets: a dataset of clinical skin lesions and a dataset of male and female faces.
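The "insert biases" idea can be sketched as follows: once a bias artifact is identified (the TDA paper mentions black frames in skin-lesion images), it is painted onto a random subset of training images independently of their labels, so the artifact stops predicting the class. This is a hedged toy version with hypothetical function names, not the paper's implementation:

```python
import numpy as np

def add_frame(image, thickness=4):
    """Paint a black frame onto a copy of an HxWxC uint8 image array
    (a 'frame' is one bias artifact discussed for skin-lesion data)."""
    out = image.copy()
    out[:thickness, :] = 0
    out[-thickness:, :] = 0
    out[:, :thickness] = 0
    out[:, -thickness:] = 0
    return out

def targeted_augment(images, p=0.5, rng=None):
    """Insert the bias artifact into a random subset of images, independent
    of their labels, breaking the spurious artifact-class correlation."""
    rng = rng or np.random.default_rng(0)
    return [add_frame(img) if rng.random() < p else img for img in images]
```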
arXiv Detail & Related papers (2023-08-22T12:25:49Z) - Approximating Counterfactual Bounds while Fusing Observational, Biased
and Randomised Data Sources [64.96984404868411]
We address the problem of integrating data from multiple, possibly biased, observational and interventional studies.
We show that the likelihood of the available data has no local maxima.
We then show how the same approach can address the general case of multiple datasets.
arXiv Detail & Related papers (2023-07-31T11:28:24Z) - Data Bias Management [17.067962372238135]
We show how bias in data affects end users, where bias is originated, and provide a viewpoint about what we should do about it.
We argue that data bias is not something that should necessarily be removed in all cases, and that research attention should instead shift from bias removal to bias management.
arXiv Detail & Related papers (2023-05-15T10:07:27Z) - Mitigating Representation Bias in Action Recognition: Algorithms and
Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z) - Intrinsic Bias Identification on Medical Image Datasets [9.054785751150547]
We first define the data intrinsic bias attribute, and then propose a novel bias identification framework for medical image datasets.
The framework contains two major components, KlotskiNet and Bias Discriminant Direction Analysis (bdda), where KlotskiNet builds the mapping that makes backgrounds distinguish positive from negative samples.
Experimental results on three datasets show the effectiveness of the bias attributes discovered by the framework.
arXiv Detail & Related papers (2022-03-24T06:28:07Z) - Representation Bias in Data: A Survey on Identification and Resolution
Techniques [26.142021257838564]
Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately.
Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods.
This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how it is consumed later.
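One common way to quantify representation bias as a property of the data itself is to compare each group's share of the dataset against an external baseline, such as census proportions. The sketch below is our own illustration of that idea, assuming such baseline shares are available (the function name is hypothetical):

```python
def representation_rate(dataset_shares, population_shares):
    """Per-group ratio of dataset share to baseline population share.
    Values below 1.0 indicate the group is under-represented in the data."""
    return {group: dataset_shares.get(group, 0.0) / population_shares[group]
            for group in population_shares}
```

A group with rate 0.5 appears in the dataset at half its real-world frequency, which is exactly the kind of gap a representation-bias audit should surface.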
arXiv Detail & Related papers (2022-03-22T16:30:22Z) - Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles [66.15398165275926]
We propose a method that can automatically detect and ignore dataset-specific patterns, which we call dataset biases.
Our method trains a lower capacity model in an ensemble with a higher capacity model.
We show improvement in all settings, including a 10 point gain on the visual question answering dataset.
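The ensemble idea above is often realized as a product-of-experts: during training the low-capacity model's logits are added to the main model's logits (multiplying the two distributions in probability space), so the main model is pushed to explain only what the shortcut-prone model cannot. A minimal sketch of that combination, under our own naming and not the paper's exact code:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_train_probs(main_logits, bias_logits):
    """Product-of-experts combination used only at training time: summing
    log-space scores multiplies the two distributions. At test time only
    the main model's logits are used, dropping the bias component."""
    return softmax(main_logits + bias_logits)
```

The training loss is then computed on these combined probabilities, while inference uses `softmax(main_logits)` alone.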
arXiv Detail & Related papers (2020-11-07T22:20:03Z) - REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.