REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets
- URL: http://arxiv.org/abs/2004.07999v4
- Date: Fri, 23 Jul 2021 18:41:42 GMT
- Title: REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets
- Authors: Angelina Wang and Alexander Liu and Ryan Zhang and Anat Kleiman and
Leslie Kim and Dora Zhao and Iroha Shirai and Arvind Narayanan and Olga
Russakovsky
- Abstract summary: REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfaces potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
- Score: 64.76453161039973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning models are known to perpetuate and even amplify the biases
present in the data. However, these data biases frequently do not become
apparent until after the models are deployed. Our work tackles this issue and
enables the preemptive analysis of large-scale datasets. REVISE (REvealing
VIsual biaSEs) is a tool that assists in the investigation of a visual dataset,
surfacing potential biases along three dimensions: (1) object-based, (2)
person-based, and (3) geography-based. Object-based biases relate to the size,
context, or diversity of the depicted objects. Person-based metrics focus on
analyzing the portrayal of people within the dataset. Geography-based analyses
consider the representation of different geographic locations. These three
dimensions are deeply intertwined in how they interact to bias a dataset, and
REVISE sheds light on this; the responsibility then lies with the user to
consider the cultural and historical context, and to determine which of the
revealed biases may be problematic. The tool further assists the user by
suggesting actionable steps that may be taken to mitigate the revealed biases.
Overall, the key aim of our work is to tackle the machine learning bias problem
early in the pipeline. REVISE is available at
https://github.com/princetonvisualai/revise-tool
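To make the object-based dimension concrete, below is a minimal sketch of one such metric: the fraction of the image that each object category typically occupies, which surfaces categories that appear only as small or distant objects. This is an illustration only, not the revise-tool's actual API; the COCO-style annotation layout and field names are assumptions.

```python
# Toy object-based bias metric in the spirit of REVISE: per-category
# mean object size as a fraction of image area. Consistently tiny (or
# huge) depictions of a category can bias downstream models.
from collections import defaultdict
from statistics import mean

def object_size_profile(annotations, images):
    """annotations: [{"image_id", "category", "bbox": [x, y, w, h]}, ...]
       images: {image_id: (width, height)} -- assumed COCO-like format."""
    fractions = defaultdict(list)
    for ann in annotations:
        img_w, img_h = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]
        fractions[ann["category"]].append((w * h) / (img_w * img_h))
    # Mean fraction of image area each category occupies.
    return {cat: mean(vals) for cat, vals in fractions.items()}

anns = [
    {"image_id": 0, "category": "airplane", "bbox": [10, 10, 600, 300]},
    {"image_id": 1, "category": "bottle", "bbox": [5, 5, 20, 60]},
]
imgs = {0: (640, 480), 1: (640, 480)}
for cat, frac in sorted(object_size_profile(anns, imgs).items()):
    print(f"{cat}: {frac:.3f} of image area on average")
```

Analogous statistics over co-occurring scene labels or image geolocations would cover the context and geography dimensions the abstract describes.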
Related papers
- DSAP: Analyzing Bias Through Demographic Comparison of Datasets [4.8741052091630985]
We propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets.
DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios.
An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations.
arXiv Detail & Related papers (2023-12-22T11:51:20Z)
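As a toy illustration of DSAP's two-step idea, the sketch below first builds a demographic profile (a normalized distribution over groups) for each dataset and then scores their similarity. The profiles here are given directly, whereas DSAP infers them with auxiliary models, and the 1 - total-variation similarity is an illustrative choice, not necessarily the paper's measure.

```python
# Step 1: turn demographic labels into a distribution per dataset.
# Step 2: compare the two distributions with a similarity score.
from collections import Counter

def demographic_profile(labels):
    """Normalize a list of demographic labels into a distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: c / total for group, c in counts.items()}

def profile_similarity(profile_a, profile_b):
    """1 - total variation distance between two distributions."""
    groups = set(profile_a) | set(profile_b)
    tvd = 0.5 * sum(abs(profile_a.get(g, 0) - profile_b.get(g, 0))
                    for g in groups)
    return 1 - tvd

dataset_a = ["20-29", "20-29", "30-39", "60+"]       # fabricated labels
dataset_b = ["20-29", "30-39", "30-39", "40-49"]
sim = profile_similarity(demographic_profile(dataset_a),
                         demographic_profile(dataset_b))
print(f"demographic similarity: {sim:.2f}")  # 1.00 means identical makeup
```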
- Dataset Bias Mitigation in Multiple-Choice Visual Question Answering and Beyond [93.96982273042296]
Vision-language (VL) understanding tasks evaluate models' comprehension of complex visual scenes through multiple-choice questions.
We have identified two dataset biases that models can exploit as shortcuts to resolve various VL tasks correctly without proper understanding.
We propose Adversarial Data Synthesis (ADS) to generate synthetic training and debiased evaluation data.
We then introduce Intra-sample Counterfactual Training (ICT) to assist models in utilizing the synthesized training data, particularly the counterfactual data, by focusing on intra-sample differentiation.
arXiv Detail & Related papers (2023-10-23T08:09:42Z)
- Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects.
We tackle this problem from two different angles: algorithm and dataset.
We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, such as weakening or deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition [1.5340540198612824]
Three metrics are proposed: two focus on the representational and stereotypical bias of the dataset, and the third on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a facial expression recognition (FER) problem based on the popular AffectNet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z)
- Representation Bias in Data: A Survey on Identification and Resolution Techniques [26.142021257838564]
Data-driven algorithms are only as good as the data they work with, yet datasets, especially social data, often fail to represent minorities adequately.
Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods.
This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how it is consumed later.
arXiv Detail & Related papers (2022-03-22T16:30:22Z)
- Certifying Robustness to Programmable Data Bias in Decision Trees [12.060443368097102]
We certify that models produced by a learning algorithm are pointwise-robust to potential dataset biases.
Our approach allows specifying bias models across a variety of dimensions.
We evaluate our approach on datasets commonly used in the fairness literature.
arXiv Detail & Related papers (2021-10-08T20:15:17Z)
- A Survey on Bias in Visual Datasets [17.79365832663837]
Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks.
CV systems highly depend on the data they are fed with and can learn and amplify biases within such data.
Yet, to date there is no comprehensive survey on bias in visual datasets.
arXiv Detail & Related papers (2021-07-16T14:16:52Z)
- Towards Measuring Bias in Image Classification [61.802949761385]
Convolutional Neural Networks (CNNs) have become state-of-the-art for the main computer vision tasks.
However, due to their complex structure, their decisions are hard to understand, which limits their use in some industrial contexts.
We present a systematic approach to uncover data bias by means of attribution maps.
arXiv Detail & Related papers (2021-07-01T10:50:39Z)
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584]
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
arXiv Detail & Related papers (2021-04-17T21:34:10Z)
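The "simple feature correlations" in the entry above can be made concrete with a small sketch: compare p(label | token) against the prior p(label) and flag tokens whose presence shifts the label distribution sharply, a sign of a dataset artifact. The data, threshold, and whitespace tokenization below are illustrative, not the paper's exact procedure.

```python
# Flag single tokens whose conditional label distribution deviates
# from the prior -- a simple artifact detector for labeled text data.
from collections import Counter, defaultdict

def token_label_artifacts(examples, min_count=2, gap=0.3):
    """examples: [(text, label), ...]; flags tokens where
       p(label | token) exceeds p(label) by more than `gap`."""
    label_counts = Counter(label for _, label in examples)
    total = sum(label_counts.values())
    prior = {lab: c / total for lab, c in label_counts.items()}
    per_token = defaultdict(Counter)
    for text, label in examples:
        for tok in set(text.lower().split()):  # dedupe within an example
            per_token[tok][label] += 1
    flagged = []
    for tok, counts in per_token.items():
        n = sum(counts.values())
        if n < min_count:
            continue  # too rare to trust the estimate
        for lab, c in counts.items():
            if c / n - prior[lab] > gap:
                flagged.append((tok, lab, c / n))
    return flagged

data = [("no good at all", 0), ("no fun", 0),
        ("really good", 1), ("good and fun", 1)]
print(token_label_artifacts(data))  # flags ("no", 0, 1.0)
```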
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences.