Measuring Model Biases in the Absence of Ground Truth
- URL: http://arxiv.org/abs/2103.03417v1
- Date: Fri, 5 Mar 2021 01:23:22 GMT
- Title: Measuring Model Biases in the Absence of Ground Truth
- Authors: Osman Aka, Ken Burke, Alex Bäuerle, Christina Greer, Margaret Mitchell
- Abstract summary: We introduce a new framing to the measurement of fairness and bias that does not rely on ground truth labels.
Instead, we treat the model predictions for a given image as a set of labels, analogous to a 'bag of words' approach used in Natural Language Processing (NLP).
We demonstrate how the statistical properties (especially normalization) of the different association metrics can lead to different sets of labels detected as having "gender bias".
- Score: 2.802021236064919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in computer vision have led to the development of image
classification models that can predict tens of thousands of object classes.
Training these models can require millions of examples, leading to a demand for
potentially billions of annotations. In practice, however, images are typically
sparsely annotated, which can lead to problematic biases in the distribution of
ground truth labels that are collected. This potential for annotation bias may
then limit the utility of ground truth-dependent fairness metrics (e.g.,
Equalized Odds). To address this problem, in this work we introduce a new
framing to the measurement of fairness and bias that does not rely on ground
truth labels. Instead, we treat the model predictions for a given image as a
set of labels, analogous to a 'bag of words' approach used in Natural Language
Processing (NLP). This allows us to explore different association metrics
between prediction sets in order to detect patterns of bias. We apply this
approach to examine the relationship between identity labels and all other
labels in the dataset, using labels associated with 'male' and 'female' as a
concrete example. We demonstrate how the statistical properties (especially
normalization) of the different association metrics can lead to different sets
of labels detected as having "gender bias". We conclude by demonstrating that
pointwise mutual information normalized by joint probability (nPMI) is able to
detect many labels with significant gender bias despite differences in the
labels' marginal frequencies. Finally, we announce an open-sourced nPMI
visualization tool using TensorBoard.
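As a concrete illustration of the measurement described above, the sketch below (in Python) treats each image's predicted labels as a bag of words and scores every label's association with an identity label via nPMI. This is a minimal sketch, not the paper's released TensorBoard tool: the function and variable names are invented here, and it assumes the common normalization that divides PMI by the negative log of the joint probability; the paper's exact counting, thresholding, and smoothing choices may differ.

```python
import math
from collections import Counter

def npmi_by_identity(prediction_sets, identity_label, min_joint=5):
    """Score every label's association with one identity label via nPMI.

    prediction_sets: list of sets of label strings, one set per image
    (the model's predictions treated as a 'bag of words').
    Returns {label: nPMI}, where PMI = log p(x, y) / (p(x) p(y)) and the
    normalization divides by -log p(x, y), so scores fall in [-1, 1].
    """
    n = len(prediction_sets)
    label_counts = Counter()
    joint_counts = Counter()
    for labels in prediction_sets:
        label_counts.update(labels)
        if identity_label in labels:
            for lab in labels:
                if lab != identity_label:
                    joint_counts[lab] += 1
    if label_counts[identity_label] == 0:
        return {}
    p_identity = label_counts[identity_label] / n
    scores = {}
    for lab, joint in joint_counts.items():
        if joint < min_joint:  # skip labels with too little co-occurrence evidence
            continue
        p_label = label_counts[lab] / n
        p_joint = joint / n
        pmi = math.log(p_joint / (p_identity * p_label))
        scores[lab] = pmi / -math.log(p_joint)  # normalize by joint probability
    return scores

# Example usage: compare npmi_by_identity(preds, 'female') with
# npmi_by_identity(preds, 'male') to rank gender-skewed labels.
```

Because the normalization term grows with the joint probability, frequently and rarely occurring labels can be compared on a common scale, which is what lets the metric surface biased labels despite differences in their marginal frequencies.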
Related papers
- Leveraging Ensemble Diversity for Robust Self-Training in the Presence of Sample Selection Bias [5.698050337128548]
Self-training is a well-known approach for semi-supervised learning. It consists of iteratively assigning pseudo-labels to unlabeled data for which the model is confident and treating them as labeled examples.
For neural networks, softmax prediction probabilities are often used as a confidence measure, although they are known to be overconfident, even for wrong predictions.
We propose a novel confidence measure, called $\mathcal{T}$-similarity, built upon the prediction diversity of an ensemble of linear classifiers (a generic agreement-based sketch appears after this list of related papers).
arXiv Detail & Related papers (2023-10-23T11:30:06Z)
- Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
arXiv Detail & Related papers (2023-05-24T17:59:18Z)
- Fairness and Bias in Truth Discovery Algorithms: An Experimental Analysis [7.575734557466221]
Crowd workers may sometimes provide unreliable labels.
Truth discovery (TD) algorithms are applied to determine the consensus labels from conflicting worker responses.
We conduct a systematic study of the bias and fairness of TD algorithms.
arXiv Detail & Related papers (2023-04-25T04:56:35Z)
- Dist-PU: Positive-Unlabeled Learning from a Label Distribution Perspective [89.5370481649529]
We propose a label distribution perspective for positive-unlabeled (PU) learning in this paper.
Motivated by this perspective, we pursue consistency between the predicted and ground-truth label distributions.
Experiments on three benchmark datasets validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-06T07:38:29Z)
- Instance-Dependent Partial Label Learning [69.49681837908511]
Partial label learning is a typical weakly supervised learning problem.
Most existing approaches assume that the incorrect labels in each training example are randomly picked as the candidate labels.
In this paper, we consider instance-dependent partial label learning and assume that each example is associated with a latent label distribution in which each candidate label has a real-valued degree of description.
arXiv Detail & Related papers (2021-10-25T12:50:26Z)
- Unbiased Loss Functions for Multilabel Classification with Missing Labels [2.1549398927094874]
Missing labels are a ubiquitous phenomenon in extreme multi-label classification (XMC) tasks.
This paper derives the unique unbiased estimators for the different multilabel reductions.
arXiv Detail & Related papers (2021-09-23T10:39:02Z)
- Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z)
- The Gap on GAP: Tackling the Problem of Differing Data Distributions in Bias-Measuring Datasets [58.53269361115974]
Diagnostic datasets that can detect biased models are an important prerequisite for bias reduction within natural language processing.
However, undesired patterns in the collected data can make such tests incorrect.
We introduce a theoretically grounded method for weighting test samples to cope with such patterns in the test data.
arXiv Detail & Related papers (2020-11-03T16:50:13Z)
- Debiased Contrastive Learning [64.98602526764599]
We develop a debiased contrastive objective that corrects for the sampling of same-label datapoints.
Empirically, the proposed objective consistently outperforms the state-of-the-art for representation learning in vision, language, and reinforcement learning benchmarks.
arXiv Detail & Related papers (2020-07-01T04:25:24Z)
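For the ensemble-diversity confidence mentioned in the first related paper above, a generic agreement-based stand-in can be written in a few lines. This is a hedged sketch, not the authors' exact $\mathcal{T}$-similarity: the function name, the use of the mean pairwise dot product, and the threshold in the usage note are all assumptions made here for illustration.

```python
import numpy as np

def agreement_confidence(member_probs):
    """Diversity-aware confidence for self-training (generic sketch).

    member_probs: array of shape (n_members, n_classes) holding each
    ensemble member's softmax prediction for a single unlabeled example.
    Returns the mean pairwise dot product between members' predictions:
    close to 1 when the members agree sharply, lower when they disagree.
    """
    probs = np.asarray(member_probs, dtype=float)
    m = probs.shape[0]
    gram = probs @ probs.T                   # agreement between every pair of members
    off_diag = gram.sum() - np.trace(gram)   # drop each member's self-agreement
    return off_diag / (m * (m - 1))

# Usage (hypothetical threshold): pseudo-label an unlabeled example only if
# agreement_confidence(member_probs) exceeds, say, 0.8.
```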