FACET: Fairness in Computer Vision Evaluation Benchmark
- URL: http://arxiv.org/abs/2309.00035v1
- Date: Thu, 31 Aug 2023 17:59:48 GMT
- Title: FACET: Fairness in Computer Vision Evaluation Benchmark
- Authors: Laura Gustafson, Chloe Rolland, Nikhila Ravi, Quentin Duval, Aaron
Adcock, Cheng-Yang Fu, Melissa Hall, Candace Ross
- Abstract summary: Computer vision models have known performance disparities across attributes such as gender and skin tone.
We present a new benchmark named FACET (FAirness in Computer Vision EvaluaTion).
FACET is a large, publicly available evaluation set of 32k images for some of the most common vision tasks.
- Score: 21.862644380063756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Computer vision models have known performance disparities across attributes
such as gender and skin tone. This means during tasks such as classification
and detection, model performance differs for certain classes based on the
demographics of the people in the image. These disparities have been shown to
exist, but until now there has not been a unified approach to measure these
differences for common use-cases of computer vision models. We present a new
benchmark named FACET (FAirness in Computer Vision EvaluaTion), a large,
publicly available evaluation set of 32k images for some of the most common
vision tasks - image classification, object detection and segmentation. For
every image in FACET, we hired expert reviewers to manually annotate
person-related attributes such as perceived skin tone and hair type, manually
draw bounding boxes, and label fine-grained person-related classes such as disc
jockey or guitarist. In addition, we use FACET to benchmark state-of-the-art
vision models and present a deeper understanding of potential performance
disparities and challenges across sensitive demographic attributes. With the
exhaustive annotations collected, we probe models using single demographic
attributes as well as multiple attributes using an intersectional approach
(e.g. hair color and perceived skin tone). Our results show that
classification, detection, segmentation, and visual grounding models exhibit
performance disparities across demographic attributes and intersections of
attributes. These harms suggest that not all people represented in datasets
receive fair and equitable treatment in these vision tasks. We hope current and
future results using our benchmark will contribute to fairer, more robust
vision models. FACET is available publicly at https://facet.metademolab.com/
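As a rough illustration of the single-attribute and intersectional probing the abstract describes, the sketch below computes per-group accuracy over hypothetical evaluation records. The record schema, attribute names, and choice of metric are assumptions for illustration only, not FACET's actual interface.

```python
from collections import defaultdict

def accuracy_by_group(records, group_keys):
    """Per-group accuracy over evaluation records. Each record is assumed
    to carry a model prediction, a ground-truth class, and the annotated
    person attributes (e.g. perceived skin tone, hair type). Passing one
    key probes a single attribute; passing several probes an intersection.
    """
    correct, total = defaultdict(int), defaultdict(int)
    for record in records:
        group = tuple(record["attributes"][key] for key in group_keys)
        total[group] += 1
        correct[group] += int(record["prediction"] == record["label"])
    return {group: correct[group] / total[group] for group in total}

# Hypothetical usage: one disparity score is the gap between the
# best- and worst-performing groups.
# single = accuracy_by_group(records, ["perceived_skin_tone"])
# inter = accuracy_by_group(records, ["hair_color", "perceived_skin_tone"])
# disparity = max(inter.values()) - min(inter.values())
```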
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Leveraging vision-language models for fair facial attribute classification [19.93324644519412]
General-purpose vision-language models (VLMs) are a rich knowledge source for common sensitive attributes.
We analyze the correspondence between the VLM-predicted and human-defined sensitive attribute distributions.
Experiments on multiple benchmark facial attribute classification datasets show fairness gains of the model over existing unsupervised baselines.
arXiv Detail & Related papers (2024-03-15T18:37:15Z)
- Leveraging Diffusion Perturbations for Measuring Fairness in Computer Vision [25.414154497482162]
We demonstrate that diffusion models can be leveraged to create a demographically varied evaluation dataset.
We benchmark several vision-language models on a multi-class occupation classification task.
We find that images generated with non-Caucasian labels have a significantly higher occupation misclassification rate than images generated with Caucasian labels.
arXiv Detail & Related papers (2023-11-25T19:40:13Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- DeAR: Debiasing Vision-Language Models with Additive Residuals [5.672132510411465]
Large pre-trained vision-language models (VLMs) provide rich, adaptable image and text representations.
These models suffer from societal biases owing to the skewed distribution of various identity groups in the training data.
We present DeAR, a novel debiasing method that learns additive residual image representations to offset the original representations.
arXiv Detail & Related papers (2023-03-18T14:57:43Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
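A minimal sketch of the frequency-difference measurement described in the entry above, assuming each generated image has already been tagged with a set of detected presentation attributes; the input format and attribute names are illustrative, not the paper's actual pipeline.

```python
def presentation_frequency_gaps(images_a, images_b, attributes):
    """For each presentation-centric attribute (e.g. 'long hair', 'suit'),
    return how much more often it is detected in images generated from
    prompts with gender indicator A than with indicator B. Each input is
    a list of per-image sets of detected attributes."""
    def freq(images, attr):
        return sum(attr in tags for tags in images) / len(images)
    return {a: freq(images_a, a) - freq(images_b, a) for a in attributes}

# Hypothetical usage:
# gaps = presentation_frequency_gaps(images_for_woman, images_for_man,
#                                    ["long hair", "suit", "skirt"])
```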
- Ambiguous Images With Human Judgments for Robust Visual Event Classification [34.62731821199598]
We create datasets of ambiguous images and use them to produce SQUID-E ("Squidy"), a collection of noisy images extracted from videos.
All images are annotated with ground truth values, and a test set is annotated with human uncertainty judgments.
We use this dataset to characterize human uncertainty in vision tasks and evaluate existing visual event classification models.
arXiv Detail & Related papers (2022-10-06T17:52:20Z)
- Explaining Bias in Deep Face Recognition via Image Characteristics [9.569575076277523]
We evaluate ten state-of-the-art face recognition models, comparing their fairness in terms of security and usability on two datasets.
We then analyze the impact of image characteristics on model performance.
arXiv Detail & Related papers (2022-08-23T17:18:23Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
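One simple way to realize the distribution measurement described in the entry above is sketched here; the attribute classifier and the skew score are assumptions, not necessarily the measures DALL-Eval itself uses.

```python
from collections import Counter

def attribute_distribution(predicted_attributes):
    """Normalized distribution of a predicted attribute (e.g. perceived
    gender or skin tone) over a batch of generated images."""
    counts = Counter(predicted_attributes)
    n = len(predicted_attributes)
    return {value: count / n for value, count in counts.items()}

def skew_from_uniform(distribution):
    """Total variation distance from the uniform distribution over the
    observed attribute values; 0.0 means perfectly balanced output."""
    uniform = 1 / len(distribution)
    return 0.5 * sum(abs(p - uniform) for p in distribution.values())

# Hypothetical usage with an off-the-shelf attribute classifier:
# dist = attribute_distribution([classifier(img) for img in generated_images])
# print(skew_from_uniform(dist))
```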
- Are Commercial Face Detection Models as Biased as Academic Models? [64.71318433419636]
We compare academic and commercial face detection systems, specifically examining robustness to noise.
We find that state-of-the-art academic face detection models exhibit demographic disparities in their noise robustness.
We conclude that commercial models are always at least as biased as academic models.
arXiv Detail & Related papers (2022-01-25T02:21:42Z)