Towards measuring fairness in AI: the Casual Conversations dataset
- URL: http://arxiv.org/abs/2104.02821v1
- Date: Tue, 6 Apr 2021 22:48:22 GMT
- Title: Towards measuring fairness in AI: the Casual Conversations dataset
- Authors: Caner Hazirbas, Joanna Bitton, Brian Dolhansky, Jacqueline Pan, Albert
Gordo, Cristian Canton Ferrer
- Abstract summary: Our dataset is composed of 3,011 subjects and contains over 45,000 videos, with an average of 15 videos per person.
The videos were recorded in multiple U.S. states with a diverse set of adults in various age, gender and apparent skin tone groups.
- Score: 9.246092246471955
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel dataset to help researchers evaluate their
computer vision and audio models for accuracy across a diverse set of ages,
genders, apparent skin tones and ambient lighting conditions. Our dataset is
composed of 3,011 subjects and contains over 45,000 videos, with an average of
15 videos per person. The videos were recorded in multiple U.S. states with a
diverse set of adults in various age, gender and apparent skin tone groups. A
key feature is that each subject agreed to participate for their likenesses to
be used. Additionally, our age and gender annotations are provided by the
subjects themselves. A group of trained annotators labeled the subjects'
apparent skin tone using the Fitzpatrick skin type scale. Moreover, annotations
for videos recorded in low ambient lighting are also provided. As an
application to measure robustness of predictions across certain attributes, we
provide a comprehensive study on the top five winners of the DeepFake Detection
Challenge (DFDC). Experimental evaluation shows that the winning models are
less performant on some specific groups of people, such as subjects with darker
skin tones, and thus may not generalize to all people. We also evaluate
state-of-the-art apparent age and gender classification methods. Our
experiments provide a thorough analysis of these models in terms of fair
treatment of people from various backgrounds.
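The per-attribute robustness study described above boils down to computing a model's accuracy separately for each demographic group and comparing the results. A minimal sketch, assuming simple (group, prediction, ground-truth) records; the group names and toy data below are illustrative, not taken from the paper:

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute prediction accuracy separately for each subgroup.

    `records` is a list of (group_label, prediction, ground_truth) tuples,
    e.g. group_label = a Fitzpatrick skin-type bucket. Returns a dict
    mapping each group to its accuracy, making disparities between groups
    (like the gap the DFDC study reports for darker skin tones) visible.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, pred, truth in records:
        total[group] += 1
        if pred == truth:
            correct[group] += 1
    return {g: correct[g] / total[g] for g in total}

# Toy data: a hypothetical deepfake detector evaluated on two skin-tone buckets.
records = [
    ("type_I-II", 1, 1), ("type_I-II", 0, 0), ("type_I-II", 1, 1),
    ("type_V-VI", 1, 0), ("type_V-VI", 0, 0), ("type_V-VI", 1, 1),
]
print(accuracy_by_group(records))
```

A gap between the per-group numbers is exactly the kind of disparity the paper's DFDC study surfaces.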
Related papers
- Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs)
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - Vision-Language Models under Cultural and Inclusive Considerations [53.614528867159706]
Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives.
Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case.
We create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing dataset with images taken by people who are blind.
We then evaluate several VLMs, investigating their reliability as visual assistants in a culturally diverse setting.
arXiv Detail & Related papers (2024-07-08T17:50:00Z) - Unmasking Illusions: Understanding Human Perception of Audiovisual Deepfakes [49.81915942821647]
This paper aims to evaluate the human ability to discern deepfake videos through a subjective study.
We present our findings by comparing human observers to five state-of-the-art audiovisual deepfake detection models.
We found that all AI models performed better than humans when evaluated on the same 40 videos.
arXiv Detail & Related papers (2024-05-07T07:57:15Z) - Vital Videos: A dataset of face videos with PPG and blood pressure ground truths [0.0]
The dataset includes roughly equal numbers of males and females, as well as participants of all ages.
The data was collected in a diverse set of locations to ensure a wide variety of backgrounds and lighting conditions.
In an effort to assist in the research and development of remote vital sign measurement we are now opening up access to this dataset.
arXiv Detail & Related papers (2023-06-02T17:47:29Z) - Consensus and Subjectivity of Skin Tone Annotation for ML Fairness [1.0728297108232812]
We release the Monk Skin Tone Examples (MST-E) dataset, containing 1515 images and 31 videos spread across the full MST scale.
Our study shows that annotators can reliably annotate skin tone in a way that aligns with an expert in the MST scale, even under challenging environmental conditions.
We advise practitioners to use a diverse set of annotators and a higher replication count for each image when annotating skin tone for fairness research.
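The replication advice above amounts to collecting several annotators' labels per image and reducing them to a consensus. A minimal sketch using a majority vote; the tie-breaking rule (smaller MST value wins) is an arbitrary assumption for this sketch, not from the paper:

```python
from collections import Counter

def consensus_skin_tone(labels):
    """Return the most common annotation among replicated labels.

    `labels` is a list of per-annotator skin-tone values (e.g. MST-scale
    integers) for one image. Ties are broken toward the smaller value --
    an arbitrary choice for this sketch; a real pipeline might instead
    flag ties for expert review.
    """
    counts = Counter(labels)
    # Rank by (vote count, then smaller label) and take the winner.
    best_label, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return best_label

print(consensus_skin_tone([4, 5, 4, 6, 4]))  # three of five annotators agree on 4
```

A higher replication count makes this vote more stable, which is the point of the paper's recommendation.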
arXiv Detail & Related papers (2023-05-16T00:03:09Z) - The Casual Conversations v2 Dataset [6.439761523935614]
The dataset includes 26,467 videos of 5,567 unique paid participants, with an average of almost 5 videos per person.
The participants agreed for their data to be used in assessing fairness of AI models and provided self-reported age, gender, language/dialect, disability status, physical adornments, physical attributes and geo-location information.
arXiv Detail & Related papers (2023-03-08T19:17:05Z) - How Would The Viewer Feel? Estimating Wellbeing From Video Scenarios [73.24092762346095]
We introduce two large-scale datasets with over 60,000 videos annotated for emotional response and subjective wellbeing.
The Video Cognitive Empathy dataset contains annotations for distributions of fine-grained emotional responses, allowing models to gain a detailed understanding of affective states.
The Video to Valence dataset contains annotations of relative pleasantness between videos, which enables predicting a continuous spectrum of wellbeing.
arXiv Detail & Related papers (2022-10-18T17:58:25Z) - APES: Audiovisual Person Search in Untrimmed Video [87.4124877066541]
We present the Audiovisual Person Search dataset (APES)
APES contains over 1.9K identities labeled along 36 hours of video.
A key property of APES is that it includes dense temporal annotations that link faces to speech segments of the same identity.
arXiv Detail & Related papers (2021-06-03T08:16:42Z) - Grading video interviews with fairness considerations [1.7403133838762446]
We present a methodology to automatically derive social skills of candidates based on their video response to interview questions.
We develop two machine-learning models to predict social skills.
We analyze fairness by studying the errors of models by race and gender.
arXiv Detail & Related papers (2020-07-02T10:06:13Z) - Investigating Bias in Deep Face Analysis: The KANFace Dataset and Empirical Study [67.3961439193994]
We introduce the most comprehensive, large-scale dataset of facial images and videos to date.
The data are manually annotated in terms of identity, exact age, gender and kinship.
A method to debias network embeddings is introduced and tested on the proposed benchmarks.
arXiv Detail & Related papers (2020-05-15T00:14:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.