DSAP: Analyzing Bias Through Demographic Comparison of Datasets
- URL: http://arxiv.org/abs/2312.14626v1
- Date: Fri, 22 Dec 2023 11:51:20 GMT
- Title: DSAP: Analyzing Bias Through Demographic Comparison of Datasets
- Authors: Iris Dominguez-Catena, Daniel Paternain, Mikel Galar
- Abstract summary: We propose DSAP (Demographic Similarity from Auxiliary Profiles), a two-step methodology for comparing the demographic composition of two datasets.
DSAP can be deployed in three key applications: to detect and characterize demographic blind spots and bias issues across datasets, to measure dataset demographic bias in single datasets, and to measure dataset demographic shift in deployment scenarios.
An essential feature of DSAP is its ability to robustly analyze datasets without explicit demographic labels, offering simplicity and interpretability for a wide range of situations.
- Score: 4.8741052091630985
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the last few years, Artificial Intelligence systems have become
increasingly widespread. Unfortunately, these systems can share many biases
with human decision-making, including demographic biases. Often, these biases
can be traced back to the data used for training, where large uncurated
datasets have become the norm. Despite our knowledge of these biases, we still
lack general tools to detect and quantify them, as well as to compare the
biases in different datasets. Thus, in this work, we propose DSAP (Demographic
Similarity from Auxiliary Profiles), a two-step methodology for comparing the
demographic composition of two datasets. DSAP can be deployed in three key
applications: to detect and characterize demographic blind spots and bias
issues across datasets, to measure dataset demographic bias in single datasets,
and to measure dataset demographic shift in deployment scenarios. An essential
feature of DSAP is its ability to robustly analyze datasets without explicit
demographic labels, offering simplicity and interpretability for a wide range
of situations. To show the usefulness of the proposed methodology, we consider
the Facial Expression Recognition task, where demographic bias has previously
been found. The three applications are studied over a set of twenty datasets
with varying properties. The code is available at
https://github.com/irisdominguez/DSAP.
Related papers
- Leveraging Prototypical Representations for Mitigating Social Bias without Demographic Information [50.29934517930506]
DAFair is a novel approach to address social bias in language models.
We leverage prototypical demographic texts and incorporate a regularization term during the fine-tuning process to mitigate bias.
arXiv Detail & Related papers (2024-03-14T15:58:36Z) - Metrics for Dataset Demographic Bias: A Case Study on Facial Expression Recognition [4.336779198334903]
One of the most prominent types of demographic bias are statistical imbalances in the representation of demographic groups in the datasets.
We develop a taxonomy for the classification of these metrics, providing a practical guide for the selection of appropriate metrics.
The paper provides valuable insights for researchers in AI and related fields to mitigate dataset bias and improve the fairness and accuracy of AI models.
arXiv Detail & Related papers (2023-03-28T11:04:18Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling
Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z) - Assessing Demographic Bias Transfer from Dataset to Model: A Case Study
in Facial Expression Recognition [1.5340540198612824]
Two metrics focus on the representational and stereotypical bias of the dataset, and the third one on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z) - Representation Bias in Data: A Survey on Identification and Resolution
Techniques [26.142021257838564]
Data-driven algorithms are only as good as the data they work with, while data sets, especially social data, often fail to represent minorities adequately.
Representation Bias in data can happen due to various reasons ranging from historical discrimination to selection and sampling biases in the data acquisition and preparation methods.
This paper reviews the literature on identifying and resolving representation bias as a feature of a data set, independent of how consumed later.
arXiv Detail & Related papers (2022-03-22T16:30:22Z) - Certifying Robustness to Programmable Data Bias in Decision Trees [12.060443368097102]
We certify that models produced by a learning learner are pointwise-robust to potential dataset biases.
Our approach allows specifying bias models across a variety of dimensions.
We evaluate our approach on datasets commonly used in the fairness literature.
arXiv Detail & Related papers (2021-10-08T20:15:17Z) - Balancing Biases and Preserving Privacy on Balanced Faces in the Wild [50.915684171879036]
There are demographic biases present in current facial recognition (FR) models.
We introduce our Balanced Faces in the Wild dataset to measure these biases across different ethnic and gender subgroups.
We find that relying on a single score threshold to differentiate between genuine and imposters sample pairs leads to suboptimal results.
We propose a novel domain adaptation learning scheme that uses facial features extracted from state-of-the-art neural networks.
arXiv Detail & Related papers (2021-03-16T15:05:49Z) - Enhancing Facial Data Diversity with Style-based Face Aging [59.984134070735934]
In particular, face datasets are typically biased in terms of attributes such as gender, age, and race.
We propose a novel, generative style-based architecture for data augmentation that captures fine-grained aging patterns.
We show that the proposed method outperforms state-of-the-art algorithms for age transfer.
arXiv Detail & Related papers (2020-06-06T21:53:44Z) - REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets [64.76453161039973]
REVISE (REvealing VIsual biaSEs) is a tool that assists in the investigation of a visual dataset.
It surfacing potential biases along three dimensions: (1) object-based, (2) person-based, and (3) geography-based.
arXiv Detail & Related papers (2020-04-16T23:54:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.