Towards Intersectionality in Machine Learning: Including More
Identities, Handling Underrepresentation, and Performing Evaluation
- URL: http://arxiv.org/abs/2205.04610v1
- Date: Tue, 10 May 2022 01:00:52 GMT
- Title: Towards Intersectionality in Machine Learning: Including More
Identities, Handling Underrepresentation, and Performing Evaluation
- Authors: Angelina Wang and Vikram V. Ramaswamy and Olga Russakovsky
- Abstract summary: We grapple with questions that arise along three stages of the machine learning pipeline when incorporating intersectionality as multiple demographic attributes.
We advocate for supplementing domain knowledge with empirical validation when choosing which demographic attribute labels to train on.
We warn against using data imbalance techniques without considering their normative implications.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Research in machine learning fairness has historically considered a single
binary demographic attribute; however, the reality is of course far more
complicated. In this work, we grapple with questions that arise along three
stages of the machine learning pipeline when incorporating intersectionality as
multiple demographic attributes: (1) which demographic attributes to include as
dataset labels, (2) how to handle the progressively smaller size of subgroups
during model training, and (3) how to move beyond existing evaluation metrics
when benchmarking model fairness for more subgroups. For each question, we
provide thorough empirical evaluation on tabular datasets derived from the US
Census, and present constructive recommendations for the machine learning
community. First, we advocate for supplementing domain knowledge with empirical
validation when choosing which demographic attribute labels to train on, while
always evaluating on the full set of demographic attributes. Second, we warn
against using data imbalance techniques without considering their normative
implications and suggest an alternative using the structure in the data. Third,
we introduce new evaluation metrics which are more appropriate for the
intersectional setting. Overall, we provide substantive suggestions on three
necessary (albeit not sufficient!) considerations when incorporating
intersectionality into machine learning.
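The third recommendation, evaluating across the full cross-product of demographic attributes, can be sketched concretely. The attribute names, record layout, and the worst-group-accuracy summary below are illustrative assumptions, not the paper's exact metrics:

```python
from itertools import product

def subgroup_accuracies(records):
    """Accuracy for every intersectional subgroup.

    `records` is a list of dicts with keys 'race', 'sex',
    'y_true', and 'y_pred' (attribute names are illustrative).
    """
    races = sorted({r["race"] for r in records})
    sexes = sorted({r["sex"] for r in records})
    accs = {}
    for race, sex in product(races, sexes):  # full cross-product of attributes
        group = [r for r in records if r["race"] == race and r["sex"] == sex]
        if group:  # intersectional subgroups shrink and may be empty
            correct = sum(r["y_true"] == r["y_pred"] for r in group)
            accs[(race, sex)] = correct / len(group)
    return accs

records = [
    {"race": "A", "sex": "F", "y_true": 1, "y_pred": 1},
    {"race": "A", "sex": "M", "y_true": 0, "y_pred": 1},
    {"race": "B", "sex": "F", "y_true": 1, "y_pred": 1},
    {"race": "B", "sex": "M", "y_true": 0, "y_pred": 0},
]
accs = subgroup_accuracies(records)
worst_group = min(accs.values())  # one simple intersectional summary
```

Reporting the per-subgroup table rather than a single average is what distinguishes intersectional evaluation from single-attribute fairness checks; a worst-group summary is just one way to condense it.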
Related papers
- Bridging the Gap: Protocol Towards Fair and Consistent Affect Analysis [24.737468736951374]
The increasing integration of machine learning algorithms in daily life underscores the critical need for fairness and equity in their deployment.
Existing databases and methodologies lack uniformity, leading to biased evaluations.
This work addresses these issues by analyzing six affective databases, annotating demographic attributes, and proposing a common protocol for database partitioning.
arXiv Detail & Related papers (2024-05-10T22:40:01Z)
- TIDE: Textual Identity Detection for Evaluating and Augmenting Classification and Language Models [0.0]
Machine learning models can perpetuate unintended biases from unfair and imbalanced datasets.
We present a dataset coupled with an approach to improve text fairness in classifiers and language models.
We leverage TIDAL to develop an identity annotation and augmentation tool that can be used to improve the availability of identity context.
arXiv Detail & Related papers (2023-09-07T21:44:42Z)
- Bias and Fairness in Large Language Models: A Survey [73.87651986156006]
We present a comprehensive survey of bias evaluation and mitigation techniques for large language models (LLMs).
We first consolidate, formalize, and expand notions of social bias and fairness in natural language processing.
We then unify the literature by proposing three intuitive taxonomies: two for bias evaluation and one for mitigation.
arXiv Detail & Related papers (2023-09-02T00:32:55Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- A Survey of Learning on Small Data: Generalization, Optimization, and Challenge [101.27154181792567]
Learning on small data that approximates the generalization ability of big data is one of the ultimate purposes of AI.
This survey follows the active sampling theory under a PAC framework to analyze the generalization error and label complexity of learning on small data.
Multiple data applications that may benefit from efficient small data representation are surveyed.
arXiv Detail & Related papers (2022-07-29T02:34:19Z)
- Assessing Demographic Bias Transfer from Dataset to Model: A Case Study in Facial Expression Recognition [1.5340540198612824]
Two of the proposed metrics focus on the representational and stereotypical bias of the dataset, and the third on the residual bias of the trained model.
We demonstrate the usefulness of the metrics by applying them to a FER problem based on the popular Affectnet dataset.
arXiv Detail & Related papers (2022-05-20T09:40:42Z)
- A survey on datasets for fairness-aware machine learning [6.962333053044713]
A large variety of fairness-aware machine learning solutions have been proposed.
In this paper, we overview real-world datasets used for fairness-aware machine learning.
For a deeper understanding of bias and fairness in the datasets, we investigate the interesting relationships using exploratory analysis.
arXiv Detail & Related papers (2021-10-01T16:54:04Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Quantifying Learnability and Describability of Visual Concepts Emerging in Representation Learning [91.58529629419135]
We consider how to characterise visual groupings discovered automatically by deep neural networks.
We introduce two concepts, visual learnability and describability, that can be used to quantify the interpretability of arbitrary image groupings.
arXiv Detail & Related papers (2020-10-27T18:41:49Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- A survey of bias in Machine Learning through the prism of Statistical Parity for the Adult Data Set [5.277804553312449]
We show the importance of understanding how a bias can be introduced into automatic decisions.
We first present a mathematical framework for the fair learning problem, specifically in the binary classification setting.
We then propose to quantify the presence of bias by using the standard Disparate Impact index on the real and well-known Adult income data set.
arXiv Detail & Related papers (2020-03-31T14:48:36Z)
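The Disparate Impact index used in the survey above has a standard form: the ratio of positive-outcome rates between the unprivileged and privileged groups. A minimal sketch (the 0.8 "four-fifths" threshold mentioned in the comment is a common convention, not a claim from this abstract):

```python
def disparate_impact(y_pred, group):
    """Ratio of positive prediction rates: P(y=1 | g=0) / P(y=1 | g=1).

    `group` is 1 for the privileged group, 0 for the unprivileged one.
    """
    unpriv = [y for y, g in zip(y_pred, group) if g == 0]
    priv = [y for y, g in zip(y_pred, group) if g == 1]
    rate_unpriv = sum(unpriv) / len(unpriv)
    rate_priv = sum(priv) / len(priv)
    return rate_unpriv / rate_priv

# Unprivileged positive rate 2/4 = 0.5; privileged positive rate 3/4 = 0.75
di = disparate_impact([1, 0, 1, 0, 1, 1, 1, 0], [0, 0, 0, 0, 1, 1, 1, 1])
# di = 0.5 / 0.75 ~ 0.667; values below 0.8 are often flagged as potential bias
```

Note this is a single-binary-attribute metric; the main paper's point is precisely that such indices do not extend cleanly to intersectional subgroups.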
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.