A survey on datasets for fairness-aware machine learning
- URL: http://arxiv.org/abs/2110.00530v1
- Date: Fri, 1 Oct 2021 16:54:04 GMT
- Title: A survey on datasets for fairness-aware machine learning
- Authors: Tai Le Quy, Arjun Roy, Vasileios Iosifidis, Eirini Ntoutsi
- Abstract summary: A large variety of fairness-aware machine learning solutions have been proposed.
In this paper, we overview real-world datasets used for fairness-aware machine learning.
For a deeper understanding of bias and fairness in the datasets, we investigate the interesting relationships using exploratory analysis.
- Score: 6.962333053044713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As decision-making increasingly relies on machine learning and (big) data,
the issue of fairness in data-driven AI systems is receiving increasing
attention from both research and industry. A large variety of fairness-aware
machine learning solutions have been proposed that apply fairness-related
interventions in the data, learning algorithms and/or model outputs. However, a
vital part of proposing new approaches is evaluating them empirically on
benchmark datasets that represent realistic and diverse settings. Therefore, in
this paper, we overview real-world datasets used for fairness-aware machine
learning. We focus on tabular data as the most common data representation for
fairness-aware machine learning. We start our analysis by identifying
relationships among the different attributes, particularly w.r.t. protected
attributes and class attributes, using a Bayesian network. For a deeper
understanding of bias and fairness in the datasets, we investigate the
interesting relationships using exploratory analysis.
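As a rough illustration of the kind of exploratory check described above (relating a protected attribute to the class attribute), the sketch below compares the share of positive class labels across the groups of one protected attribute. It is a simple stand-in for illustration, not the Bayesian network analysis the paper uses. The function name `positive_rate_by_group`, the column names `sex` and `income`, and the toy rows are all hypothetical and are not taken from the survey.

```python
from collections import defaultdict


def positive_rate_by_group(rows, protected, label, positive):
    """Return {group value: share of rows whose label equals `positive`}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        group = row[protected]
        totals[group] += 1
        if row[label] == positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical toy rows for illustration only; not data from any real dataset.
toy_rows = [
    {"sex": "female", "income": ">50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "female", "income": "<=50K"},
    {"sex": "male", "income": ">50K"},
    {"sex": "male", "income": ">50K"},
    {"sex": "male", "income": "<=50K"},
    {"sex": "male", "income": "<=50K"},
]

rates = positive_rate_by_group(toy_rows, "sex", "income", ">50K")
# Difference in positive-label rates between the two groups.
gap = rates["male"] - rates["female"]
print(rates, gap)
```

A gap near zero would suggest the class label is distributed similarly across the groups; a large gap flags a relationship worth examining more closely before a dataset is used as a fairness benchmark.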
Related papers
- Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself.
We derive estimators of demographic parity, equal opportunity, and conditional mutual information.
To validate our approach, we perform several simulations and apply our estimators to real data.
arXiv Detail & Related papers (2025-02-06T18:51:28Z)
- Analyzing Fairness of Computer Vision and Natural Language Processing Models [1.0923877073891446]
Machine learning (ML) algorithms play a crucial role in decision making across diverse fields such as healthcare, finance, education, and law enforcement.
Despite their widespread adoption, these systems raise ethical and social concerns due to potential biases and fairness issues.
This study focuses on evaluating and improving the fairness of Computer Vision and Natural Language Processing (NLP) models applied to unstructured datasets.
arXiv Detail & Related papers (2024-12-13T06:35:55Z)
- Analyzing Fairness of Classification Machine Learning Model with Structured Dataset [1.0923877073891446]
This study investigates the fairness of machine learning models applied to structured datasets in classification tasks.
Three fairness libraries (Fairlearn by Microsoft, AIF360 by IBM, and the What-If Tool by Google) were employed.
The research aims to assess the extent of bias in the ML models, compare the effectiveness of these libraries, and derive actionable insights for practitioners.
arXiv Detail & Related papers (2024-12-13T06:31:09Z)
- Fair Mixed Effects Support Vector Machine [0.0]
Fairness in machine learning aims to mitigate biases present in the training data and model imperfections.
This is achieved by preventing the model from making decisions based on sensitive characteristics like ethnicity or sexual orientation.
We present a fair mixed effects support vector machine algorithm that can handle both problems simultaneously.
arXiv Detail & Related papers (2024-05-10T12:25:06Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration [5.994347858883343]
A parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data.
This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm.
arXiv Detail & Related papers (2022-12-06T23:53:41Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Bringing the People Back In: Contesting Benchmark Machine Learning Datasets [11.00769651520502]
We outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created.
We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets.
arXiv Detail & Related papers (2020-07-14T23:22:13Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.