A survey on datasets for fairness-aware machine learning
- URL: http://arxiv.org/abs/2110.00530v1
- Date: Fri, 1 Oct 2021 16:54:04 GMT
- Title: A survey on datasets for fairness-aware machine learning
- Authors: Tai Le Quy, Arjun Roy, Vasileios Iosifidis, Eirini Ntoutsi
- Abstract summary: A large variety of fairness-aware machine learning solutions have been proposed.
In this paper, we overview real-world datasets used for fairness-aware machine learning.
For a deeper understanding of bias and fairness in the datasets, we investigate these relationships further using exploratory analysis.
- Score: 6.962333053044713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As decision-making increasingly relies on machine learning and (big) data,
the issue of fairness in data-driven AI systems is receiving increasing
attention from both research and industry. A large variety of fairness-aware
machine learning solutions have been proposed, introducing fairness-related
interventions in the data, the learning algorithms, and/or the model outputs. However, a
vital part of proposing new approaches is evaluating them empirically on
benchmark datasets that represent realistic and diverse settings. Therefore, in
this paper, we overview real-world datasets used for fairness-aware machine
learning. We focus on tabular data as the most common data representation for
fairness-aware machine learning. We start our analysis by identifying
relationships among the different attributes, particularly w.r.t. protected
attributes and class attributes, using a Bayesian network. For a deeper
understanding of bias and fairness in the datasets, we investigate these
relationships further using exploratory analysis.
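As a minimal illustration of the kind of exploratory bias check the survey describes (not the authors' actual pipeline, and with a hypothetical toy dataset standing in for a real benchmark such as Adult), the sketch below computes the positive-class rate per protected group and the resulting statistical parity difference:

```python
from collections import defaultdict

def positive_rates(records, protected, label):
    """Rate of the positive class for each value of one protected attribute."""
    pos = defaultdict(int)
    tot = defaultdict(int)
    for r in records:
        g = r[protected]
        tot[g] += 1
        pos[g] += 1 if r[label] == 1 else 0
    return {g: pos[g] / tot[g] for g in tot}

def statistical_parity_difference(rates, privileged, unprivileged):
    """Gap in positive rates between the privileged and unprivileged groups."""
    return rates[privileged] - rates[unprivileged]

# Hypothetical toy records mimicking an Adult-style tabular fairness dataset.
data = [
    {"sex": "male", "income": 1},
    {"sex": "male", "income": 1},
    {"sex": "male", "income": 0},
    {"sex": "female", "income": 1},
    {"sex": "female", "income": 0},
    {"sex": "female", "income": 0},
]

rates = positive_rates(data, "sex", "income")
spd = statistical_parity_difference(rates, "male", "female")
```

A nonzero statistical parity difference flags exactly the kind of protected-attribute/class-attribute dependency the Bayesian-network analysis in the paper surfaces at dataset level.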
Related papers
- Fair Mixed Effects Support Vector Machine [0.0]
Fairness in machine learning aims to mitigate biases present in the training data and model imperfections.
This is achieved by preventing the model from making decisions based on sensitive characteristics like ethnicity or sexual orientation.
We present a fair mixed effects support vector machine algorithm that can handle both problems simultaneously.
arXiv Detail & Related papers (2024-05-10T12:25:06Z)
- Fairness meets Cross-Domain Learning: a new perspective on Models and Metrics [80.07271410743806]
We study the relationship between cross-domain learning (CD) and model fairness.
We introduce a benchmark on face and medical images spanning several demographic groups as well as classification and localization tasks.
Our study covers 14 CD approaches alongside three state-of-the-art fairness algorithms and shows how the former can outperform the latter.
arXiv Detail & Related papers (2023-03-25T09:34:05Z)
- Achieving Transparency in Distributed Machine Learning with Explainable Data Collaboration [5.994347858883343]
A parallel trend has been to train machine learning models in collaboration with other data holders without accessing their data.
This paper presents an Explainable Data Collaboration Framework based on a model-agnostic additive feature attribution algorithm.
arXiv Detail & Related papers (2022-12-06T23:53:41Z)
- FAIR-FATE: Fair Federated Learning with Momentum [0.41998444721319217]
We propose a novel FAIR FederATEd Learning algorithm that aims to achieve group fairness while maintaining high utility.
To the best of our knowledge, this is the first approach in machine learning that aims to achieve fairness using a fair Momentum estimate.
Experimental results on real-world datasets demonstrate that FAIR-FATE outperforms state-of-the-art fair Federated Learning algorithms.
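FAIR-FATE's fairness-aware momentum estimate is not reproduced in this summary; as a generic, hedged sketch of the server-side momentum aggregation it builds on (all names and values below are illustrative), the update can look like:

```python
def fedavg(updates, weights):
    """Weighted average of client model updates (one flat parameter list each)."""
    total = sum(weights)
    return [sum(w * u[i] for u, w in zip(updates, weights)) / total
            for i in range(len(updates[0]))]

def momentum_step(momentum, avg_update, beta=0.9):
    """Server-side momentum: an exponential moving average of aggregated updates."""
    return [beta * m + (1.0 - beta) * u for m, u in zip(momentum, avg_update)]

# Two hypothetical clients, two parameters each, equal weighting.
updates = [[1.0, 0.0], [0.0, 1.0]]
avg = fedavg(updates, [1.0, 1.0])          # plain FedAvg aggregate
momentum = momentum_step([0.0, 0.0], avg)  # smoothed update direction
new_params = [p + m for p, m in zip([0.0, 0.0], momentum)]
```

In FAIR-FATE the momentum term is additionally steered toward group fairness; this sketch only shows the standard momentum scaffolding that such an estimate plugs into.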
arXiv Detail & Related papers (2022-09-27T20:33:38Z)
- D-BIAS: A Causality-Based Human-in-the-Loop System for Tackling Algorithmic Bias [57.87117733071416]
We propose D-BIAS, a visual interactive tool that embodies a human-in-the-loop AI approach for auditing and mitigating social biases.
A user can detect the presence of bias against a group by identifying unfair causal relationships in the causal network.
For each interaction, say weakening/deleting a biased causal edge, the system uses a novel method to simulate a new (debiased) dataset.
arXiv Detail & Related papers (2022-08-10T03:41:48Z)
- MultiFair: Multi-Group Fairness in Machine Learning [52.24956510371455]
We study multi-group fairness in machine learning (MultiFair).
We propose a generic end-to-end algorithmic framework to solve it.
Our proposed framework is generalizable to many different settings.
arXiv Detail & Related papers (2021-05-24T02:30:22Z)
- Through the Data Management Lens: Experimental Analysis and Evaluation of Fair Classification [75.49600684537117]
Data management research is showing an increasing presence and interest in topics related to data and algorithmic fairness.
We contribute a broad analysis of 13 fair classification approaches and additional variants, over their correctness, fairness, efficiency, scalability, and stability.
Our analysis highlights novel insights on the impact of different metrics and high-level approach characteristics on different aspects of performance.
arXiv Detail & Related papers (2021-01-18T22:55:40Z)
- A Note on Data Biases in Generative Models [16.86600007830682]
We investigate the impact of dataset quality on the performance of generative models.
We show how societal biases of datasets are replicated by generative models.
We present creative applications through unpaired transfer between diverse datasets such as photographs, oil portraits, and anime.
arXiv Detail & Related papers (2020-12-04T10:46:37Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
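The pseudo-labeling step mentioned above can be sketched generically; this is not the paper's exact pre-processing framework, and the toy scorer below is a hypothetical stand-in for a trained classifier:

```python
import math

def pseudo_label(predict_proba, unlabeled, threshold=0.9):
    """Keep only unlabeled points the model labels with high confidence."""
    out = []
    for x in unlabeled:
        p = predict_proba(x)  # estimated probability of the positive class
        if p >= threshold:
            out.append((x, 1))          # confident positive
        elif p <= 1.0 - threshold:
            out.append((x, 0))          # confident negative
    return out

# Hypothetical 1-D sigmoid scorer standing in for a trained classifier.
toy_model = lambda x: 1.0 / (1.0 + math.exp(-x))

labeled = pseudo_label(toy_model, [-5.0, 0.0, 5.0])
```

The uncertain middle point is dropped; in a fairness-aware variant, the confidence threshold or the retained set would additionally be adjusted per protected group to avoid amplifying discrimination.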
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Bringing the People Back In: Contesting Benchmark Machine Learning Datasets [11.00769651520502]
We outline a research program - a genealogy of machine learning data - for investigating how and why these datasets have been created.
We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets.
arXiv Detail & Related papers (2020-07-14T23:22:13Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including any of the information it contains) and is not responsible for any consequences of its use.