Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting
Data Scientists in Training Fair Models
- URL: http://arxiv.org/abs/2012.09951v1
- Date: Thu, 17 Dec 2020 21:59:29 GMT
- Title: Fairkit, Fairkit, on the Wall, Who's the Fairest of Them All? Supporting
Data Scientists in Training Fair Models
- Authors: Brittany Johnson, Jesse Bartola, Rico Angell, Katherine Keith, Sam
Witty, Stephen J. Giguere, Yuriy Brun
- Abstract summary: We present fairkit-learn, a toolkit for helping data scientists reason about and understand fairness.
Fairkit-learn works with state-of-the-art machine learning tools and uses the same interfaces to ease adoption.
- Score: 7.227008179076844
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Modern software relies heavily on data and machine learning, and affects
decisions that shape our world. Unfortunately, recent studies have shown that
because of biases in data, software systems frequently inject bias into their
decisions, from producing better closed caption transcriptions of men's voices
than of women's voices to overcharging people of color for financial loans. To
address bias in machine learning, data scientists need tools that help them
understand the trade-offs between model quality and fairness in their specific
data domains. Toward that end, we present fairkit-learn, a toolkit for helping
data scientists reason about and understand fairness. Fairkit-learn works with
state-of-the-art machine learning tools and uses the same interfaces to ease
adoption. It can evaluate thousands of models produced by multiple machine
learning algorithms, hyperparameters, and data permutations, and compute and
visualize a small Pareto-optimal set of models that describe the optimal
trade-offs between fairness and quality. We evaluate fairkit-learn via a user
study with 54 students, showing that students using fairkit-learn produce
models that provide a better balance between fairness and quality than students
using scikit-learn and IBM AI Fairness 360 toolkits. With fairkit-learn, users
can select models that are up to 67% more fair and 10% more accurate than the
models they are likely to train with scikit-learn.
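To make the Pareto-optimal search concrete, here is a minimal sketch of the idea using scikit-learn: train a small model grid, score each model on accuracy and a demographic-parity gap, and keep only the non-dominated set. This illustrates the concept only; it is not fairkit-learn's API, and the model grid, the metric, and the synthetic data are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def demographic_parity_difference(y_pred, group):
    """|P(yhat=1 | g=0) - P(yhat=1 | g=1)|: lower means fairer."""
    return abs(y_pred[group == 0].mean() - y_pred[group == 1].mean())

# Synthetic data; the "protected" group is the first feature, binarized.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
group = (X[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(
    X, y, group, test_size=0.3, random_state=0)

# A small grid standing in for the "thousands of models" in the abstract.
candidates = (
    [LogisticRegression(C=c, max_iter=1000) for c in (0.01, 0.1, 1.0, 10.0)]
    + [DecisionTreeClassifier(max_depth=d, random_state=0) for d in (2, 4, 8)]
)

scores = []  # (accuracy, unfairness, model) triples
for model in candidates:
    y_hat = model.fit(X_tr, y_tr).predict(X_te)
    scores.append(((y_hat == y_te).mean(),
                   demographic_parity_difference(y_hat, g_te),
                   model))

# Keep a model iff no other model is at least as accurate AND at least as
# fair, with a strict improvement in one of the two objectives.
pareto = [(acc, unf, m) for acc, unf, m in scores
          if not any(a >= acc and u <= unf and (a > acc or u < unf)
                     for a, u, _ in scores)]
for acc, unf, m in sorted(pareto, key=lambda t: t[0], reverse=True):
    print(f"accuracy={acc:.3f}  dp_diff={unf:.3f}  {type(m).__name__}")
```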
Related papers
- Fair Knowledge Tracing in Second Language Acquisition [3.7498611358320733]
This study evaluates the fairness of two predictive models using the Duolingo dataset's en_es (English learners speaking Spanish), es_en (Spanish learners speaking English), and fr_en (French learners speaking English) tracks.
Deep learning outperforms traditional machine learning in second-language knowledge tracing, offering improved accuracy and fairness.
arXiv Detail & Related papers (2024-12-23T23:47:40Z)
- Fast Model Debias with Machine Unlearning [54.32026474971696]
Deep neural networks might behave in a biased manner in many real-world scenarios.
Existing debiasing methods suffer from high costs in bias labeling or model re-training.
We propose a fast model debiasing framework (FMD) which offers an efficient approach to identify, evaluate and remove biases.
arXiv Detail & Related papers (2023-10-19T08:10:57Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data; a minimal reweighing sketch follows this entry.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
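The reweighing strategy mentioned above is commonly instantiated as the scheme of Kamiran and Calders: weight each (group, label) cell by P(group)P(label) / P(group, label) so that group and label appear statistically independent to the learner. Below is a minimal sketch under that assumption; whether the paper uses this exact formulation is an assumption, and the resulting weights would be passed as sample_weight to any scikit-learn estimator that accepts one.

```python
import numpy as np

def reweighing_weights(group, label):
    """w(g, y) = P(g) * P(y) / P(g, y) for each training instance."""
    group, label = np.asarray(group), np.asarray(label)
    weights = np.empty(len(group), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            mask = (group == g) & (label == y)
            p_joint = mask.mean()
            if p_joint > 0:
                weights[mask] = (group == g).mean() * (label == y).mean() / p_joint
    return weights

# Toy data: group 1 rarely receives the favorable label (y=1).
group = np.array([0, 0, 0, 1, 1, 1, 1, 1])
label = np.array([1, 1, 0, 0, 0, 0, 0, 1])
print(reweighing_weights(group, label))  # the (g=1, y=1) cell is upweighted
```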
- DualFair: Fair Representation Learning at Both Group and Individual Levels via Contrastive Self-supervision [73.80009454050858]
This work presents a self-supervised model, called DualFair, that can debias sensitive attributes like gender and race from learned representations.
Our model jointly optimizes two fairness criteria: group fairness and counterfactual fairness.
arXiv Detail & Related papers (2023-03-15T07:13:54Z)
- Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning [92.89846887298852]
Consider making a prediction over new test data without any opportunity to learn from a training set of labelled data.
Instead, we are given access to a set of expert models and their predictions, alongside some limited information about the dataset used to train each of them; an instance-weighting sketch follows this entry.
arXiv Detail & Related papers (2022-10-11T10:20:31Z)
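As a rough illustration of instance-wise, unsupervised ensembling in this spirit, the sketch below weights each expert's prediction by the RBF similarity between the test point and a summary (the mean) of that expert's training data. The similarity measure, the per-expert means, and all names here are illustrative assumptions, not the paper's actual relevance model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Two "experts", each trained on a different region of the input space.
X1 = rng.normal(loc=-2.0, size=(200, 2)); y1 = (X1.sum(axis=1) > -4).astype(int)
X2 = rng.normal(loc=+2.0, size=(200, 2)); y2 = (X2.sum(axis=1) > 4).astype(int)
experts = [LogisticRegression().fit(X1, y1), LogisticRegression().fit(X2, y2)]
centers = [X1.mean(axis=0), X2.mean(axis=0)]  # the "limited information"

def combine(x, bandwidth=2.0):
    """Per-instance mixture of expert probabilities, weighted by RBF
    similarity between x and each expert's training-data center."""
    sims = np.array([np.exp(-np.sum((x - c) ** 2) / (2 * bandwidth ** 2))
                     for c in centers])
    w = sims / sims.sum()
    probs = np.array([m.predict_proba(x[None, :])[0, 1] for m in experts])
    return float(w @ probs)

x_test = np.array([-1.5, -2.5])  # closer to expert 1's training region
print(f"P(y=1 | x) = {combine(x_test):.3f}")
```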
- Fair Classification via Transformer Neural Networks: Case Study of an Educational Domain [0.0913755431537592]
This paper presents a preliminary investigation of fairness constraints in transformer neural networks on the Law School student dataset.
We employ fairness metrics for evaluation and examine the trade-off between fairness and accuracy; an equalized-odds sketch follows this entry.
arXiv Detail & Related papers (2022-06-03T06:34:16Z)
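As an example of the kind of fairness metric such evaluations employ, here is a minimal sketch of the equalized-odds gap: the largest difference in true- or false-positive rates between groups. The metric definition is standard; whether it is among the exact metrics the paper reports is an assumption.

```python
import numpy as np

def equalized_odds_diff(y_true, y_pred, group):
    """max over y in {0,1} of |P(yhat=1 | g=0, y) - P(yhat=1 | g=1, y)|.
    Assumes every (group, true-label) cell is non-empty."""
    gaps = []
    for y in (0, 1):  # y=0 gives the FPR gap, y=1 the TPR gap
        r = [y_pred[(group == g) & (y_true == y)].mean() for g in (0, 1)]
        gaps.append(abs(r[0] - r[1]))
    return max(gaps)

y_true = np.array([1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(f"equalized-odds gap: {equalized_odds_diff(y_true, y_pred, group):.2f}")
```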
- Distill on the Go: Online knowledge distillation in self-supervised learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gains in the presence of noisy and limited labels; a sketch of the distillation loss follows this entry.
arXiv Detail & Related papers (2021-04-20T09:59:23Z)
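To ground the mechanics, below is a minimal numpy sketch of the temperature-softened KL loss that online and mutual distillation setups typically minimize between peer networks. The temperature, the T^2 scaling, and the toy logits follow common distillation practice and are assumptions, not the DoGo training loop.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_kl(teacher_logits, student_logits, T=4.0):
    """KL(p_teacher || p_student) on temperature-softened distributions,
    scaled by T^2 as in standard distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float((p * (np.log(p) - np.log(q))).sum(axis=-1).mean() * T * T)

# Two peers distilling into each other on one batch of logits.
a = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
b = np.array([[1.5, 0.7, -0.5], [0.0, 1.0, 0.6]])
mutual_loss = distill_kl(a, b) + distill_kl(b, a)
print(f"mutual distillation loss: {mutual_loss:.4f}")
```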
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo-labeling to predict labels for unlabeled data; a minimal pseudo-labeling sketch follows this entry.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
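For context on the pseudo-labeling step such frameworks build on, here is a minimal self-training sketch: fit on the labeled pool, pseudo-label confident unlabeled points, and refit on the union. The 0.9 confidence threshold and the synthetic data are assumptions, and the paper's fairness-aware pre-processing is omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)
labeled, unlabeled = np.arange(100), np.arange(100, 1000)  # mostly unlabeled

# Round 1: fit on the small labeled pool, then score the unlabeled pool.
model = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
proba = model.predict_proba(X[unlabeled])
confident = proba.max(axis=1) >= 0.9          # keep confident predictions only
pseudo_y = proba.argmax(axis=1)[confident]

# Round 2: retrain on labeled data plus the pseudo-labeled points.
X_aug = np.vstack([X[labeled], X[unlabeled][confident]])
y_aug = np.concatenate([y[labeled], pseudo_y])
model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
print(f"added {confident.sum()} pseudo-labeled points")
```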
- Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
arXiv Detail & Related papers (2020-09-14T04:25:59Z)
- Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness [7.673007415383724]
We have created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks.
We have applied 7 mitigation techniques on these models and analyzed the fairness, mitigation results, and impacts on performance.
arXiv Detail & Related papers (2020-05-21T23:35:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.