Fair Active Learning in Low-Data Regimes
- URL: http://arxiv.org/abs/2312.08559v1
- Date: Wed, 13 Dec 2023 23:14:55 GMT
- Title: Fair Active Learning in Low-Data Regimes
- Authors: Romain Camilleri, Andrew Wagenmaker, Jamie Morgenstern, Lalit Jain, Kevin Jamieson
- Abstract summary: In machine learning applications, ensuring fairness is essential to avoid perpetuating social inequities.
In this work, we address the challenges of reducing bias and improving accuracy in data-scarce environments.
We introduce an innovative active learning framework that combines an exploration procedure inspired by posterior sampling with a fair classification subroutine.
We demonstrate that this framework performs effectively in very data-scarce regimes, maximizing accuracy while satisfying fairness constraints with high probability.
- Score: 22.349886628823125
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In critical machine learning applications, ensuring fairness is essential to
avoid perpetuating social inequities. In this work, we address the challenges
of reducing bias and improving accuracy in data-scarce environments, where the
cost of collecting labeled data prohibits the use of large, labeled datasets.
In such settings, active learning promises to maximize marginal accuracy gains
of small amounts of labeled data. However, existing applications of active
learning for fairness fail to deliver on this, typically requiring large
labeled datasets, or failing to ensure the desired fairness tolerance is met on
the population distribution.
To address such limitations, we introduce an innovative active learning
framework that combines an exploration procedure inspired by posterior sampling
with a fair classification subroutine. We demonstrate that this framework
performs effectively in very data-scarce regimes, maximizing accuracy while
satisfying fairness constraints with high probability. We evaluate our proposed
approach using well-established real-world benchmark datasets and compare it
against state-of-the-art methods, demonstrating its effectiveness in producing
fair models and its improvement over existing methods.
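To make the notion of a "fairness tolerance" concrete, here is a minimal sketch of checking a fairness constraint on a labeled sample. The abstract does not name a specific fairness metric, so demographic parity (equal positive-prediction rates across two groups) is assumed purely for illustration; the function names `demographic_parity_gap` and `satisfies_tolerance` are hypothetical and not taken from the paper. Note that this measures the gap on a finite sample, whereas the guarantee described above concerns the population distribution and holds with high probability.

```python
def demographic_parity_gap(predictions, groups):
    """Absolute difference in positive-prediction rate between group 0 and group 1.

    Assumption: binary predictions (0/1) and a binary sensitive attribute (0/1).
    """
    rates = []
    for g in (0, 1):
        preds_g = [p for p, a in zip(predictions, groups) if a == g]
        if not preds_g:
            raise ValueError(f"no examples for group {g}")
        rates.append(sum(preds_g) / len(preds_g))
    return abs(rates[0] - rates[1])


def satisfies_tolerance(predictions, groups, tolerance):
    """True if the empirical parity gap is within the given tolerance."""
    return demographic_parity_gap(predictions, groups) <= tolerance


# Toy data: group 0 receives positives at rate 3/4, group 1 at rate 1/4.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
gap = demographic_parity_gap(preds, groups)
print(gap, satisfies_tolerance(preds, groups, tolerance=0.1))
```

With few labeled examples, an empirical gap like this one is a noisy estimate of the population gap, which is why meeting a tolerance on the population is harder in data-scarce regimes than meeting it on the sample.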
Related papers
- Learn to be Fair without Labels: a Distribution-based Learning Framework for Fair Ranking [1.8577028544235155]
We propose a distribution-based fair learning framework (DLF) that does not require labels by replacing the unavailable fairness labels with target fairness exposure distributions.
Our proposed framework achieves better fairness performance while maintaining better control over the fairness-relevance trade-off.
arXiv Detail & Related papers (2024-05-28T03:49:04Z)
- Achievable Fairness on Your Data With Utility Guarantees [16.78730663293352]
In machine learning fairness, training models that minimize disparity across different sensitive groups often leads to diminished accuracy.
We present a computationally efficient approach to approximate the fairness-accuracy trade-off curve tailored to individual datasets.
We introduce a novel methodology for quantifying uncertainty in our estimates, thereby providing practitioners with a robust framework for auditing model fairness.
arXiv Detail & Related papers (2024-02-27T00:59:32Z)
- Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the fair few-shot learning problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z)
- Towards Reducing Labeling Cost in Deep Object Detection [61.010693873330446]
We propose a unified framework for active learning, that considers both the uncertainty and the robustness of the detector.
Our method is able to pseudo-label the very confident predictions, suppressing a potential distribution drift.
arXiv Detail & Related papers (2021-06-22T16:53:09Z)
- ORDisCo: Effective and Efficient Usage of Incremental Unlabeled Data for Semi-supervised Continual Learning [52.831894583501395]
Continual learning assumes the incoming data are fully labeled, which might not be applicable in real applications.
We propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN).
We show ORDisCo achieves significant performance improvement on various semi-supervised learning benchmark datasets for SSCL.
arXiv Detail & Related papers (2021-01-02T09:04:14Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- Group Fairness by Probabilistic Modeling with Latent Fair Decisions [36.20281545470954]
This paper studies learning fair probability distributions from biased data by explicitly modeling a latent variable that represents a hidden, unbiased label.
We aim to achieve demographic parity by enforcing certain independencies in the learned model.
We also show that group fairness guarantees are meaningful only if the distribution used to provide those guarantees indeed captures the real-world data.
arXiv Detail & Related papers (2020-09-18T19:13:23Z)
- Fair Active Learning [15.313223110223941]
Active learning is a promising approach to build an accurate classifier by interactively querying an oracle within a labeling budget.
We design algorithms for fair active learning that carefully select data points to be labeled so as to balance model accuracy and fairness.
arXiv Detail & Related papers (2020-06-20T17:00:02Z)
- Causal Feature Selection for Algorithmic Fairness [61.767399505764736]
We consider fairness in the integration component of data management.
We propose an approach to identify a sub-collection of features that ensure the fairness of the dataset.
arXiv Detail & Related papers (2020-06-10T20:20:10Z)
- Fair Active Learning [15.313223110223941]
It is critical that machine learning models do not propagate discrimination.
Active learning is a promising approach to build an accurate classifier by interactively querying an oracle within a labeling budget.
We design algorithms for fair active learning that carefully select data points to be labeled so as to balance model accuracy and fairness.
arXiv Detail & Related papers (2020-01-06T22:20:02Z)
- Leveraging Semi-Supervised Learning for Fairness using Neural Networks [49.604038072384995]
There has been a growing concern about the fairness of decision-making systems based on machine learning.
In this paper, we propose a semi-supervised algorithm using neural networks benefiting from unlabeled data.
The proposed model, called SSFair, exploits the information in the unlabeled data to mitigate the bias in the training data.
arXiv Detail & Related papers (2019-12-31T09:11:26Z)