FLEA: Provably Fair Multisource Learning from Unreliable Training Data
- URL: http://arxiv.org/abs/2106.11732v1
- Date: Tue, 22 Jun 2021 13:09:45 GMT
- Title: FLEA: Provably Fair Multisource Learning from Unreliable Training Data
- Authors: Eugenia Iofinova, Nikola Konstantinov, Christoph H. Lampert
- Abstract summary: We introduce FLEA, a filtering-based algorithm that allows the learning system to identify and suppress data sources that would have a negative impact on fairness or accuracy.
We demonstrate the effectiveness of our approach through a diverse range of experiments on multiple datasets.
- Score: 28.382147902475545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fairness-aware learning aims at constructing classifiers that not only make
accurate predictions, but do not discriminate against specific groups. It is a
fast-growing area of machine learning with far-reaching societal impact.
However, existing fair learning methods are vulnerable to accidental or
malicious artifacts in the training data, which can cause them to unknowingly
produce unfair classifiers. In this work we address the problem of fair
learning from unreliable training data in the robust multisource setting, where
the available training data comes from multiple sources, a fraction of which
might not be representative of the true data distribution. We introduce FLEA, a
filtering-based algorithm that allows the learning system to identify and
suppress those data sources that would have a negative impact on fairness or
accuracy if they were used for training. We demonstrate the effectiveness of our
approach through a diverse range of experiments on multiple datasets. Additionally,
we prove formally that, given enough data, FLEA protects the learner against
unreliable data as long as the fraction of affected data sources is less than
half.
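The full method combines several pairwise discrepancy measures and is more involved than can be shown here; the following is a minimal sketch of the filtering idea only, under an assumed per-source signature (per-group positive label rates, a stand-in for the paper's measures). Because fewer than half of the sources are corrupted, a clean source's median pairwise discrepancy stays small, so high-discrepancy sources can be suppressed:

```python
import numpy as np

rng = np.random.default_rng(0)

def signature(y, group):
    """Cheap per-source signature: positive label rate in each protected group.
    (The paper uses richer discrepancy/disparity measures; this is a stand-in.)"""
    return np.array([y[group == g].mean() for g in (0, 1)])

# Simulate N data sources; 3 of 11 (< half) have labels flipped in group 0.
N, n = 11, 500
sources = []
for s in range(N):
    group = rng.integers(0, 2, size=n)
    y = rng.binomial(1, np.where(group == 0, 0.3, 0.6))
    if s < 3:
        y[group == 0] = 1 - y[group == 0]      # corrupted source
    sources.append((group, y))

sig = np.stack([signature(y, g) for g, y in sources])
D = np.abs(sig[:, None, :] - sig[None, :, :]).sum(-1)   # pairwise discrepancy

# A clean source sits in the majority cluster, so its median distance to the
# other sources stays small; suppress sources with a large median distance.
score = np.median(D, axis=1)
keep = score <= np.median(score)               # crude threshold for the sketch
print("kept sources:", np.where(keep)[0])      # corrupted sources 0, 1, 2 are dropped
```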
Related papers
- The Impact of Data Distribution on Fairness and Robustness in Federated Learning [7.209240273742224]
Federated Learning (FL) is a distributed machine learning protocol that allows a set of agents to collaboratively train a model without sharing their datasets.
In this work, we look at how variations in local data distributions affect the fairness and the properties of the trained models.
arXiv Detail & Related papers (2021-11-29T22:04:50Z)
- Fairness-Aware Learning from Corrupted Data [33.52974791836553]
We consider fairness-aware learning under arbitrary data manipulations.
We show that the strength of the bias induced by such manipulations increases for learning problems with underrepresented protected groups in the data.
We prove that two natural learning algorithms achieve order-optimal guarantees in terms of both accuracy and fairness under adversarial data manipulations.
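As a toy illustration of the underrepresentation claim (not the paper's construction), assume the manipulation is a fixed budget of label flips inside the protected group: the smaller that group, the further the same budget moves its observed positive rate.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_gap(y, group):
    """Demographic-parity gap: difference in positive rates between groups."""
    return abs(y[group == 1].mean() - y[group == 0].mean())

n, flip_budget = 10_000, 200          # adversary may flip 200 labels in total
for minority_frac in (0.5, 0.1, 0.02):
    group = (rng.random(n) < minority_frac).astype(int)  # group 1 = protected
    y = rng.binomial(1, 0.5, size=n)                     # fair base rates
    before = dp_gap(y, group)
    # Flip protected-group negatives to positives, up to the budget.
    idx = np.where((group == 1) & (y == 0))[0][:flip_budget]
    y[idx] = 1
    print(f"minority {minority_frac:4.0%}: gap {before:.3f} -> {dp_gap(y, group):.3f}")
```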
arXiv Detail & Related papers (2021-02-11T13:48:41Z)
- Fairness-aware Agnostic Federated Learning [47.26747955026486]
We develop a fairness-aware agnostic federated learning framework (AgnosticFair) to deal with the challenge of unknown testing distribution.
We use kernel reweighing functions to assign a reweighing value on each training sample in both loss function and fairness constraint.
The resulting model can be directly applied to local sites, as it guarantees fairness on local data distributions.
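A minimal sketch of such an objective, assuming the per-sample reweighing values are already given (the paper derives them from kernel functions); the penalty form and the name `lam` are illustrative:

```python
import numpy as np

def weighted_fair_loss(y_true, y_prob, group, w, lam=1.0):
    """Per-sample weighted log loss plus a weighted demographic-parity penalty.

    w   : per-sample reweighing values (in the paper these come from
          kernel reweighing functions; here they are just an input)
    lam : trade-off weight for the fairness term (illustrative name)
    """
    eps = 1e-12
    ll = -(y_true * np.log(y_prob + eps) + (1 - y_true) * np.log(1 - y_prob + eps))
    loss = np.average(ll, weights=w)
    # Weighted positive-prediction rates per group enter the fairness term.
    rate = [np.average(y_prob[group == g], weights=w[group == g]) for g in (0, 1)]
    return loss + lam * abs(rate[1] - rate[0])

# Tiny usage example with random data.
rng = np.random.default_rng(0)
y, g = rng.integers(0, 2, 100), rng.integers(0, 2, 100)
p, w = rng.random(100), np.ones(100)
print(weighted_fair_loss(y, p, g, w))
```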
arXiv Detail & Related papers (2020-10-10T17:58:20Z)
- Fairness in Semi-supervised Learning: Unlabeled Data Help to Reduce Discrimination [53.3082498402884]
A growing specter in the rise of machine learning is whether the decisions made by machine learning models are fair.
We present a framework of fair semi-supervised learning in the pre-processing phase, including pseudo labeling to predict labels for unlabeled data.
A theoretical decomposition analysis of bias, variance and noise highlights the different sources of discrimination and the impact they have on fairness in semi-supervised learning.
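A minimal sketch of the pseudo-labeling step only, using scikit-learn's LogisticRegression as a stand-in base learner and an illustrative confidence threshold; the fairness-oriented pre-processing from the paper is omitted:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic labeled and unlabeled pools (first feature is informative).
X_lab = rng.normal(size=(200, 5))
y_lab = (X_lab[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
X_unl = rng.normal(size=(1000, 5))

# 1) fit on labeled data, 2) pseudo-label confident unlabeled points,
# 3) refit on the union.
clf = LogisticRegression().fit(X_lab, y_lab)
conf = clf.predict_proba(X_unl).max(axis=1) >= 0.7   # illustrative threshold
X_aug = np.vstack([X_lab, X_unl[conf]])
y_aug = np.concatenate([y_lab, clf.predict(X_unl[conf])])
clf = LogisticRegression().fit(X_aug, y_aug)
print(f"pseudo-labeled {conf.sum()} of {len(X_unl)} unlabeled points")
```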
arXiv Detail & Related papers (2020-09-25T05:48:56Z)
- A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
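For a handful of sources, the classical (non-federated) Shapley value can be computed exactly by enumerating subsets, as the sketch below does for a toy utility; the federated variant in the paper avoids this retraining cost by valuing contributions round by round, which the sketch does not capture:

```python
from itertools import combinations
from math import factorial

def shapley(players, utility):
    """Exact Shapley values: weighted marginal contributions over all subsets."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[p] += weight * (utility(set(S) | {p}) - utility(set(S)))
    return phi

# Toy utility: accuracy gain is the sum of per-source qualities, except
# source "c" duplicates "a" and adds nothing once "a" is present.
quality = {"a": 0.30, "b": 0.20, "c": 0.30}
def utility(S):
    if "a" in S and "c" in S:
        return quality["a"] + ("b" in S) * quality["b"]
    return sum(quality[p] for p in S)

print(shapley(["a", "b", "c"], utility))  # duplicated sources share their value
```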
arXiv Detail & Related papers (2020-09-14T04:37:54Z)
- Fairness Constraints in Semi-supervised Learning [56.48626493765908]
We develop a framework for fair semi-supervised learning, which is formulated as an optimization problem.
We theoretically analyze the source of discrimination in semi-supervised learning via bias, variance and noise decomposition.
Our method is able to achieve fair semi-supervised learning, and reach a better trade-off between accuracy and fairness than fair supervised learning.
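A minimal sketch of one fairness-constrained objective in penalty form (the paper's exact formulation may differ): logistic loss plus `lam` times the squared mean-score gap between groups, optimized by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2000, 3
g = rng.integers(0, 2, n)                    # protected attribute
X = rng.normal(size=(n, d))
X[:, 1] += g                                 # one feature correlates with g
y = ((X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=n)) > 0).astype(int)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Penalized form of the constrained problem: log loss + lam * gap^2, where
# gap is the mean score difference between groups (illustrative formulation).
for lam in (0.0, 5.0):
    w = np.zeros(d)
    for _ in range(1000):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / n                         # log-loss gradient
        gap = p[g == 1].mean() - p[g == 0].mean()
        s = p * (1 - p)                                  # sigmoid derivative
        dgap = (X[g == 1] * s[g == 1, None]).mean(0) \
             - (X[g == 0] * s[g == 0, None]).mean(0)
        w -= 0.5 * (grad + lam * 2 * gap * dgap)
    p = sigmoid(X @ w)
    acc = ((p > 0.5) == y).mean()
    print(f"lam={lam}: accuracy {acc:.3f}, DP gap {abs(gap):.3f}")
```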
arXiv Detail & Related papers (2020-09-14T04:25:59Z)
- Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
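One simple way uncertainty can enter the spectral step (a sketch, not necessarily the paper's correction): when each point carries a known noise covariance, the expected data covariance is the covariance of the means plus the average noise covariance, and the subspace comes from its leading eigenvectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each data point is a mean plus a known per-point uncertainty covariance.
n, d, k = 300, 5, 2
means = rng.normal(size=(n, d)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.1])
sigmas = np.array([np.diag(rng.uniform(0.0, 0.5, d)) for _ in range(n)])

# Expected covariance of the noisy data = covariance of the means
# plus the average measurement-noise covariance.
C = np.cov(means.T) + sigmas.mean(axis=0)

# Spectral step: top-k eigenvectors of the corrected covariance.
eigvals, eigvecs = np.linalg.eigh(C)
U = eigvecs[:, ::-1][:, :k]          # leading k directions
Z = (means - means.mean(0)) @ U      # uncertainty-aware embedding
print("embedding shape:", Z.shape)
```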
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
- WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices.
We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
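A sketch of the weight-factorization side only, with the Indian Buffet Process prior replaced by an i.i.d. binary selection matrix for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, n_factors, n_clients = 8, 4, 6, 3

# Shared dictionary of weight factors (conceptually held at the server).
dictionary = rng.normal(size=(n_factors, d_in, d_out))

# Binary factor-selection matrix; WAFFLe places an Indian Buffet Process
# prior over this, here it is just sampled i.i.d. for illustration.
select = rng.random((n_clients, n_factors)) < 0.4

# Each client's layer weights are assembled from its selected factors, so
# raw local weights are never communicated, only factor usage.
client_weights = [dictionary[select[c]].sum(axis=0) for c in range(n_clients)]
print([w.shape for w in client_weights])
```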
arXiv Detail & Related papers (2020-08-13T04:26:31Z)
- On the Sample Complexity of Adversarial Multi-Source PAC Learning [46.24794665486056]
In a single-source setting, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability.
We show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources.
Our results also show that in a cooperative learning setting sharing data with other parties has provable benefits, even if some participants are malicious.
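A toy illustration of why a corrupted minority cannot dominate in the multi-source setting (not the paper's sample-complexity construction): train one model per source and take a majority vote over their predictions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_source(n=200, corrupted=False):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, (1 - y if corrupted else y)    # corrupted source flips labels

# 5 sources, 2 adversarially corrupted (< half).
sources = [make_source(corrupted=(s < 2)) for s in range(5)]
models = [LogisticRegression().fit(X, y) for X, y in sources]

# Majority vote over per-source models: the 3 clean models outvote the 2
# corrupted ones on essentially every test point.
X_test = rng.normal(size=(1000, 2))
y_test = (X_test[:, 0] > 0).astype(int)
votes = np.stack([m.predict(X_test) for m in models])
y_hat = (votes.mean(axis=0) > 0.5).astype(int)
print("majority-vote accuracy:", (y_hat == y_test).mean())
```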
arXiv Detail & Related papers (2020-02-24T17:19:04Z)
- FR-Train: A Mutual Information-Based Approach to Fair and Robust Training [33.385118640843416]
We propose FR-Train, which holistically performs fair and robust model training.
In our experiments, FR-Train shows almost no decrease in fairness and accuracy in the presence of data poisoning.
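FR-Train optimizes mutual-information-based adversarial objectives during training; the sketch below only measures the relevant quantity, a plug-in estimate of the mutual information between a model's predictions and the sensitive attribute:

```python
import numpy as np

def mutual_information(a, b):
    """Plug-in estimate of I(A; B) in nats for small discrete variables."""
    joint = np.zeros((a.max() + 1, b.max() + 1))
    for x, y in zip(a, b):
        joint[x, y] += 1
    joint /= joint.sum()
    pa, pb = joint.sum(1, keepdims=True), joint.sum(0, keepdims=True)
    nz = joint > 0
    return (joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum()

rng = np.random.default_rng(0)
s = rng.integers(0, 2, 5000)                         # sensitive attribute
fair = rng.integers(0, 2, 5000)                      # predictions independent of s
unfair = np.where(rng.random(5000) < 0.8, s, 1 - s)  # predictions leak s
print(f"I(fair; s)   = {mutual_information(fair, s):.4f} nats")    # near zero
print(f"I(unfair; s) = {mutual_information(unfair, s):.4f} nats")  # clearly positive
```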
arXiv Detail & Related papers (2020-02-24T13:37:29Z)