Improving Fairness for Data Valuation in Federated Learning
- URL: http://arxiv.org/abs/2109.09046v1
- Date: Sun, 19 Sep 2021 02:39:59 GMT
- Title: Improving Fairness for Data Valuation in Federated Learning
- Authors: Zhenan Fan, Huang Fang, Zirui Zhou, Jian Pei, Michael P. Friedlander,
Changxin Liu, Yong Zhang
- Abstract summary: We propose a new measure, the completed federated Shapley value, to improve the fairness of the federated Shapley value.
Its design rests on completing a matrix of all possible contributions by different subsets of data owners; under mild conditions, this matrix is shown to be approximately low-rank using concepts and tools from optimization.
- Score: 39.61504568047234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning is an emerging decentralized machine learning scheme that
allows multiple data owners to work collaboratively while ensuring data
privacy. The success of federated learning depends largely on the participation
of data owners. To sustain and encourage data owners' participation, it is
crucial to fairly evaluate the quality of the data provided by the data owners
and reward them correspondingly. Federated Shapley value, recently proposed by
Wang et al. [Federated Learning, 2020], is a measure for data value under the
framework of federated learning that satisfies many desired properties for data
valuation. However, there are still factors of potential unfairness in the
design of federated Shapley value because two data owners with the same local
data may not receive the same evaluation. We propose a new measure called
completed federated Shapley value to improve the fairness of federated Shapley
value. The design depends on completing a matrix consisting of all the possible
contributions by different subsets of the data owners. It is shown under mild
conditions that this matrix is approximately low-rank by leveraging concepts
and tools from optimization. Both theoretical analysis and empirical evaluation
verify that the proposed measure does improve fairness in many circumstances.
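To make the construction concrete, below is a minimal sketch, not the authors' implementation, of the idea in the abstract: per-round utilities of subsets of participating data owners are arranged in a rounds-by-subsets matrix, the unobserved entries are filled in by a low-rank completion heuristic, and Shapley values are then computed from the completed matrix. The utility oracle `round_utility`, the SoftImpute-style completion routine, and all constants are illustrative assumptions; the paper's actual completion algorithm and guarantees may differ.

```python
# Minimal sketch of the completed federated Shapley value idea (illustrative only).
import itertools
import math
import numpy as np

rng = np.random.default_rng(0)
N_CLIENTS, N_ROUNDS, RANK = 4, 6, 2
SUBSETS = [frozenset(c) for r in range(N_CLIENTS + 1)
           for c in itertools.combinations(range(N_CLIENTS), r)]

def round_utility(t, subset):
    # Hypothetical stand-in: in practice this would be the utility gain of the
    # global model in round t when exactly the clients in `subset` contribute.
    return sum(0.5 + 0.1 * i for i in subset) * (0.9 ** t)

def shapley(util, n):
    """Exact Shapley values from a dict mapping frozenset(subset) -> utility."""
    phi = np.zeros(n)
    for i in range(n):
        for s in SUBSETS:
            if i in s:
                continue
            w = (math.factorial(len(s)) * math.factorial(n - len(s) - 1)
                 / math.factorial(n))
            phi[i] += w * (util[s | {i}] - util[s])
    return phi

def complete_low_rank(M, observed, rank, n_iter=300):
    """Naive low-rank completion: fill missing entries with the current
    estimate, project back to rank-`rank` via truncated SVD, and repeat
    (a SoftImpute-style heuristic, used here as a stand-in)."""
    X = np.where(observed, M, 0.0)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(observed, M, low_rank)
    return X

# Build the (rounds x subsets) utility matrix. Only subsets of the clients
# that actually participate in round t are observed; the rest are missing.
M = np.zeros((N_ROUNDS, len(SUBSETS)))
observed = np.zeros_like(M, dtype=bool)
for t in range(N_ROUNDS):
    participants = frozenset(int(i) for i in rng.choice(N_CLIENTS, size=3, replace=False))
    for j, s in enumerate(SUBSETS):
        if s <= participants:
            M[t, j] = round_utility(t, s)
            observed[t, j] = True

# The federated Shapley value (Wang et al.) is computed round-wise and credits
# only that round's participants; the completed variant described in the
# abstract first fills in the unobserved utilities (here with the naive
# low-rank heuristic) and then sums per-round Shapley values over all clients.
M_hat = complete_low_rank(M, observed, RANK)
completed_fed_sv = sum(
    shapley({s: M_hat[t, j] for j, s in enumerate(SUBSETS)}, N_CLIENTS)
    for t in range(N_ROUNDS)
)
print("completed federated Shapley values:", np.round(completed_fed_sv, 3))
```

In this toy setup, a data owner left out of a round can still be credited through the completed entries of that round, which illustrates the intuition behind the fairness improvement claimed above.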
Related papers
- Targeted Learning for Data Fairness [52.59573714151884]
We expand fairness inference by evaluating fairness in the data generating process itself.
We derive estimators for demographic parity, equal opportunity, and conditional mutual information.
To validate our approach, we perform several simulations and apply our estimators to real data.
arXiv Detail & Related papers (2025-02-06T18:51:28Z)
- Data Acquisition for Improving Model Fairness using Reinforcement Learning [3.3916160303055563]
We focus on the task of acquiring additional labeled data points for training the downstream machine learning model to rapidly improve its fairness.
We present DataSift, a data acquisition framework based on the idea of data valuation that relies on partitioning and multi-armed bandits to determine the most valuable data points to acquire.
We empirically evaluate DataSift on several real-world and synthetic datasets and show that the fairness of a machine learning model can be significantly improved while acquiring only a few data points.
arXiv Detail & Related papers (2024-12-04T03:56:54Z)
- Shapley Value on Probabilistic Classifiers [6.163093930860032]
In the context of machine learning (ML), data valuation methods aim to equitably measure the contribution of each data point to the utility of an ML model.
Traditional Shapley-based data valuation methods may not effectively distinguish between beneficial and detrimental training data points.
We propose Probabilistic Shapley (P-Shapley) value by constructing a probability-wise utility function.
arXiv Detail & Related papers (2023-06-12T15:09:13Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem and can in fact be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data.
The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory.
We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
- GTG-Shapley: Efficient and Accurate Participant Contribution Evaluation in Federated Learning [25.44023017628766]
Federated Learning (FL) bridges the gap between collaborative machine learning and preserving data privacy.
It is essential to fairly evaluate participants' contributions to the performance of the final FL model without exposing their private data.
We propose the Guided Truncation Gradient Shapley approach to address this challenge.
arXiv Detail & Related papers (2021-09-05T12:17:00Z)
- A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z)
- Towards Efficient Data Valuation Based on the Shapley Value [65.4167993220998]
We study the problem of data valuation by utilizing the Shapley value.
The Shapley value defines a unique payoff scheme that satisfies many desiderata for the notion of data value.
We propose a repertoire of efficient algorithms for approximating the Shapley value; a generic permutation-sampling sketch of this style of approximation appears after this list.
arXiv Detail & Related papers (2019-02-27T00:22:43Z)
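Several of the entries above (the federated Shapley value, VerFedSV, GTG-Shapley, and the efficiency-focused approximation work) build on the same cooperative game-theoretic quantity. As a generic point of reference, here is a minimal sketch of the standard permutation-sampling (Monte Carlo) estimator of the Shapley value; it is not the specific algorithm of any listed paper, and the utility function `utility_of` is a hypothetical placeholder for training and evaluating a model on a subset of the data.

```python
# Generic Monte Carlo (permutation-sampling) Shapley estimator (illustrative only).
import numpy as np

def utility_of(subset):
    # Hypothetical placeholder: in data valuation this would be, e.g., the
    # validation accuracy of a model trained on the data points in `subset`.
    return sum(0.2 + 0.05 * i for i in subset)

def monte_carlo_shapley(n_points, n_permutations=2000, seed=0):
    """Estimate Shapley values by averaging each point's marginal contribution
    over uniformly random permutations of the data points."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_points)
    for _ in range(n_permutations):
        prefix = set()
        prev_utility = utility_of(prefix)
        for i in rng.permutation(n_points):
            prefix.add(int(i))
            utility = utility_of(prefix)
            phi[int(i)] += utility - prev_utility
            prev_utility = utility
    return phi / n_permutations

print(np.round(monte_carlo_shapley(5), 3))
```

The listed works mainly aim to cut down the number of utility evaluations such a loop requires, for example through truncation or by exploiting the structure of federated training.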
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.