A Principled Approach to Data Valuation for Federated Learning
- URL: http://arxiv.org/abs/2009.06192v1
- Date: Mon, 14 Sep 2020 04:37:54 GMT
- Title: A Principled Approach to Data Valuation for Federated Learning
- Authors: Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song
- Abstract summary: Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
- Score: 73.19984041333599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a popular technique to train machine learning (ML)
models on decentralized data sources. In order to sustain long-term
participation of data owners, it is important to fairly appraise each data
source and compensate data owners for their contribution to the training
process. The Shapley value (SV) defines a unique payoff scheme that satisfies
many desiderata for a data value notion. It has been increasingly used for
valuing training data in centralized learning. However, computing the SV
requires exhaustively evaluating the model performance on every subset of data
sources, which incurs prohibitive communication cost in the federated setting.
Besides, the canonical SV ignores the order of data sources during training,
which conflicts with the sequential nature of FL. This paper proposes a variant
of the SV amenable to FL, which we call the federated Shapley value. The
federated SV preserves the desirable properties of the canonical SV while it
can be calculated without incurring extra communication cost and is also able
to capture the effect of participation order on data value. We conduct a
thorough empirical study of the federated SV on a range of tasks, including
noisy label detection, adversarial participant detection, and data
summarization on different benchmark datasets, and demonstrate that it can
reflect the real utility of data sources for FL and has the potential to
enhance system robustness, security, and efficiency. We also report and analyze
"failure cases" and hope to stimulate future research.
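The exponential cost mentioned above can be made concrete with a short sketch. The snippet below is a toy illustration, not the paper's algorithm: it computes exact canonical Shapley values by enumerating every subset of data sources, and the `utility` function standing in for model performance is a made-up set-coverage proxy.

```python
from itertools import combinations
from math import factorial

def shapley_values(sources, utility):
    """Exact canonical Shapley value of each data source: the weighted
    average marginal contribution over all subsets of the other sources.
    Requires 2^(n-1) utility evaluations per source, hence the
    prohibitive cost in a federated setting."""
    n = len(sources)
    values = {}
    for i in sources:
        others = [s for s in sources if s != i]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Standard Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (utility(set(subset) | {i}) - utility(set(subset)))
        values[i] = total
    return values

# Toy utility (hypothetical): "performance" = number of distinct points covered.
data = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
util = lambda coalition: len(set().union(*(data[s] for s in coalition)))

print(shapley_values(["A", "B", "C"], util))
```

Note that the values sum to the utility of the full coalition (the efficiency axiom), one of the desiderata the federated SV is designed to preserve while avoiding the subset enumeration shown here.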
Related papers
- StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems [22.259297167311964]
Federated learning (FL) is a decentralized learning technique that enables devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party.
Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity.
We propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically distributed (non-iid) features across local clients' data in FL.
arXiv Detail & Related papers (2024-05-20T14:41:59Z)
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
- Data Valuation and Detections in Federated Learning [4.899818550820576]
Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data.
A challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute high-quality data in the FL task.
This paper introduces a novel privacy-preserving method for evaluating client contributions and selecting relevant datasets without a pre-specified training algorithm in an FL task.
arXiv Detail & Related papers (2023-11-09T12:01:32Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Federated Learning for Predictive Maintenance and Quality Inspection in Industrial Applications [0.36855408155998204]
Federated learning (FL) enables multiple participants to develop a machine learning model without compromising privacy and confidentiality of their data.
We evaluate the performance of different FL aggregation methods and compare them to central and local training approaches.
We introduce a new federated learning dataset from a real-world quality inspection setting.
arXiv Detail & Related papers (2023-04-21T16:11:09Z)
- Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
The data samples usually follow a long-tailed distribution in the real world, and FL on the decentralized and long-tailed data yields a poorly-behaved global model.
In this work, we integrate the local real data with the global gradient prototypes to form the local balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data.
The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory.
We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
- Improving Fairness for Data Valuation in Federated Learning [39.61504568047234]
We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value.
It is shown under mild conditions that the underlying utility matrix is approximately low-rank, by leveraging concepts and tools from optimization.
arXiv Detail & Related papers (2021-09-19T02:39:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.