A Principled Approach to Data Valuation for Federated Learning
- URL: http://arxiv.org/abs/2009.06192v1
- Date: Mon, 14 Sep 2020 04:37:54 GMT
- Title: A Principled Approach to Data Valuation for Federated Learning
- Authors: Tianhao Wang, Johannes Rausch, Ce Zhang, Ruoxi Jia, Dawn Song
- Abstract summary: Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
- Score: 73.19984041333599
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning (FL) is a popular technique to train machine learning (ML)
models on decentralized data sources. In order to sustain long-term
participation of data owners, it is important to fairly appraise each data
source and compensate data owners for their contribution to the training
process. The Shapley value (SV) defines a unique payoff scheme that satisfies
many desiderata for a data value notion. It has been increasingly used for
valuing training data in centralized learning. However, computing the SV
requires exhaustively evaluating the model performance on every subset of data
sources, which incurs prohibitive communication cost in the federated setting.
Besides, the canonical SV ignores the order of data sources during training,
which conflicts with the sequential nature of FL. This paper proposes a variant
of the SV amenable to FL, which we call the federated Shapley value. The
federated SV preserves the desirable properties of the canonical SV while it
can be calculated without incurring extra communication cost and is also able
to capture the effect of participation order on data value. We conduct a
thorough empirical study of the federated SV on a range of tasks, including
noisy label detection, adversarial participant detection, and data
summarization on different benchmark datasets, and demonstrate that it can
reflect the real utility of data sources for FL and has the potential to
enhance system robustness, security, and efficiency. We also report and analyze
"failure cases" and hope to stimulate future research.
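The exponential cost mentioned above can be made concrete with a short sketch. The snippet below is a toy illustration, not the paper's algorithm: it computes exact canonical Shapley values by enumerating every subset of data sources, and the `utility` function standing in for model performance is a made-up set-coverage proxy.

```python
from itertools import combinations
from math import factorial

def shapley_values(sources, utility):
    """Exact canonical Shapley value of each data source: the weighted
    average marginal contribution over all subsets of the other sources.
    Requires 2^(n-1) utility evaluations per source, hence the
    prohibitive cost in a federated setting."""
    n = len(sources)
    values = {}
    for i in sources:
        others = [s for s in sources if s != i]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                # Standard Shapley weight for a coalition of size k.
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (utility(set(subset) | {i}) - utility(set(subset)))
        values[i] = total
    return values

# Toy utility (hypothetical): "performance" = number of distinct points covered.
data = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
util = lambda coalition: len(set().union(*(data[s] for s in coalition)))

print(shapley_values(["A", "B", "C"], util))
```

Note that the values sum to the utility of the full coalition (the efficiency axiom), one of the desiderata the federated SV is designed to preserve while avoiding the subset enumeration shown here.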
Related papers
- StatAvg: Mitigating Data Heterogeneity in Federated Learning for Intrusion Detection Systems [22.259297167311964]
Federated learning (FL) is a decentralized learning technique that enables devices to collaboratively build a shared Machine Learning (ML) or Deep Learning (DL) model without revealing their raw data to a third party.
Due to its privacy-preserving nature, FL has sparked widespread attention for building Intrusion Detection Systems (IDS) within the realm of cybersecurity.
We propose an effective method called Statistical Averaging (StatAvg) to alleviate non-independently and identically distributed (non-iid) features across local clients' data in FL.
arXiv Detail & Related papers (2024-05-20T14:41:59Z)
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
- Data Valuation and Detections in Federated Learning [4.899818550820576]
Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data.
A challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute high-quality data in the FL task.
This paper introduces a novel privacy-preserving method for evaluating client contributions and selecting relevant datasets without a pre-specified training algorithm in an FL task.
arXiv Detail & Related papers (2023-11-09T12:01:32Z)
- Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation.
We formulate the loss function minimization problem under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE).
Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
- Federated Learning for Predictive Maintenance and Quality Inspection in Industrial Applications [0.36855408155998204]
Federated learning (FL) enables multiple participants to develop a machine learning model without compromising privacy and confidentiality of their data.
We evaluate the performance of different FL aggregation methods and compare them to central and local training approaches.
We introduce a new federated learning dataset from a real-world quality inspection setting.
arXiv Detail & Related papers (2023-04-21T16:11:09Z)
- Integrating Local Real Data with Global Gradient Prototypes for Classifier Re-Balancing in Federated Long-Tailed Learning [60.41501515192088]
Federated Learning (FL) has become a popular distributed learning paradigm that involves multiple clients training a global model collaboratively.
The data samples usually follow a long-tailed distribution in the real world, and FL on the decentralized and long-tailed data yields a poorly-behaved global model.
In this work, we integrate the local real data with the global gradient prototypes to form the local balanced datasets.
arXiv Detail & Related papers (2023-01-25T03:18:10Z)
- Rethinking Data Heterogeneity in Federated Learning: Introducing a New Notion and Standard Benchmarks [65.34113135080105]
We show that data heterogeneity in current setups is not necessarily a problem; in fact, it can be beneficial for the FL participants.
Our observations are intuitive.
Our code is available at https://github.com/MMorafah/FL-SC-NIID.
arXiv Detail & Related papers (2022-09-30T17:15:19Z)
- Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data.
The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory.
We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
- Improving Fairness for Data Valuation in Federated Learning [39.61504568047234]
We propose a new measure called completed federated Shapley value to improve the fairness of federated Shapley value.
It is shown under mild conditions that the underlying utility matrix is approximately low-rank, by leveraging concepts and tools from optimization.
arXiv Detail & Related papers (2021-09-19T02:39:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.