Data Valuation for Vertical Federated Learning: A Model-free and
Privacy-preserving Method
- URL: http://arxiv.org/abs/2112.08364v3
- Date: Thu, 4 Jan 2024 07:19:17 GMT
- Title: Data Valuation for Vertical Federated Learning: A Model-free and
Privacy-preserving Method
- Authors: Xiao Han and Leye Wang and Junjie Wu and Xiao Fang
- Abstract summary: FedValue is a privacy-preserving, task-specific but model-free data valuation method for Vertical Federated Learning (VFL).
We first introduce a novel data valuation metric, namely MShapley-CMI. The metric evaluates a data party's contribution to a predictive analytics task without needing to execute a machine learning model.
Next, we develop an innovative federated method that calculates the MShapley-CMI value for each data party in a privacy-preserving manner.
- Score: 14.451118953357605
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vertical Federated Learning (VFL) is a promising paradigm for predictive
analytics, empowering an organization (i.e., task party) to enhance its
predictive models through collaborations with multiple data suppliers (i.e.,
data parties) in a decentralized and privacy-preserving way. Despite the
fast-growing interest in VFL, the lack of effective and secure tools for
assessing the value of data owned by data parties hinders the application of
VFL in business contexts. In response, we propose FedValue, a
privacy-preserving, task-specific but model-free data valuation method for VFL,
which consists of a data valuation metric and a federated computation method.
Specifically, we first introduce a novel data valuation metric, namely
MShapley-CMI. The metric evaluates a data party's contribution to a predictive
analytics task without needing to execute a machine learning model, making
it well-suited for real-world applications of VFL. Next, we develop an
innovative federated computation method that calculates the MShapley-CMI value
for each data party in a privacy-preserving manner. Extensive experiments
conducted on six public datasets validate the efficacy of FedValue for data
valuation in the context of VFL. In addition, we illustrate the practical
utility of FedValue with a case study involving federated movie
recommendations.
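The abstract names MShapley-CMI but does not define it here, so the following is only a minimal sketch of the general idea its name suggests: a Shapley value over data parties whose utility is an information measure rather than a trained model's accuracy. Everything in it is an illustrative assumption and not the paper's method: the function names, the choice of empirical mutual information I(X_S; Y) as the coalition utility, and the centralized computation, which has no privacy protection at all. By the chain rule of mutual information, the marginal gain I(X_S, X_p; Y) - I(X_S; Y) equals the conditional mutual information I(X_p; Y | X_S), which is why a utility of this kind needs no model training.

```python
from collections import Counter
from itertools import combinations
from math import factorial, log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def coalition_utility(features, coalition, labels):
    """Hypothetical utility: I(joint features of the coalition; label)."""
    if not coalition:
        return 0.0
    # Treat the tuple of all coalition features as one joint discrete variable.
    joint = list(zip(*(features[p] for p in sorted(coalition))))
    return mutual_information(joint, labels)


def shapley_values(features, labels):
    """Exact Shapley value per party; exponential in the number of parties."""
    parties = list(features)
    n = len(parties)
    values = {}
    for p in parties:
        others = [q for q in parties if q != p]
        total = 0.0
        for k in range(n):
            # Standard Shapley weight for coalitions of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                with_p = coalition_utility(features, set(subset) | {p}, labels)
                without_p = coalition_utility(features, set(subset), labels)
                # Marginal gain equals I(X_p; Y | X_subset) by the chain rule.
                total += weight * (with_p - without_p)
        values[p] = total
    return values
```

As a usage example under the same assumptions, take three parties with `features = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [0, 0, 0, 0]}` and the XOR label `labels = [0, 1, 1, 0]`. Calling `shapley_values(features, labels)` gives parties `a` and `b` about 0.5 bits each and party `c` zero, since neither `a` nor `b` is informative alone but together they determine the label, while the constant feature `c` adds nothing.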
Related papers
- A Survey on Contribution Evaluation in Vertical Federated Learning [26.32678862011122]
Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns.
This paper provides a review of contribution evaluation in VFL.
We explore various tasks in VFL that involve contribution evaluation and analyze their required evaluation properties.
arXiv Detail & Related papers (2024-05-03T06:32:07Z)
- A Bargaining-based Approach for Feature Trading in Vertical Federated Learning [54.51890573369637]
We propose a bargaining-based feature trading approach in Vertical Federated Learning (VFL) to encourage economically efficient transactions.
Our model incorporates performance gain-based pricing, taking into account the revenue-based optimization objectives of both parties.
arXiv Detail & Related papers (2024-02-23T10:21:07Z)
- Data Valuation and Detections in Federated Learning [4.899818550820576]
Federated Learning (FL) enables collaborative model training while preserving the privacy of raw data.
A challenge in this framework is the fair and efficient valuation of data, which is crucial for incentivizing clients to contribute high-quality data in the FL task.
This paper introduces a novel privacy-preserving method for evaluating client contributions and selecting relevant datasets without a pre-specified training algorithm in an FL task.
arXiv Detail & Related papers (2023-11-09T12:01:32Z)
- Personalized Federated Learning under Mixture of Distributions [98.25444470990107]
We propose a novel approach to Personalized Federated Learning (PFL), which utilizes Gaussian mixture models (GMM) to fit the input data distributions across diverse clients.
FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification.
Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
arXiv Detail & Related papers (2023-05-01T20:04:46Z)
- FederatedTrust: A Solution for Trustworthy Federated Learning [3.202927443898192]
The rapid expansion of the Internet of Things (IoT) has presented challenges for centralized Machine and Deep Learning (ML/DL) methods.
To address concerns regarding data privacy, collaborative and privacy-preserving ML/DL techniques like Federated Learning (FL) have emerged.
arXiv Detail & Related papers (2023-02-20T09:02:24Z)
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
- Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data.
The Shapley value (SV) is a provably fair contribution valuation metric originating from cooperative game theory.
We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
arXiv Detail & Related papers (2021-11-28T19:03:39Z)
- A Principled Approach to Data Valuation for Federated Learning [73.19984041333599]
Federated learning (FL) is a popular technique to train machine learning (ML) models on decentralized data sources.
The Shapley value (SV) defines a unique payoff scheme that satisfies many desiderata for a data value notion.
This paper proposes a variant of the SV amenable to FL, which we call the federated Shapley value.
arXiv Detail & Related papers (2020-09-14T04:37:54Z)