Large-Scale Secure XGB for Vertical Federated Learning
- URL: http://arxiv.org/abs/2005.08479v2
- Date: Thu, 2 Sep 2021 04:11:18 GMT
- Title: Large-Scale Secure XGB for Vertical Federated Learning
- Authors: Wenjing Fang, Derun Zhao, Jin Tan, Chaochao Chen, Chaofan Yu, Li Wang, Lei Wang, Jun Zhou, Benyu Zhang
- Abstract summary: In this paper, we aim to build large-scale secure XGB under the vertical federated learning setting.
We employ secure multi-party computation techniques to avoid leaking intermediate information during training.
By proposing secure permutation protocols, we improve training efficiency and make the framework scale to large datasets.
- Score: 15.864654742542246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Privacy-preserving machine learning has drawn increasing attention
recently, especially as various privacy regulations have come into force. In
this context, Federated Learning (FL) has emerged to facilitate
privacy-preserving joint modeling among multiple parties. Although many
federated algorithms have been extensively studied, secure and practical
gradient tree boosting models (e.g., XGB) are still lacking in the literature.
In this paper, we aim to build large-scale secure XGB under the vertical
federated learning setting. We guarantee data privacy from three aspects.
Specifically, (i) we employ secure multi-party computation techniques to avoid
leaking intermediate information during training, (ii) we store the output
model in a distributed manner in order to minimize information release, and
(iii) we provide a novel algorithm for secure XGB prediction with the
distributed model. Furthermore, by proposing secure permutation protocols, we
improve the training efficiency and make the framework scale to large datasets.
We conduct extensive experiments on both public datasets and real-world
datasets, and the results demonstrate that our proposed XGB models provide not
only competitive accuracy but also practical performance.
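The abstract does not spell out its protocols, so the following is only a rough illustration of the kind of building block that secret-sharing-based secure XGB training relies on; it is not the authors' method. It assumes two parties, additive secret sharing over the ring Z_{2^64}, and fixed-point encoding with 16 fractional bits (all hypothetical choices). The label holder splits first-order gradients g and hessians h into shares, each party sums its own shares for a candidate split without communicating, and only the aggregated sums are reconstructed before applying the standard XGBoost split-gain formula. Two simplifications are loud assumptions: the candidate split mask is treated as public, and the gain is computed in the clear after reconstruction, whereas a protocol that aims to avoid leaking intermediate information would keep both hidden.

```python
import secrets

# Assumptions (not from the paper): two parties, additive secret sharing over
# the ring Z_{2^64}, and fixed-point encoding with 16 fractional bits.
MOD = 2 ** 64
SCALE = 2 ** 16


def encode(x: float) -> int:
    """Fixed-point encode a real number into Z_MOD (negatives wrap around)."""
    return round(x * SCALE) % MOD


def decode(v: int) -> float:
    """Map a ring element back to a signed real number."""
    if v >= MOD // 2:
        v -= MOD
    return v / SCALE


def share(x: float) -> tuple[int, int]:
    """Split x into two additive shares; either share alone is uniformly random."""
    r = secrets.randbelow(MOD)
    return r, (encode(x) - r) % MOD


def reconstruct(s0: int, s1: int) -> float:
    """Recombine two shares (the only step in this sketch that reveals a value)."""
    return decode((s0 + s1) % MOD)


def local_sum(shares: list[int], mask: list[int]) -> int:
    """A party adds its own shares for the selected instances; no communication."""
    return sum(s for s, m in zip(shares, mask) if m) % MOD


def split_gain(g_l: float, h_l: float, g_r: float, h_r: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Standard XGBoost split gain computed from gradient/hessian sums."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    return 0.5 * (score(g_l, h_l) + score(g_r, h_r)
                  - score(g_l + g_r, h_l + h_r)) - gamma


# Toy gradients g and hessians h held by the label owner (hypothetical values).
g = [0.5, -0.3, 0.2, -0.4]
h = [0.25, 0.21, 0.16, 0.24]
left = [1, 0, 1, 0]             # hypothetical candidate split, assumed public here
right = [1 - m for m in left]

# The label owner secret-shares g and h; party 0 keeps the first share of each
# value, party 1 keeps the second.
g0, g1 = map(list, zip(*(share(x) for x in g)))
h0, h1 = map(list, zip(*(share(x) for x in h)))

# Each party aggregates its shares locally; only the four sums are reconstructed.
G_L = reconstruct(local_sum(g0, left), local_sum(g1, left))
H_L = reconstruct(local_sum(h0, left), local_sum(h1, left))
G_R = reconstruct(local_sum(g0, right), local_sum(g1, right))
H_R = reconstruct(local_sum(h0, right), local_sum(h1, right))
gain = split_gain(G_L, H_L, G_R, H_R)
```

Additive sharing makes the gradient and hessian summations local and free, which is why aggregation is cheap in this style of design; the nonlinear parts of the gain (squaring, division, and comparing gains across candidate splits) are where interactive secure protocols and most of the cost would come in.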
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves an improvement of over 20% in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
- FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution that consolidates collaborative training across multiple data owners.
FedIT encounters limitations such as the scarcity of instructional data and the risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z)
- Independent Distribution Regularization for Private Graph Embedding [55.24441467292359]
Graph embeddings are susceptible to attribute inference attacks, which allow attackers to infer private node attributes from the learned graph embeddings.
To address these concerns, privacy-preserving graph embedding methods have emerged.
We propose a novel approach called Private Variational Graph AutoEncoders (PVGAE), which uses an independent distribution penalty as a regularization term.
arXiv Detail & Related papers (2023-08-16T13:32:43Z)
- Can Public Large Language Models Help Private Cross-device Federated Learning? [58.05449579773249]
We study (differentially) private federated learning (FL) of language models.
Public data has been used to improve privacy-utility trade-offs for both large and small language models.
We propose a novel distribution matching algorithm with theoretical grounding to sample public data close to private data distribution.
arXiv Detail & Related papers (2023-05-20T07:55:58Z)
- Federated Boosted Decision Trees with Differential Privacy [24.66980518231163]
We propose a general framework that captures and extends existing approaches for differentially private decision trees.
We show that with a careful choice of techniques it is possible to achieve very high utility while maintaining strong levels of privacy.
arXiv Detail & Related papers (2022-10-06T13:28:29Z)
- sqSGD: Locally Private and Communication Efficient Federated Learning [14.60645909629309]
Federated learning (FL) is a technique that trains machine learning models from decentralized data sources.
We develop a gradient-based learning algorithm called sqSGD that addresses communication efficiency and high-dimensional compatibility.
Experiment results show sqSGD successfully learns large models like LeNet and ResNet with local privacy constraints.
arXiv Detail & Related papers (2022-06-21T17:45:35Z)
- Efficient Logistic Regression with Local Differential Privacy [0.0]
Internet of Things devices are expanding rapidly and generating a huge amount of data.
There is an increasing need to explore data collected from these devices.
Collaborative learning provides a strategic solution for the Internet of Things settings but also raises public concern over data privacy.
arXiv Detail & Related papers (2022-02-05T22:44:03Z)
- An Efficient Learning Framework For Federated XGBoost Using Secret Sharing And Distributed Optimization [47.70500612425959]
XGBoost is one of the most widely used machine learning models in the industry due to its superior learning accuracy and efficiency.
It is crucial to deploy a secure and efficient federated XGBoost (FedXGB) model to tackle data isolation issues in big data problems.
In this paper, a multi-party federated XGB learning framework is proposed with a security guarantee, which reshapes the XGBoost's split criterion calculation process under a secret sharing setting.
Remarkably, a thorough analysis of model security is provided as well, and multiple numerical results showcase the superiority of the proposed FedXGB.
arXiv Detail & Related papers (2021-05-12T15:04:18Z)
- GRAFFL: Gradient-free Federated Learning of a Bayesian Generative Model [8.87104231451079]
This paper presents the first gradient-free federated learning framework called GRAFFL.
It uses implicit information derived from each participating institution to learn posterior distributions of parameters.
We propose the GRAFFL-based Bayesian mixture model to serve as a proof-of-concept of the framework.
arXiv Detail & Related papers (2020-08-29T07:19:44Z)
- LDP-FL: Practical Private Aggregation in Federated Learning with Local Differential Privacy [20.95527613004989]
Federated learning is a popular approach for privacy protection that collects the local gradient information instead of real data.
Previous works do not give a practical solution due to three issues.
Last, the privacy budget explodes due to the high dimensionality of weights in deep learning models.
arXiv Detail & Related papers (2020-07-31T01:08:57Z)