OpBoost: A Vertical Federated Tree Boosting Framework Based on
Order-Preserving Desensitization
- URL: http://arxiv.org/abs/2210.01318v1
- Date: Tue, 4 Oct 2022 02:21:18 GMT
- Title: OpBoost: A Vertical Federated Tree Boosting Framework Based on
Order-Preserving Desensitization
- Authors: Xiaochen Li, Yuke Hu, Weiran Liu, Hanwen Feng, Li Peng, Yuan Hong, Kui
Ren, Zhan Qin
- Abstract summary: Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without sharing the raw data.
Recent works show that vertical FL alone is still not sufficient to prevent privacy leakage from the training process or the trained model.
This paper studies privacy-preserving tree boosting algorithms under vertical FL.
- Score: 26.386265547513887
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vertical Federated Learning (FL) is a new paradigm that enables users with
non-overlapping attributes of the same data samples to jointly train a model
without directly sharing the raw data. Nevertheless, recent works show that
vertical FL alone is still not sufficient to prevent privacy leakage from the
training process or the trained model. This paper focuses on
privacy-preserving tree boosting algorithms under vertical FL. The existing
solutions based on
cryptography involve heavy computation and communication overhead and are
vulnerable to inference attacks. Although the solution based on Local
Differential Privacy (LDP) addresses the above problems, it leads to low
accuracy of the trained model.
This paper explores how to improve the accuracy of the widely deployed tree
boosting algorithms while satisfying differential privacy under vertical FL.
Specifically, we introduce a framework called OpBoost. Three order-preserving
desensitization algorithms satisfying a variant of LDP called distance-based
LDP (dLDP) are designed to desensitize the training data. In particular, we
optimize the dLDP definition and study efficient sampling distributions to
further improve the accuracy and efficiency of the proposed algorithms. The
proposed algorithms provide a trade-off between the privacy of pairs with large
distance and the utility of desensitized values. Comprehensive evaluations show
that OpBoost achieves better prediction accuracy for trained models than
existing LDP approaches under reasonable settings. Our code is open source.
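As a concrete illustration of the dLDP idea, below is a minimal Python sketch of an order-preserving perturbation: outputs are sampled with probability decaying exponentially in their distance from the true value, so the privacy loss between two inputs scales with how far apart they are. This truncated-geometric-style mechanism is for intuition only, not one of OpBoost's three algorithms, and all parameters are illustrative.

```python
import numpy as np

def dldp_perturb(x, domain_min, domain_max, eps_per_unit, rng=None):
    """Sample a noisy output y with weight exp(-eps_per_unit * |x - y| / 2).

    Closer outputs are more likely, so ordering is preserved in
    expectation, and the privacy loss between two inputs grows with
    their distance (the dLDP property). Truncation to the domain plus
    the factor 1/2 keeps the end-to-end ratio within exp(eps * |x - x'|).
    Illustrative only; OpBoost's algorithms also use partitioning and
    optimized sampling distributions.
    """
    rng = rng or np.random.default_rng()
    candidates = np.arange(domain_min, domain_max + 1)
    weights = np.exp(-eps_per_unit * np.abs(candidates - x) / 2.0)
    return int(rng.choice(candidates, p=weights / weights.sum()))

# Desensitize a feature column before handing it to the boosting party;
# split finding only needs the (approximate) order of the values.
values = [3, 7, 7, 12, 25, 40]
print([dldp_perturb(v, 0, 50, eps_per_unit=0.5) for v in values])
```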
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer [52.09480867526656]
We identify the source of misalignment as a form of distributional shift and uncertainty in learning human preferences.
To mitigate overoptimization, we first propose a theoretical algorithm that chooses the best policy for an adversarially chosen reward model.
Using the equivalence between reward models and the corresponding optimal policy, the algorithm features a simple objective that combines a preference optimization loss and a supervised learning loss.
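A hedged sketch of the combined objective the summary describes, using a DPO-style preference loss plus an SFT term on the chosen response; the function name, beta, and sft_weight are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def regularized_preference_loss(logp_chosen, logp_rejected,
                                ref_logp_chosen, ref_logp_rejected,
                                beta=0.1, sft_weight=0.5):
    """DPO-style preference loss plus an SFT term on the chosen response.

    The SFT (maximum-likelihood) term acts as a regularizer against
    over-optimizing the implicit reward. Names and weights are
    illustrative, not the paper's exact formulation.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    sft_loss = -logp_chosen.mean()  # supervised NLL on the chosen response
    return pref_loss + sft_weight * sft_loss

# Toy usage with fake sequence log-probabilities.
print(regularized_preference_loss(torch.tensor([-5.0]), torch.tensor([-9.0]),
                                  torch.tensor([-6.0]), torch.tensor([-8.0])))
```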
arXiv Detail & Related papers (2024-05-26T05:38:50Z)
- Sparse is Enough in Fine-tuning Pre-trained Large Language Models [98.46493578509039]
We propose a gradient-based sparse fine-tuning algorithm named Sparse Increment Fine-Tuning (SIFT).
We validate its effectiveness on a range of tasks including the GLUE Benchmark and Instruction-tuning.
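A minimal sketch of the gradient-sparsification idea behind such methods: after backpropagation, keep only the largest-magnitude fraction of each parameter's gradient. SIFT's actual index selection and bookkeeping differ; apply_sparse_grad_mask and keep_ratio are illustrative names.

```python
import torch

def apply_sparse_grad_mask(model, keep_ratio=0.25):
    """Zero all but the largest-magnitude fraction of each gradient.

    Called between loss.backward() and optimizer.step(), this restricts
    the update to a small, data-driven subset of parameters. SIFT's
    actual selection differs; keep_ratio is illustrative.
    """
    for p in model.parameters():
        if p.grad is None:
            continue
        flat = p.grad.abs().flatten()
        k = max(1, int(keep_ratio * flat.numel()))
        threshold = torch.topk(flat, k).values.min()
        p.grad.mul_((p.grad.abs() >= threshold).to(p.grad.dtype))

model = torch.nn.Linear(10, 2)
model(torch.randn(4, 10)).sum().backward()
apply_sparse_grad_mask(model)
```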
arXiv Detail & Related papers (2023-12-19T06:06:30Z)
- Gradient Sparsification for Efficient Wireless Federated Learning with Differential Privacy [25.763777765222358]
Federated learning (FL) enables distributed clients to collaboratively train a machine learning model without sharing raw data with each other.
As the model size grows, the training latency increases due to limited transmission bandwidth, and the model performance degrades when differential privacy (DP) protection is applied.
We propose a sparsification-empowered FL framework over wireless channels to improve training efficiency without sacrificing convergence performance.
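A rough sketch of the combination described above, assuming top-k sparsification followed by an L2 clip and Gaussian noise on the surviving coordinates; the paper's wireless-channel modeling and exact privacy accounting are not reproduced here, and all parameters are illustrative.

```python
import numpy as np

def sparsify_and_privatize(grad, keep_ratio=0.1, clip=1.0, sigma=0.5, rng=None):
    """Top-k sparsify a client gradient, then L2-clip and add Gaussian noise.

    Sparsification cuts the payload sent over the limited uplink; DP noise
    is added only to the surviving coordinates. Illustrative assumptions
    throughout; privacy accounting is omitted.
    """
    rng = rng or np.random.default_rng()
    k = max(1, int(keep_ratio * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]              # top-k coords
    vals = grad[idx] * min(1.0, clip / (np.linalg.norm(grad[idx]) + 1e-12))
    vals += rng.normal(0.0, sigma * clip, size=k)             # Gaussian noise
    return idx, vals  # the client uploads only (indices, noisy values)

idx, vals = sparsify_and_privatize(np.random.randn(1000))
print(idx.size, "coordinates uploaded")
```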
arXiv Detail & Related papers (2023-04-09T05:21:15Z)
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on backpropagation (BP) optimization.
Unlike FF, our framework directly outputs label distributions at each cascaded block, which does not require generation of additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
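A minimal sketch of a cascaded block in the spirit of this summary: each block has its own label head and a purely local loss, and detaching the forwarded features keeps blocks independent. Architecture details are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CascadedBlock(nn.Module):
    """One block with its own label head, trained by a purely local loss."""

    def __init__(self, dim_in, dim_out, num_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim_in, dim_out), nn.ReLU())
        self.head = nn.Linear(dim_out, num_classes)

    def forward(self, x):
        h = self.body(x)
        # Detaching the forwarded features means no gradient ever flows
        # back into earlier blocks: each block trains independently.
        return h.detach(), self.head(h)

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
for block in [CascadedBlock(32, 32, 10) for _ in range(3)]:
    x, logits = block(x)
    F.cross_entropy(logits, y).backward()  # local loss per block
```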
arXiv Detail & Related papers (2023-03-17T02:01:11Z)
- sqSGD: Locally Private and Communication Efficient Federated Learning [14.60645909629309]
Federated learning (FL) is a technique that trains machine learning models from decentralized data sources.
We develop a gradient-based learning algorithm called sqSGD that addresses communication efficiency and high-dimensional compatibility.
Experiment results show sqSGD successfully learns large models like LeNet and ResNet with local privacy constraints.
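One plausible ingredient, sketched under assumptions: unbiased stochastic quantization of gradients to a few levels, which bounds each update and shrinks the communication payload. sqSGD's actual quantizer and privacy accounting are more involved; levels and the scaling are illustrative.

```python
import torch

def stochastic_quantize(grad, levels=4):
    """Unbiased stochastic quantization of a gradient to a small grid.

    Each coordinate is rounded up or down at random so the expectation
    equals the input, while only log2(2 * levels - 1) bits per coordinate
    (plus one shared scale) need to be transmitted. Illustrative only.
    """
    scale = grad.abs().max().clamp(min=1e-12)
    normalized = grad / scale * (levels - 1)   # in [-(levels-1), levels-1]
    lower = normalized.floor()
    quantized = lower + torch.bernoulli(normalized - lower)
    return quantized.clamp(-(levels - 1), levels - 1) * scale / (levels - 1)

print(stochastic_quantize(torch.randn(16)))
```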
arXiv Detail & Related papers (2022-06-21T17:45:35Z)
- Differentially Private Federated Learning via Inexact ADMM with Multiple Local Updates [0.0]
We develop a DP inexact alternating direction method of multipliers algorithm with multiple local updates for federated learning.
We show that our algorithm provides $\bar{\epsilon}$-DP for every iteration, where $\bar{\epsilon}$ is a privacy budget controlled by the user.
We demonstrate that our algorithm reduces the testing error by at most 31% compared with the existing DP algorithm, while achieving the same level of data privacy.
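A hedged per-iteration DP sketch, assuming an L1 clip and the Laplace mechanism; the paper's inexact ADMM with multiple local updates is more elaborate, and the calibration here is an assumption for illustration.

```python
import numpy as np

def noisy_local_update(theta, clip=1.0, eps_bar=0.1, rng=None):
    """Privatize one local update before it is shared in an ADMM round.

    The update is clipped in L1 norm (bounding sensitivity) and Laplace
    noise of scale clip / eps_bar is added, giving eps_bar-DP for the
    iteration. Names and calibration are illustrative assumptions.
    """
    rng = rng or np.random.default_rng()
    theta = theta * min(1.0, clip / (np.linalg.norm(theta, 1) + 1e-12))
    return theta + rng.laplace(0.0, clip / eps_bar, size=theta.shape)

print(noisy_local_update(np.ones(5)))
```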
arXiv Detail & Related papers (2022-02-18T19:58:47Z)
- FedBoosting: Federated Learning with Gradient Protected Boosting for Text Recognition [7.988454173034258]
The Federated Learning (FL) framework allows learning a shared model collaboratively without data being centralized or shared among data owners.
We show in this paper that the generalization ability of the joint model is poor on Non-Independent and Non-Identically Distributed (Non-IID) data.
We propose a novel boosting algorithm for FL to address both the generalization and gradient leakage issues.
arXiv Detail & Related papers (2020-07-14T18:47:23Z)
- FedPD: A Federated Learning Framework with Optimal Rates and Adaptivity to Non-IID Data [59.50904660420082]
Federated Learning (FL) has become a popular paradigm for learning from distributed data.
To effectively utilize data at different devices without moving them to the cloud, algorithms such as the Federated Averaging (FedAvg) have adopted a "computation then aggregation" (CTA) model.
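For reference, a minimal sketch of one CTA round in the FedAvg style: local computation on each client, then server-side averaging. grad_fn is a placeholder for the per-client gradient; all values are illustrative.

```python
import numpy as np

def fedavg_round(global_w, client_data, grad_fn, local_steps=5, lr=0.1):
    """One "computation then aggregation" (CTA) round, FedAvg style.

    Each client runs several local gradient steps on its own data
    (computation), then the server averages the resulting models
    (aggregation). grad_fn(w, data) is a placeholder per-client gradient.
    """
    updated = []
    for data in client_data:
        w = global_w.copy()
        for _ in range(local_steps):
            w -= lr * grad_fn(w, data)
        updated.append(w)
    return np.mean(updated, axis=0)

# Toy per-client objective 0.5 * ||w - c||^2, whose gradient is (w - c).
clients = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(fedavg_round(np.zeros(2), clients, grad_fn=lambda w, c: w - c))
```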
arXiv Detail & Related papers (2020-05-22T23:07:42Z)
- User-Level Privacy-Preserving Federated Learning: Analysis and Performance Optimization [77.43075255745389]
Federated learning (FL) can keep the private data of mobile terminals (MTs) local while still training useful models from that data.
From a viewpoint of information theory, it is still possible for a curious server to infer private information from the shared models uploaded by MTs.
We propose a user-level differential privacy (UDP) algorithm by adding artificial noise to the shared models before uploading them to servers.
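A minimal sketch of the upload-side mechanism the summary describes, assuming an L2 clip and Gaussian noise; calibrating sigma to a per-user budget is omitted, and the function name is an illustrative assumption.

```python
import numpy as np

def udp_upload(update, clip=1.0, sigma=1.0, rng=None):
    """Clip a local model update and add artificial noise before upload.

    The L2 clip bounds any single user's influence; Gaussian noise then
    limits what a curious server can infer from the shared model. In a
    real UDP deployment sigma is calibrated to the per-user budget; the
    value here is illustrative.
    """
    rng = rng or np.random.default_rng()
    w = update * min(1.0, clip / (np.linalg.norm(update) + 1e-12))
    return w + rng.normal(0.0, sigma * clip, size=w.shape)

print(udp_upload(np.random.randn(10)))
```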
arXiv Detail & Related papers (2020-02-29T10:13:39Z)