SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree
- URL: http://arxiv.org/abs/2110.10927v5
- Date: Wed, 19 Jun 2024 02:45:59 GMT
- Title: SecureBoost+: Large Scale and High-Performance Vertical Federated Gradient Boosting Decision Tree
- Authors: Tao Fan, Weijing Chen, Guoqiang Ma, Yan Kang, Lixin Fan, Qiang Yang
- Abstract summary: We propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework.
SecureBoost+ is secure under the same semi-honest model as SecureBoost.
Experimental results show that SecureBoost+ is 6-35x faster than SecureBoost with the same accuracy, and it scales to tens of millions of data samples and thousands of feature dimensions.
- Score: 18.65547577691255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gradient boosting decision tree (GBDT) is an ensemble machine learning algorithm that is widely used in industry due to its strong performance and interpretability. Because data are isolated across owners and privacy must be preserved, many works use vertical federated learning to train models collaboratively across data owners with privacy guarantees. SecureBoost is one of the most popular vertical federated learning algorithms for GBDT. However, to achieve privacy preservation, SecureBoost involves complex training procedures and time-consuming cryptographic operations, which make it slow to train and prevent it from scaling to large datasets. In this work, we propose SecureBoost+, a large-scale and high-performance vertical federated gradient boosting decision tree framework. SecureBoost+ is secure under the same semi-honest model as SecureBoost and easily scales to tens of millions of data samples. It achieves high performance through several novel optimizations over SecureBoost, including ciphertext operation optimization, new training mechanisms, and multi-classification training optimization. Experimental results show that SecureBoost+ is 6-35x faster than SecureBoost with the same accuracy, and it scales to tens of millions of data samples and thousands of feature dimensions.
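The abstract names ciphertext operation optimization as one source of the speedup but does not spell it out here. The sketch below illustrates one optimization of this kind commonly used in Paillier-encrypted GBDT protocols: packing each sample's gradient and hessian into a single plaintext before encryption, so the passive party aggregates one ciphertext per sample per histogram bin with additions only, and the label holder decrypts once per bin. It is a minimal sketch under stated assumptions; the `phe` library, the fixed-point constants, and the `pack`/`unpack` helpers are illustrative choices, not the authors' implementation.

```python
# Minimal sketch of gradient/hessian packing under Paillier homomorphic encryption.
# Assumptions: fixed-point encoding with PRECISION, a per-sample OFFSET so packed
# values stay non-negative, and a PACK_BASE wide enough to hold the summed hessians.
from phe import paillier

PRECISION = 10 ** 6   # fixed-point scale for g and h
OFFSET = 2 ** 20      # shift so negative gradients encode as non-negative integers
PACK_BASE = 2 ** 42   # width of the low (hessian) slot; must exceed any per-bin sum

def pack(g, h):
    """Encode (g, h) as one integer: gradient in the high slot, hessian in the low slot."""
    g_int = int(round(g * PRECISION)) + OFFSET
    h_int = int(round(h * PRECISION)) + OFFSET
    return g_int * PACK_BASE + h_int

def unpack(packed, count):
    """Recover the per-bin gradient and hessian sums; `count` samples were aggregated."""
    g_sum = packed // PACK_BASE - count * OFFSET
    h_sum = packed % PACK_BASE - count * OFFSET
    return g_sum / PRECISION, h_sum / PRECISION

# Active party (label holder): encrypt one packed value per sample instead of two.
pub, priv = paillier.generate_paillier_keypair(n_length=2048)
samples = [(0.25, 1.0), (-0.40, 1.0), (0.10, 1.0)]   # (gradient, hessian) pairs
enc = [pub.encrypt(pack(g, h)) for g, h in samples]

# Passive party: homomorphically add the ciphertexts of samples falling in one bin.
bin_members = [0, 2]
bin_sum = enc[bin_members[0]]
for idx in bin_members[1:]:
    bin_sum = bin_sum + enc[idx]

# Active party: one decryption per bin recovers both aggregated statistics.
g_sum, h_sum = unpack(priv.decrypt(bin_sum), count=len(bin_members))
print(g_sum, h_sum)   # approximately 0.35 and 2.0
```

Compared with encrypting gradients and hessians separately, such packing roughly halves the number of ciphertexts encrypted, transferred, and added per histogram, which is the general mechanism by which ciphertext-level optimizations of this kind reduce training time.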
Related papers
- How to Boost Any Loss Function [63.573324901948716]
We show that loss functions can be efficiently optimized with boosting.
We show that boosting can achieve a feat not yet known to be possible in the classical $0^{th}$ order setting.
arXiv Detail & Related papers (2024-07-02T14:08:23Z) - Hyperparameter Optimization for SecureBoost via Constrained Multi-Objective Federated Learning [26.00375717103131]
We find that SecureBoost and some of its variants are still vulnerable to label leakage.
This vulnerability may lead to a suboptimal trade-off between utility, privacy, and efficiency.
We propose the Constrained Multi-Objective SecureBoost (CMOSB) algorithm to approximate optimal solutions.
arXiv Detail & Related papers (2024-04-06T03:46:42Z) - SecureBoost Hyperparameter Tuning via Multi-Objective Federated Learning [23.196686101682737]
SecureBoost is a tree-boosting algorithm that leverages homomorphic encryption to protect data privacy in the vertical federated learning setting.
SecureBoost suffers from high computational complexity and risk of label leakage.
We propose a Constrained Multi-Objective SecureBoost (CMOSB) algorithm to find optimal trade-offs among utility, privacy, and efficiency (a minimal sketch of the underlying Pareto-selection idea follows after this list).
arXiv Detail & Related papers (2023-07-20T04:45:59Z) - PromptBoosting: Black-Box Text Classification with Ten Forward Passes [61.38341243907045]
We describe PromptBoosting, a query-efficient procedure for building a text classifier from a neural language model (LM) without access to the LM's parameters, gradients, or hidden representations.
Experiments show that PromptBoosting achieves state-of-the-art performance in multiple black-box few-shot classification tasks, and matches or outperforms full fine-tuning in both few-shot and standard learning paradigms, while training 10x faster than existing black-box methods.
arXiv Detail & Related papers (2022-12-19T06:04:54Z) - SketchBoost: Fast Gradient Boosted Decision Tree for Multioutput Problems [3.04585143845864]
Gradient Boosted Decision Tree (GBDT) is a widely-used machine learning algorithm.
We propose novel methods aiming to accelerate the training process of GBDT in the multioutput scenario.
Our numerical study demonstrates that SketchBoost speeds up the training process of GBDT by up to over 40 times.
arXiv Detail & Related papers (2022-11-23T11:06:10Z) - OpBoost: A Vertical Federated Tree Boosting Framework Based on Order-Preserving Desensitization [26.386265547513887]
Vertical Federated Learning (FL) is a new paradigm that enables users with non-overlapping attributes of the same data samples to jointly train a model without sharing the raw data.
Recent works show that vertical FL alone is still not sufficient to prevent privacy leakage from the training process or the trained model.
This paper focuses on privacy-preserving tree boosting algorithms under vertical FL.
arXiv Detail & Related papers (2022-10-04T02:21:18Z) - Boosting as Frank-Wolfe [0.6875312133832078]
We propose a generic boosting scheme that combines the Frank-Wolfe algorithm and any secondary algorithm.
We show that the scheme retains the same convergence guarantee as ERLPBoost and C-ERLPBoost.
arXiv Detail & Related papers (2022-09-22T07:36:55Z) - PRBoost: Prompt-Based Rule Discovery and Boosting for Interactive
Weakly-Supervised Learning [57.66155242473784]
Weakly-supervised learning (WSL) has shown promising results in addressing label scarcity on many NLP tasks.
Our proposed model, named PRBoost, achieves this goal via iterative prompt-based rule discovery and model boosting.
Experiments on four tasks show that PRBoost outperforms state-of-the-art WSL baselines by up to 7.1%.
arXiv Detail & Related papers (2022-03-18T04:23:20Z) - Boosting for Online Convex Optimization [64.15578413206715]
We consider the decision-making framework of online convex optimization with a large number of experts.
We define a weak learning algorithm as a mechanism that guarantees approximate regret against a base class of experts.
We give an efficient boosting algorithm that guarantees near-optimal regret against the convex hull of the base class.
arXiv Detail & Related papers (2021-02-18T12:30:49Z) - Soft Gradient Boosting Machine [72.54062017726154]
We propose the soft Gradient Boosting Machine (sGBM) by wiring multiple differentiable base learners together.
Experimental results show that sGBM achieves much higher time efficiency with better accuracy, given the same base learner, in both online and offline settings.
arXiv Detail & Related papers (2020-06-07T06:43:23Z) - On the Dual Formulation of Boosting Algorithms [92.74617630106559]
We show that the Lagrange dual problems of AdaBoost, LogitBoost, and soft-margin LPBoost with generalized hinge loss are all entropy maximization problems.
By looking at the dual problems of these boosting algorithms, we show that the success of boosting can be understood in terms of maintaining a better margin distribution.
arXiv Detail & Related papers (2009-01-23T02:14:42Z)
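Two of the entries above (the CMOSB papers) tune SecureBoost hyperparameters by approximating the best trade-off between utility, privacy, and efficiency. As referenced there, the sketch below shows the underlying Pareto-selection idea in its simplest form: keep only the hyperparameter candidates not dominated on every objective. The objective names and candidate values are illustrative assumptions, not results from either paper.

```python
# Minimal sketch of non-dominated (Pareto) filtering over hyperparameter candidates.
# Each candidate is scored on objectives to be minimized, assumed here to be
# (utility loss, label-leakage risk, training cost).

def dominates(a, b):
    """True if candidate a is no worse than b on every objective and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate."""
    return [
        (name, objs) for name, objs in candidates
        if not any(dominates(other, objs) for _, other in candidates if other != objs)
    ]

# (hyperparameter setting, (utility loss, leakage risk, training cost)) -- illustrative values
candidates = [
    ("depth=4, trees=50,  bins=32", (0.12, 0.30, 1.0)),
    ("depth=6, trees=100, bins=32", (0.10, 0.35, 1.8)),
    ("depth=6, trees=100, bins=64", (0.09, 0.45, 2.3)),
    ("depth=8, trees=200, bins=64", (0.09, 0.60, 4.0)),   # dominated: no more accurate, leakier, costlier
]

for name, objs in pareto_front(candidates):
    print(name, objs)
```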