Effective and Efficient Federated Tree Learning on Hybrid Data
- URL: http://arxiv.org/abs/2310.11865v2
- Date: Mon, 29 Apr 2024 21:44:18 GMT
- Title: Effective and Efficient Federated Tree Learning on Hybrid Data
- Authors: Qinbin Li, Chulin Xie, Xiaojun Xu, Xiaoyuan Liu, Ce Zhang, Bo Li, Bingsheng He, Dawn Song
- Abstract summary: We propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data.
We observe the existence of consistent split rules in trees and show that the knowledge of parties can be incorporated into the lower layers of a tree.
Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead.
- Score: 80.31870543351918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated learning has emerged as a promising distributed learning paradigm that facilitates collaborative learning among multiple parties without transferring raw data. However, most existing federated learning studies focus on either horizontal or vertical data settings, where the data of different parties are assumed to be from the same feature or sample space. In practice, a common scenario is the hybrid data setting, where data from different parties may differ both in the features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level solution that does not need frequent communication traffic to train a tree. Our experiments demonstrate that HybridTree can achieve comparable accuracy to the centralized setting with low computational and communication overhead. HybridTree can achieve up to 8 times speedup compared with the other baselines.
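The three data settings named in the abstract can be illustrated with a toy sketch. This is only an illustration of the definitions (horizontal: same feature space; vertical: same sample space; hybrid: parties differ in both), not the HybridTree algorithm. The function name, party names, sample IDs, and feature names are all hypothetical.

```python
# Toy illustration of horizontal, vertical, and hybrid data partitioning.
# All party names, sample IDs, and feature names below are hypothetical.

def classify_setting(parties):
    """Classify a federation given a dict of
    party name -> (set of sample IDs, set of feature names)."""
    sample_sets = [samples for samples, _ in parties.values()]
    feature_sets = [features for _, features in parties.values()]
    same_samples = all(s == sample_sets[0] for s in sample_sets)
    same_features = all(f == feature_sets[0] for f in feature_sets)
    if same_samples and same_features:
        return "identical"   # every party holds the same table layout
    if same_features:
        return "horizontal"  # shared feature space, different samples
    if same_samples:
        return "vertical"    # shared sample space, different features
    return "hybrid"          # parties differ in both samples and features


horizontal = {
    "bank_a": ({1, 2}, {"age", "income"}),
    "bank_b": ({3, 4}, {"age", "income"}),
}
vertical = {
    "bank": ({1, 2, 3}, {"age", "income"}),
    "shop": ({1, 2, 3}, {"purchases"}),
}
hybrid = {
    "host": ({1, 2, 3, 4}, {"age", "income"}),
    "guest_a": ({1, 2}, {"purchases"}),
    "guest_b": ({3, 4}, {"clicks"}),
}

for name, parties in [("horizontal", horizontal),
                      ("vertical", vertical),
                      ("hybrid", hybrid)]:
    print(name, "->", classify_setting(parties))
```

In the hybrid example, no two parties share both their sample set and their feature set, which is the case that purely horizontal or purely vertical methods do not cover.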
Related papers
- Soft Hoeffding Tree: A Transparent and Differentiable Model on Data Streams [2.6524539020042663]
Stream mining algorithms such as Hoeffding trees grow based on the incoming data stream.
We propose soft Hoeffding trees (SoHoT) as a new differentiable and transparent model for possibly infinite and changing data streams.
arXiv Detail & Related papers (2024-11-07T15:49:53Z)
- Adaptive Parameterization of Deep Learning Models for Federated Learning [85.82002651944254]
Federated Learning offers a way to train deep neural networks in a distributed fashion.
It incurs a communication overhead as the model parameters or gradients need to be exchanged regularly during training.
In this paper, we propose to utilise parallel Adapters for Federated Learning.
arXiv Detail & Related papers (2023-02-06T17:30:33Z)
- A Fair and Efficient Hybrid Federated Learning Framework based on XGBoost for Distributed Power Prediction [11.2804988081885]
We propose a hybrid federated learning framework based on XGBoost for distributed power prediction from real-time external features.
In addition to introducing boosted trees to improve accuracy and interpretability, we combine horizontal and vertical federated learning.
The advantages of the proposed framework in fairness, efficiency and accuracy performance are also confirmed.
arXiv Detail & Related papers (2022-01-08T07:25:54Z)
- Fair and efficient contribution valuation for vertical federated learning [49.50442779626123]
Federated learning is a popular technology for training machine learning models on distributed data sources without sharing data.
The Shapley value (SV) is a provably fair contribution valuation metric originated from cooperative game theory.
We propose a contribution valuation metric called vertical federated Shapley value (VerFedSV) based on SV.
arXiv Detail & Related papers (2022-01-07T19:57:15Z)
- A Coupled Design of Exploiting Record Similarity for Practical Vertical Federated Learning [47.77625754666018]
Federated learning is a learning paradigm to enable collaborative learning across different parties without revealing raw data.
Most existing studies in vertical federated learning disregard the "record linkage" process.
We design a novel coupled training paradigm, FedSim, that integrates one-to-many linkage into the training process.
arXiv Detail & Related papers (2021-06-11T11:09:53Z)
- Fed-EINI: An Efficient and Interpretable Inference Framework for Decision Tree Ensembles in Federated Learning [11.843365055516566]
Fed-EINI is an efficient and interpretable inference framework for federated decision tree models.
We propose to protect the decision path by the efficient additively homomorphic encryption method.
Experiments show that the inference efficiency is improved by over 50% on average.
arXiv Detail & Related papers (2021-05-20T06:40:05Z)
- Growing Deep Forests Efficiently with Soft Routing and Learned Connectivity [79.83903179393164]
This paper further extends the deep forest idea in several important aspects.
We employ a probabilistic tree whose nodes make probabilistic routing decisions, a.k.a., soft routing, rather than hard binary decisions.
Experiments on the MNIST dataset demonstrate that our empowered deep forests can achieve performance better than or comparable to [1],[3].
arXiv Detail & Related papers (2020-12-29T18:05:05Z)
- Hybrid Federated Learning: Algorithms and Implementation [61.0640216394349]
Federated learning (FL) is a recently proposed distributed machine learning paradigm dealing with distributed and private data sets.
We propose a new model-matching-based problem formulation for hybrid FL.
We then propose an efficient algorithm that can collaboratively train the global and local models to deal with full and partial featured data.
arXiv Detail & Related papers (2020-12-22T23:56:03Z)
- Adaptive Histogram-Based Gradient Boosted Trees for Federated Learning [10.893840244877568]
Federated Learning (FL) is an approach to collaboratively train a model across multiple parties without sharing data between parties or an aggregator.
It is used both in the consumer domain, to protect personal data, and in enterprise settings, where data domicile regulation and the pragmatics of data silos are the main drivers.
We propose a novel implementation of gradient boosting which utilizes a party adaptive histogram aggregation method, without the need for data encryption.
arXiv Detail & Related papers (2020-12-11T23:01:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.