A Federated Random Forest Solution for Secure Distributed Machine Learning
- URL: http://arxiv.org/abs/2505.08085v1
- Date: Mon, 12 May 2025 21:40:35 GMT
- Title: A Federated Random Forest Solution for Secure Distributed Machine Learning
- Authors: Alexandre Cotorobai, Jorge Miguel Silva, Jose Luis Oliveira,
- Abstract summary: This paper introduces a federated learning framework for Random Forest classifiers that preserves data privacy and provides robust performance in distributed settings.<n>By leveraging PySyft for secure, privacy-aware computation, our method enables multiple institutions to collaboratively train Random Forest models on locally stored data.<n>Experiments on two real-world healthcare benchmarks demonstrate that the federated approach maintains competitive accuracy - within a maximum 9% margin of centralized methods.
- Score: 44.99833362998488
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Privacy and regulatory barriers often hinder centralized machine learning solutions, particularly in sectors like healthcare where data cannot be freely shared. Federated learning has emerged as a powerful paradigm to address these concerns; however, existing frameworks primarily support gradient-based models, leaving a gap for more interpretable, tree-based approaches. This paper introduces a federated learning framework for Random Forest classifiers that preserves data privacy and provides robust performance in distributed settings. By leveraging PySyft for secure, privacy-aware computation, our method enables multiple institutions to collaboratively train Random Forest models on locally stored data without exposing sensitive information. The framework supports weighted model averaging to account for varying data distributions, incremental learning to progressively refine models, and local evaluation to assess performance across heterogeneous datasets. Experiments on two real-world healthcare benchmarks demonstrate that the federated approach maintains competitive predictive accuracy - within a maximum 9\% margin of centralized methods - while satisfying stringent privacy requirements. These findings underscore the viability of tree-based federated learning for scenarios where data cannot be centralized due to regulatory, competitive, or technical constraints. The proposed solution addresses a notable gap in existing federated learning libraries, offering an adaptable tool for secure distributed machine learning tasks that demand both transparency and reliable performance. The tool is available at https://github.com/ieeta-pt/fed_rf.
Related papers
- Cellular Traffic Prediction via Byzantine-robust Asynchronous Federated Learning [10.584332226400676]
Network traffic prediction plays a crucial role in intelligent network operation.<n>Traditional prediction methods often rely on centralized training, necessitating the transfer of vast amounts of traffic data to a central server.<n>To address these issues, federated learning integrated with differential privacy has emerged as a solution to improve data privacy and model robustness in distributed settings.
arXiv Detail & Related papers (2025-05-25T18:38:57Z) - Benchmarking FedAvg and FedCurv for Image Classification Tasks [1.376408511310322]
This paper focuses on the problem of statistical heterogeneity of the data in the same federated network.
Several Federated Learning algorithms, such as FedAvg, FedProx and Federated Curvature (FedCurv) have already been proposed.
As a side product of this work, we release the non-IID version of the datasets we used so to facilitate further comparisons from the FL community.
arXiv Detail & Related papers (2023-03-31T10:13:01Z) - FedILC: Weighted Geometric Mean and Invariant Gradient Covariance for
Federated Learning on Non-IID Data [69.0785021613868]
Federated learning is a distributed machine learning approach which enables a shared server model to learn by aggregating the locally-computed parameter updates with the training data from spatially-distributed client silos.
We propose the Federated Invariant Learning Consistency (FedILC) approach, which leverages the gradient covariance and the geometric mean of Hessians to capture both inter-silo and intra-silo consistencies.
This is relevant to various fields such as medical healthcare, computer vision, and the Internet of Things (IoT)
arXiv Detail & Related papers (2022-05-19T03:32:03Z) - Local Learning Matters: Rethinking Data Heterogeneity in Federated
Learning [61.488646649045215]
Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices)
arXiv Detail & Related papers (2021-11-28T19:03:39Z) - FedMix: Approximation of Mixup under Mean Augmented Federated Learning [60.503258658382]
Federated learning (FL) allows edge devices to collectively learn a model without directly sharing data within each device.
Current state-of-the-art algorithms suffer from performance degradation as the heterogeneity of local data across clients increases.
We propose a new augmentation algorithm, named FedMix, which is inspired by a phenomenal yet simple data augmentation method, Mixup.
arXiv Detail & Related papers (2021-07-01T06:14:51Z) - Weight Divergence Driven Divide-and-Conquer Approach for Optimal
Federated Learning from non-IID Data [0.0]
Federated Learning allows training of data stored in distributed devices without the need for centralizing training data.
We propose a novel Divide-and-Conquer training methodology that enables the use of the popular FedAvg aggregation algorithm.
arXiv Detail & Related papers (2021-06-28T09:34:20Z) - Robustness and Personalization in Federated Learning: A Unified Approach
via Regularization [4.7234844467506605]
We present a class of methods for robust, personalized federated learning, called Fed+.
The principal advantage of Fed+ is to better accommodate the real-world characteristics found in federated training.
We demonstrate the benefits of Fed+ through extensive experiments on benchmark datasets.
arXiv Detail & Related papers (2020-09-14T10:04:30Z) - WAFFLe: Weight Anonymized Factorization for Federated Learning [88.44939168851721]
In domains where data are sensitive or private, there is great value in methods that can learn in a distributed manner without the data ever leaving the local devices.
We propose Weight Anonymized Factorization for Federated Learning (WAFFLe), an approach that combines the Indian Buffet Process with a shared dictionary of weight factors for neural networks.
arXiv Detail & Related papers (2020-08-13T04:26:31Z) - Learning while Respecting Privacy and Robustness to Distributional
Uncertainties and Adversarial Data [66.78671826743884]
The distributionally robust optimization framework is considered for training a parametric model.
The objective is to endow the trained model with robustness against adversarially manipulated input data.
Proposed algorithms offer robustness with little overhead.
arXiv Detail & Related papers (2020-07-07T18:25:25Z) - Concentrated Differentially Private and Utility Preserving Federated
Learning [24.239992194656164]
Federated learning is a machine learning setting where a set of edge devices collaboratively train a model under the orchestration of a central server.
In this paper, we develop a federated learning approach that addresses the privacy challenge without much degradation on model utility.
We provide a tight end-to-end privacy guarantee of our approach and analyze its theoretical convergence rates.
arXiv Detail & Related papers (2020-03-30T19:20:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.