SoK: Privacy-Preserving Collaborative Tree-based Model Learning
- URL: http://arxiv.org/abs/2103.08987v1
- Date: Tue, 16 Mar 2021 11:24:15 GMT
- Title: SoK: Privacy-Preserving Collaborative Tree-based Model Learning
- Authors: Sylvain Chatel, Apostolos Pyrgelis, Juan Ramon Troncoso-Pastoriza,
Jean-Pierre Hubaux
- Abstract summary: We survey the literature on distributed and privacy-preserving training of tree-based models.
We systematize its knowledge based on four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model.
We provide for the first time a framework analyzing the information leakage occurring in distributed tree-based model learning.
- Score: 5.759774832460351
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tree-based models are among the most efficient machine learning techniques
for data mining nowadays due to their accuracy, interpretability, and
simplicity. The recent orthogonal needs for more data and privacy protection
call for collaborative privacy-preserving solutions. In this work, we survey
the literature on distributed and privacy-preserving training of tree-based
models and we systematize its knowledge based on four axes: the learning
algorithm, the collaborative model, the protection mechanism, and the threat
model. We use this to identify the strengths and limitations of these works and
provide for the first time a framework analyzing the information leakage
occurring in distributed tree-based model learning.
Related papers
- Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining.
This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning).
We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z) - TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems [5.9186175166428345]
We introduce TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models.
Our attack exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients.
Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems.
arXiv Detail & Related papers (2025-06-09T10:06:03Z) - Cross-Cloud Data Privacy Protection: Optimizing Collaborative Mechanisms of AI Systems by Integrating Federated Learning and LLMs [1.819979627431298]
We introduce a cross-cloud architecture in which federated learning works by aggregating model updates from decentralized nodes without exposing the original data.
We've further innovated by introducing a secure communication layer to ensure the privacy and integrity of model updates and training data.
Experimental results show that the proposed method is significantly better than the traditional federated learning model in terms of accuracy, convergence speed and data privacy protection.
arXiv Detail & Related papers (2025-05-19T16:14:27Z) - xIDS-EnsembleGuard: An Explainable Ensemble Learning-based Intrusion Detection System [7.2738577621227085]
We focus on addressing the challenges of detecting malicious attacks in networks by designing an advanced Explainable Intrusion Detection System (xIDS).
Existing machine learning and deep learning approaches have invisible limitations, such as potential biases in predictions, a lack of interpretability, and the risk of overfitting to training data.
We propose an ensemble learning technique called "EnsembleGuard" to overcome these challenges.
arXiv Detail & Related papers (2025-03-01T20:49:31Z) - A Neural Network Alternative to Tree-based Models [0.0]
We show that our models, Sparse TABular NET or sTAB-Net with attention mechanisms, are more effective than tree-based models.
They achieve better performance than post-hoc methods like SHAP.
arXiv Detail & Related papers (2024-10-23T10:50:07Z) - Enhancing User-Centric Privacy Protection: An Interactive Framework through Diffusion Models and Machine Unlearning [54.30994558765057]
The study pioneers a comprehensive privacy protection framework that safeguards image data privacy concurrently during data sharing and model publication.
We propose an interactive image privacy protection framework that utilizes generative machine learning models to modify image information at the attribute level.
Within this framework, we instantiate two modules: a differential privacy diffusion model for protecting attribute information in images and a feature unlearning algorithm for efficient updates of the trained model on the revised image dataset.
arXiv Detail & Related papers (2024-09-05T07:55:55Z) - Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning [7.557226714828334]
We present a novel unlearning mechanism designed to remove the impact of specific data samples from a neural network.
In achieving this goal, we crafted a novel loss function tailored to eliminate privacy-sensitive information from weights and activation values of the target model.
Our results showcase the superior performance of our approach in terms of unlearning efficacy and latency as well as the fidelity of the primary task.
arXiv Detail & Related papers (2024-07-01T00:20:26Z) - Optimized Feature Generation for Tabular Data via LLMs with Decision Tree Reasoning [53.241569810013836]
We propose a novel framework that utilizes large language models (LLMs) to identify effective feature generation rules.
We use decision trees to convey this reasoning information, as they can be easily represented in natural language.
OCTree consistently enhances the performance of various prediction models across diverse benchmarks.
arXiv Detail & Related papers (2024-06-12T08:31:34Z) - Blind Federated Learning without initial model [1.104960878651584]
Federated learning is an emerging machine learning approach that allows the construction of a model between several participants who hold their own private data.
This method is secure and privacy-preserving, suitable for training a machine learning model using sensitive data from different sources, such as hospitals.
arXiv Detail & Related papers (2024-04-24T20:10:10Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z) - Towards Robust Dataset Learning [90.2590325441068]
We propose a principled, tri-level optimization to formulate the robust dataset learning problem.
Under an abstraction model that characterizes robust vs. non-robust features, the proposed method provably learns a robust dataset.
arXiv Detail & Related papers (2022-11-19T17:06:10Z) - Bounding Information Leakage in Machine Learning [26.64770573405079]
This paper investigates fundamental bounds on information leakage.
We identify and bound the success rate of the worst-case membership inference attack.
We derive bounds on the mutual information between the sensitive attributes and model parameters.
arXiv Detail & Related papers (2021-05-09T08:49:14Z) - GRAFFL: Gradient-free Federated Learning of a Bayesian Generative Model [8.87104231451079]
This paper presents the first gradient-free federated learning framework called GRAFFL.
It uses implicit information derived from each participating institution to learn posterior distributions of parameters.
We propose the GRAFFL-based Bayesian mixture model to serve as a proof-of-concept of the framework.
arXiv Detail & Related papers (2020-08-29T07:19:44Z) - E-Tree Learning: A Novel Decentralized Model Learning Framework for Edge AI [18.53971408174349]
Edge empowered AI, namely Edge AI, has been proposed to support AI model learning and deployment at the network edge closer to the data sources.
In this paper, we propose a novel decentralized model learning approach, namely E-Tree, which makes use of a well-designed tree structure imposed on the edge devices.
arXiv Detail & Related papers (2020-08-04T13:59:29Z) - Three Approaches for Personalization with Applications to Federated Learning [68.19709953755238]
We present a systematic learning-theoretic study of personalization.
We provide learning-theoretic guarantees and efficient algorithms for which we also demonstrate the performance.
All of our algorithms are model-agnostic and work for any hypothesis class.
arXiv Detail & Related papers (2020-02-25T01:36:43Z)