TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems
- URL: http://arxiv.org/abs/2506.07605v3
- Date: Sun, 13 Jul 2025 17:21:43 GMT
- Title: TimberStrike: Dataset Reconstruction Attack Revealing Privacy Leakage in Federated Tree-Based Systems
- Authors: Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati,
- Abstract summary: We introduce TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models.<n>Our attack exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients.<n>Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems.
- Score: 5.9186175166428345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on State-of-the-Art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike consistently reconstructs between 73.05% and 95.63% of the target dataset across all implementations. We further analyze Differential Privacy, showing that while it partially mitigates the attack, it also significantly degrades model performance. Our findings highlight the need for privacy-preserving mechanisms specifically designed for tree-based Federated Learning systems, and we provide preliminary insights into their design.
Related papers
- Training Set Reconstruction from Differentially Private Forests: How Effective is DP? [7.618051223061514]
We introduce a reconstruction attack targeting state-of-the-art $varepsilon$-DP random forests.<n>Our results reveal that random forests trained with meaningful DP guarantees can still leak portions of their training data.
arXiv Detail & Related papers (2025-02-07T20:22:19Z) - FEDLAD: Federated Evaluation of Deep Leakage Attacks and Defenses [50.921333548391345]
Federated Learning is a privacy preserving decentralized machine learning paradigm.<n>Recent research has revealed that private ground truth data can be recovered through a gradient technique known as Deep Leakage.<n>This paper introduces the FEDLAD Framework (Federated Evaluation of Deep Leakage Attacks and Defenses), a comprehensive benchmark for evaluating Deep Leakage attacks and defenses.
arXiv Detail & Related papers (2024-11-05T11:42:26Z) - QBI: Quantile-Based Bias Initialization for Efficient Private Data Reconstruction in Federated Learning [0.5497663232622965]
Federated learning enables the training of machine learning models on distributed data without compromising user privacy.
Recent research has shown that the central entity can perfectly reconstruct private data from shared model updates.
arXiv Detail & Related papers (2024-06-26T20:19:32Z) - FewFedPIT: Towards Privacy-preserving and Few-shot Federated Instruction Tuning [54.26614091429253]
Federated instruction tuning (FedIT) is a promising solution, by consolidating collaborative training across multiple data owners.
FedIT encounters limitations such as scarcity of instructional data and risk of exposure to training data extraction attacks.
We propose FewFedPIT, designed to simultaneously enhance privacy protection and model performance of federated few-shot learning.
arXiv Detail & Related papers (2024-03-10T08:41:22Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - Turning Privacy-preserving Mechanisms against Federated Learning [22.88443008209519]
We design an attack capable of deceiving state-of-the-art defenses for federated learning.
The proposed attack includes two operating modes, the first one focusing on convergence inhibition (Adversarial Mode) and the second one aiming at building a deceptive rating injection on the global federated model (Backdoor Mode)
The experimental results show the effectiveness of our attack in both its modes, returning on average 60% performance detriment in all the tests on Adversarial Mode and fully effective backdoors in 93% of cases for the tests performed on Backdoor Mode.
arXiv Detail & Related papers (2023-05-09T11:43:31Z) - CrowdGuard: Federated Backdoor Detection in Federated Learning [39.58317527488534]
This paper presents a novel defense mechanism, CrowdGuard, that effectively mitigates backdoor attacks in Federated Learning.
CrowdGuard employs a server-located stacked clustering scheme to enhance its resilience to rogue client feedback.
The evaluation results demonstrate that CrowdGuard achieves a 100% True-Positive-Rate and True-Negative-Rate across various scenarios.
arXiv Detail & Related papers (2022-10-14T11:27:49Z) - Understanding Clipping for Federated Learning: Convergence and
Client-Level Differential Privacy [67.4471689755097]
This paper empirically demonstrates that the clipped FedAvg can perform surprisingly well even with substantial data heterogeneity.
We provide the convergence analysis of a differential private (DP) FedAvg algorithm and highlight the relationship between clipping bias and the distribution of the clients' updates.
arXiv Detail & Related papers (2021-06-25T14:47:19Z) - Federated Learning with Unreliable Clients: Performance Analysis and
Mechanism Design [76.29738151117583]
Federated Learning (FL) has become a promising tool for training effective machine learning models among distributed clients.
However, low quality models could be uploaded to the aggregator server by unreliable clients, leading to a degradation or even a collapse of training.
We model these unreliable behaviors of clients and propose a defensive mechanism to mitigate such a security risk.
arXiv Detail & Related papers (2021-05-10T08:02:27Z) - SoK: Privacy-Preserving Collaborative Tree-based Model Learning [5.759774832460351]
We survey the literature on distributed and privacy-preserving training of tree-based models.
We systematize its knowledge based on four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model.
We provide for the first time a framework analyzing the information leakage occurring in distributed tree-based model learning.
arXiv Detail & Related papers (2021-03-16T11:24:15Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks, trained with differential privacy, in some settings might be even more vulnerable in comparison to non-private versions.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z) - Privacy Preserving Vertical Federated Learning for Tree-based Models [30.808567035503994]
Federated learning enables multiple organizations to jointly train a model without revealing their private data to each other.
We propose Pivot, a novel solution for privacy preserving vertical decision tree training and prediction.
arXiv Detail & Related papers (2020-08-14T02:32:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.