Training Set Reconstruction from Differentially Private Forests: How Effective is DP?
- URL: http://arxiv.org/abs/2502.05307v1
- Date: Fri, 07 Feb 2025 20:22:19 GMT
- Title: Training Set Reconstruction from Differentially Private Forests: How Effective is DP?
- Authors: Alice Gorgé, Julien Ferry, Sébastien Gambs, Thibaut Vidal
- Abstract summary: Differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protections.
We introduce a reconstruction attack targeting state-of-the-art $\varepsilon$-DP random forests.
Our results reveal that random forests trained with meaningful DP guarantees can still leak substantial portions of their training data.
- Score: 7.618051223061514
- License:
- Abstract: Recent research has shown that machine learning models are vulnerable to privacy attacks targeting their training data. Differential privacy (DP) has become a widely adopted countermeasure, as it offers rigorous privacy protections. In this paper, we introduce a reconstruction attack targeting state-of-the-art $\varepsilon$-DP random forests. By leveraging a constraint programming model that incorporates knowledge of the forest's structure and DP mechanism characteristics, our approach formally reconstructs the most likely dataset that could have produced a given forest. Through extensive computational experiments, we examine the interplay between model utility, privacy guarantees, and reconstruction accuracy across various configurations. Our results reveal that random forests trained with meaningful DP guarantees can still leak substantial portions of their training data. Specifically, while DP reduces the success of reconstruction attacks, the only forests fully robust to our attack exhibit predictive performance no better than a constant classifier. Building on these insights, we provide practical recommendations for the construction of DP random forests that are more resilient to reconstruction attacks and maintain non-trivial predictive performance.
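To make the attack idea concrete, the sketch below shows (in a heavily simplified setting, not the authors' formulation) how a constraint programming solver can recover a training set consistent with what a single tree exposes: the split structure and its per-leaf example counts. It uses Google OR-Tools CP-SAT; the function `reconstruct_from_tree`, the depth-2 tree over two binary features, and the toy counts are all illustrative assumptions.

```python
# Minimal illustrative sketch (not the paper's model): given the split structure of a
# single depth-2 decision tree over binary features and its per-leaf example counts
# (possibly DP-noised), ask CP-SAT for a training set consistent with those counts.
# Requires: pip install ortools
from ortools.sat.python import cp_model

def reconstruct_from_tree(n_examples, n_features, leaf_counts):
    """leaf_counts: dict mapping (value of feature 0, value of feature 1) -> number
    of training examples routed to that leaf, as read off the published tree."""
    model = cp_model.CpModel()
    # One Boolean variable per (example, feature): the unknown training set.
    x = [[model.NewBoolVar(f"x_{i}_{j}") for j in range(n_features)]
         for i in range(n_examples)]
    for (a, b), count in leaf_counts.items():
        in_leaf = []
        for i in range(n_examples):
            # Indicator: example i reaches leaf (a, b), i.e. x[i][0] == a and x[i][1] == b.
            ind = model.NewBoolVar(f"leaf_{a}_{b}_ex_{i}")
            lits = [x[i][0] if a else x[i][0].Not(),
                    x[i][1] if b else x[i][1].Not()]
            model.AddBoolAnd(lits).OnlyEnforceIf(ind)
            model.AddBoolOr([l.Not() for l in lits]).OnlyEnforceIf(ind.Not())
            in_leaf.append(ind)
        # The reconstructed set must place exactly `count` examples in this leaf.
        model.Add(sum(in_leaf) == count)
    solver = cp_model.CpSolver()
    if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [[solver.Value(x[i][j]) for j in range(n_features)]
                for i in range(n_examples)]
    return None

# Toy usage: 6 examples, 2 binary features, leaf counts read off the tree.
print(reconstruct_from_tree(6, 2, {(0, 0): 1, (0, 1): 2, (1, 0): 2, (1, 1): 1}))
```

The paper's formulation additionally encodes the whole forest and the DP mechanism's noise characteristics to rank candidate datasets by likelihood; this toy version only illustrates why leaf-level statistics constrain the training data at all.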
Related papers
- DynFrs: An Efficient Framework for Machine Unlearning in Random Forest [2.315324942451179]
DynFrs is a framework designed to enable efficient machine unlearning in Random Forests.
In experiments, applying DynFrs to Extremely Randomized Trees yields substantial improvements.
arXiv Detail & Related papers (2024-10-02T14:20:30Z)
- Bayes' capacity as a measure for reconstruction attacks in federated learning [10.466570297146953]
We formalise the reconstruction threat model using the information-theoretic framework of quantitative information flow.
We show that the Bayes' capacity, related to the Sibson mutual information of order infinity, represents a tight upper bound on the leakage of the DP-SGD algorithm to an adversary.
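For reference, the standard quantitative-information-flow definitions behind this claim can be written as follows (standard notation, not reproduced from the paper): $C_{x,y}$ is the channel probability of observation $y$ given secret $x$, and $I_\infty$ is the Sibson mutual information of order infinity.

```latex
% Multiplicative Bayes capacity: the worst-case multiplicative gain in the
% adversary's probability of guessing the secret after observing the output.
\mathcal{ML}^{\times}(C) \;=\; \sum_{y} \max_{x} C_{x,y},
\qquad
\log \mathcal{ML}^{\times}(C) \;=\; I_{\infty}(X;Y)
\quad \text{(under a full-support prior).}
```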
arXiv Detail & Related papers (2024-06-19T13:58:42Z)
- Visual Privacy Auditing with Diffusion Models [52.866433097406656]
We propose a reconstruction attack based on diffusion models (DMs) that assumes adversary access to real-world image priors.
We show that (1) real-world data priors significantly influence reconstruction success, (2) current reconstruction bounds do not model the risk posed by data priors well, and (3) DMs can serve as effective auditing tools for visualizing privacy leakage.
arXiv Detail & Related papers (2024-03-12T12:18:55Z)
- Pre-training Differentially Private Models with Limited Public Data [54.943023722114134]
Differential privacy (DP) is a prominent method to gauge the degree of security provided to models.
However, DP is not yet capable of protecting a substantial portion of the data used during the initial pre-training stage.
We develop a novel DP continual pre-training strategy using only 10% of public data.
Our strategy can achieve DP accuracy of 41.5% on ImageNet-21k, as well as non-DP accuracy of 55.7% and 60.0% on the downstream tasks Places365 and iNaturalist-2021.
arXiv Detail & Related papers (2024-02-28T23:26:27Z)
- Bounding Reconstruction Attack Success of Adversaries Without Data Priors [53.41619942066895]
Reconstruction attacks on machine learning (ML) models pose a strong risk of leakage of sensitive data.
In this work, we provide formal upper bounds on reconstruction success under realistic adversarial settings.
arXiv Detail & Related papers (2024-02-20T09:52:30Z)
- Reconciling AI Performance and Data Reconstruction Resilience for Medical Imaging [52.578054703818125]
Artificial Intelligence (AI) models are vulnerable to information leakage of their training data, which can be highly sensitive.
Differential Privacy (DP) aims to circumvent these susceptibilities by setting a quantifiable privacy budget.
We show that using very large privacy budgets can render reconstruction attacks impossible, while drops in performance are negligible.
arXiv Detail & Related papers (2023-12-05T12:21:30Z)
- Bounding Training Data Reconstruction in DP-SGD [42.36933026300976]
Differentially private training offers a protection which is usually interpreted as a guarantee against membership inference attacks.
By proxy, this guarantee extends to other threats like reconstruction attacks attempting to extract complete training examples.
Recent works provide evidence that if one does not need to protect against membership attacks but instead only wants to protect against training data reconstruction, then the utility of private models can be improved.
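The mechanism whose reconstruction guarantees are being bounded here is DP-SGD: per-example gradient clipping followed by calibrated Gaussian noise. The sketch below is a generic, numpy-only illustration of one such step, not the authors' analysis or code; all names and default values are assumptions.

```python
# Generic sketch of one DP-SGD step: clip each per-example gradient, sum,
# add Gaussian noise scaled to the clipping norm, then average and update.
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0,
                noise_multiplier=1.0, rng=np.random.default_rng(0)):
    """per_example_grads: array of shape (batch_size, n_params)."""
    batch_size = per_example_grads.shape[0]
    # Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Sum the clipped gradients and add calibrated Gaussian noise.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / batch_size

# Toy usage: 4 examples, 3 parameters.
params = np.zeros(3)
grads = np.random.default_rng(1).normal(size=(4, 3))
print(dp_sgd_step(params, grads))
```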
arXiv Detail & Related papers (2023-02-14T18:02:34Z)
- Defending against Reconstruction Attacks with Rényi Differential Privacy [72.1188520352079]
Reconstruction attacks allow an adversary to regenerate data samples of the training set using access to only a trained model.
Differential privacy is a known solution to such attacks, but is often used with a relatively large privacy budget.
We show that, for the same mechanism, we can derive privacy guarantees against reconstruction attacks that are better than the traditional ones from the literature.
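For context, the standard definition of Rényi differential privacy (Mironov, 2017) that this line of work builds on is the following; the notation is the usual one and is not taken from the paper itself.

```latex
% A randomized mechanism M satisfies (\alpha, \epsilon)-R\'enyi differential
% privacy if, for all adjacent datasets D and D',
D_{\alpha}\!\left(M(D)\,\middle\|\,M(D')\right)
  \;=\; \frac{1}{\alpha-1}\,
        \log \mathbb{E}_{\theta \sim M(D')}
        \!\left[\left(\frac{\Pr[M(D)=\theta]}{\Pr[M(D')=\theta]}\right)^{\alpha}\right]
  \;\le\; \epsilon .
```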
arXiv Detail & Related papers (2022-02-15T18:09:30Z)
- Reconstructing Training Data with Informed Adversaries [30.138217209991826]
Given access to a machine learning model, can an adversary reconstruct the model's training data?
This work studies this question from the lens of a powerful informed adversary who knows all the training data points except one.
We show it is feasible to reconstruct the remaining data point in this stringent threat model.
arXiv Detail & Related papers (2022-01-13T09:19:25Z)
- DP-InstaHide: Provably Defusing Poisoning and Backdoor Attacks with Differentially Private Data Augmentations [54.960853673256]
We show that strong data augmentations, such as mixup and random additive noise, nullify poisoning attacks while incurring only a small accuracy trade-off.
A rigorous analysis of DP-InstaHide shows that mixup does indeed have privacy advantages, and that training with k-way mixup provably yields at least k times stronger DP guarantees than a naive DP mechanism.
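The augmentation the analysis covers, k-way mixup plus random additive noise, can be sketched as follows. This is a generic illustration, not the authors' DP-InstaHide implementation; the function name, Dirichlet mixing weights, and noise scale are assumptions.

```python
# Generic sketch of k-way mixup with additive noise: mix k training images with
# random convex weights, mix their one-hot labels identically, then add noise.
import numpy as np

def k_way_mixup(images, labels, k=4, noise_std=0.05, rng=np.random.default_rng(0)):
    """images: (n, ...) array; labels: (n, n_classes) one-hot array."""
    n = images.shape[0]
    idx = rng.choice(n, size=k, replace=False)        # pick k source examples
    w = rng.dirichlet(np.ones(k))                     # random convex weights
    mixed_img = np.tensordot(w, images[idx], axes=1)  # weighted sum of k images
    mixed_lbl = w @ labels[idx]                       # same weights on labels
    mixed_img += rng.normal(scale=noise_std, size=mixed_img.shape)  # additive noise
    return mixed_img, mixed_lbl

# Toy usage: 10 fake 8x8 grayscale "images", 3 classes.
rng = np.random.default_rng(1)
imgs = rng.random((10, 8, 8))
lbls = np.eye(3)[rng.integers(0, 3, size=10)]
x, y = k_way_mixup(imgs, lbls)
print(x.shape, y)
```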
arXiv Detail & Related papers (2021-03-02T23:07:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.