Balance is key: Private median splits yield high-utility random trees
- URL: http://arxiv.org/abs/2006.08795v2
- Date: Sat, 20 Feb 2021 01:18:15 GMT
- Title: Balance is key: Private median splits yield high-utility random trees
- Authors: Shorya Consul, Sinead A. Williamson
- Abstract summary: We propose DiPriMe forests, a novel tree-based ensemble method for differentially private regression and classification.
We show theoretically and empirically that the resulting algorithm exhibits high utility, while ensuring differential privacy.
- Score: 4.90655427233754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forests are a popular method for classification and regression due to
their versatility. However, this flexibility can come at the cost of user
privacy, since training random forests requires multiple data queries, often on
small, identifiable subsets of the training data. Privatizing these queries
typically comes at a high utility cost, in large part because we are
privatizing queries on small subsets of the data, which are easily corrupted by
added noise. In this paper, we propose DiPriMe forests, a novel tree-based
ensemble method for differentially private regression and classification, which
is appropriate for real or categorical covariates. We generate splits using a
differentially private version of the median, which encourages balanced leaf
nodes. By avoiding low-occupancy leaf nodes, we avoid the low signal-to-noise
ratios that arise when privatizing the leaf node sufficient statistics. We show
theoretically and empirically that the resulting algorithm exhibits high
utility, while ensuring differential privacy.
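To make the mechanism concrete, here is a minimal sketch assuming a single real-valued covariate and responses in $[0, 1]$: draw a split near the median with the exponential mechanism, then privatize each leaf's sufficient statistics with Laplace noise. The names, candidate grid, and budget split are illustrative rather than the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_median_split(x, epsilon, lo, hi):
    """Draw a split point near the median of x via the exponential
    mechanism. A candidate t scores -|#{x < t} - #{x >= t}|, which has
    sensitivity 1, so well-balanced splits get the most probability."""
    candidates = np.linspace(lo, hi, 200)
    n_left = np.searchsorted(np.sort(x), candidates)
    utility = -np.abs(n_left - (len(x) - n_left))
    weights = np.exp(epsilon * (utility - utility.max()) / 2.0)
    return rng.choice(candidates, p=weights / weights.sum())

def private_leaf_mean(y, epsilon):
    """Laplace-noised sufficient statistics (count and sum) for a
    regression leaf, assuming y in [0, 1]; the budget is split between
    the two queries, hence scale 2/epsilon for each."""
    noisy_n = len(y) + rng.laplace(scale=2.0 / epsilon)
    noisy_sum = float(np.sum(y)) + rng.laplace(scale=2.0 / epsilon)
    return noisy_sum / max(noisy_n, 1.0)

# Toy usage: one private split, then a noisy mean per leaf.
x = rng.uniform(0, 1, 500)
y = (x > 0.4).astype(float)
t = private_median_split(x, epsilon=0.5, lo=0.0, hi=1.0)
print(t, private_leaf_mean(y[x < t], 0.5), private_leaf_mean(y[x >= t], 0.5))
```

A full tree would recurse this split on disjoint subsets and account for the privacy budget across the whole depth; the paper gives the actual composition analysis.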
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Adaptive Split Balancing for Optimal Random Forest [8.916614661563893]
We propose a new random forest algorithm that constructs the trees using a novel adaptive split-balancing method.
Our method achieves optimality in simple, smooth scenarios while adaptively learning the tree structure from the data.
arXiv Detail & Related papers (2024-02-17T09:10:40Z)
- Optimal Unbiased Randomizers for Regression with Label Differential Privacy [61.63619647307816]
We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP).
We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets.
arXiv Detail & Related papers (2023-12-09T19:58:34Z)
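As a baseline for what a label randomizer is (the paper's unbiased randomizers are more refined than this), a minimal sketch assuming labels lie in a known range:

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_label_randomizer(y, epsilon, y_min=0.0, y_max=1.0):
    """Baseline label-DP randomizer: clip each regression label to
    [y_min, y_max] and add Laplace noise calibrated to that range.
    Only the labels are randomized, which is what label DP protects."""
    sensitivity = y_max - y_min
    noise = rng.laplace(scale=sensitivity / epsilon, size=len(y))
    return np.clip(y, y_min, y_max) + noise

# The noisy labels can then be fed to any off-the-shelf regressor.
y = rng.uniform(0, 1, 5)
print(laplace_label_randomizer(y, epsilon=1.0))
```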
- Differentially-Private Decision Trees and Provable Robustness to Data Poisoning [8.649768969060647]
Decision trees are interpretable models that are well-suited to non-linear learning problems.
Current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit.
We propose PrivaTree, an approach based on private histograms that chooses good splits while consuming a small privacy budget.
arXiv Detail & Related papers (2023-05-24T17:56:18Z)
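A hedged sketch in the spirit of split selection from private histograms; the binning, scoring rule, and budget split below are illustrative assumptions, not PrivaTree's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def private_histogram_split(x, y, epsilon, bins=32):
    """Noise per-bin class counts once, then score candidate split
    edges using only the noisy counts. Assumes x in [0, 1] so the bin
    edges are data-independent; each example lands in one bin, so each
    class histogram has sensitivity 1."""
    edges = np.linspace(0.0, 1.0, bins + 1)
    h0, _ = np.histogram(x[y == 0], bins=edges)
    h1, _ = np.histogram(x[y == 1], bins=edges)
    h0 = h0 + rng.laplace(scale=2.0 / epsilon, size=bins)  # budget split
    h1 = h1 + rng.laplace(scale=2.0 / epsilon, size=bins)  # across classes
    c0, c1 = np.cumsum(h0), np.cumsum(h1)
    total = c0[-1] + c1[-1]
    best_edge, best_score = None, -np.inf
    for i in range(1, bins):
        left = c0[i - 1] + c1[i - 1]
        right = total - left
        if min(left, right) < 1.0:  # skip near-empty children
            continue
        # Gap in class-1 proportion between the two children.
        score = abs(c1[i - 1] / left - (c1[-1] - c1[i - 1]) / right)
        if score > best_score:
            best_edge, best_score = edges[i], score
    return best_edge

x = rng.uniform(0, 1, 1000)
y = (x > 0.6).astype(int)
print(private_histogram_split(x, y, epsilon=1.0))
```

Because the noise is added to the histogram once, every candidate split is scored from the same privatized counts at no additional privacy cost.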
- Privacy Amplification via Shuffling: Unified, Simplified, and Tightened [20.10078781197001]
We propose a comprehensive framework for privacy amplification in both single-message and multi-message shuffle protocols.
Our theoretical results demonstrate that our framework provides tighter bounds, especially for local randomizers with extremal probability design.
Our bounds also result in a remarkably efficient $\tilde{O}(n)$ algorithm that numerically amplifies privacy in less than $10$ seconds for $n = 10^8$ users.
arXiv Detail & Related papers (2023-04-11T06:27:25Z)
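For context, the single-message shuffle model that such bounds apply to can be sketched in a few lines; here the local randomizer is binary randomized response, an illustrative choice rather than anything specific to this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(bits, eps_local):
    """eps_local-DP local randomizer: keep each true bit with
    probability e^eps / (e^eps + 1), otherwise flip it."""
    p_keep = np.exp(eps_local) / (np.exp(eps_local) + 1.0)
    flip = rng.random(len(bits)) >= p_keep
    return np.where(flip, 1 - bits, bits)

def shuffle_protocol(bits, eps_local):
    """Each user randomizes locally; the shuffler then discards order,
    severing the link between users and reports. Amplification bounds
    give the central epsilon of this shuffled output."""
    return rng.permutation(randomized_response(bits, eps_local))

bits = rng.integers(0, 2, size=1000)
reports = shuffle_protocol(bits, eps_local=1.0)
# Debiased frequency estimate from the randomized-response reports.
p = np.exp(1.0) / (np.exp(1.0) + 1.0)
print((reports.mean() - (1 - p)) / (2 * p - 1), bits.mean())
```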
- Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it is difficult to apply when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z)
- Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z)
- Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
- Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z)
- Privacy Amplification via Random Check-Ins [38.72327434015975]
Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data.
In this paper, we focus on conducting iterative methods like DP-SGD in the setting of federated learning (FL), wherein the data is distributed among many devices (clients).
Our main contribution is the random check-in distributed protocol, which crucially relies only on randomized participation decisions made locally and independently by each client.
arXiv Detail & Related papers (2020-07-13T18:14:09Z)
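A hedged sketch of the participation step the summary describes, assuming a simple Bernoulli check-in probability and a uniform round choice; the names and parameters are illustrative, and the paper's full protocol and privacy accounting are more involved:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_check_in(num_clients, num_rounds, participate_prob=0.5):
    """Each client flips a local coin to decide whether to check in
    and, if so, samples one round uniformly at random. No coordination
    is needed; these local randomized choices are what drive the
    privacy amplification."""
    schedule = {r: [] for r in range(num_rounds)}
    for client in range(num_clients):
        if rng.random() < participate_prob:   # local, independent coin
            r = int(rng.integers(num_rounds)) # local uniform round choice
            schedule[r].append(client)
    return schedule

schedule = random_check_in(num_clients=20, num_rounds=5)
for r, clients in schedule.items():
    print(f"round {r}: clients {clients}")
# Each round, the server runs a DP-SGD step on the checked-in clients.
```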
- Fréchet random forests for metric space valued regression with non-Euclidean predictors [0.0]
We introduce Fréchet trees and Fréchet random forests, which can handle data whose input and output variables take values in general metric spaces.
A consistency theorem for the Fréchet regressogram predictor using data-driven partitions is given and applied to Fréchet purely uniformly random trees.
arXiv Detail & Related papers (2019-06-04T22:07:24Z)
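The building block behind Fréchet trees is the Fréchet mean, which generalizes the sample average to a general metric space; a minimal sketch over a finite candidate set, with a placeholder metric:

```python
import numpy as np

def frechet_mean(points, candidates, dist):
    """Return the candidate minimizing the sum of squared distances to
    the sample points -- the Fréchet mean, which reduces to the
    ordinary mean when dist is Euclidean distance on R^d."""
    costs = [sum(dist(c, p) ** 2 for p in points) for c in candidates]
    return candidates[int(np.argmin(costs))]

# Example: points on a circle with geodesic (angular) distance, where
# naively averaging angles would be misleading near the wrap-around.
def angular_dist(a, b):
    d = abs(a - b) % (2 * np.pi)
    return min(d, 2 * np.pi - d)

angles = [0.1, 0.2, 2 * np.pi - 0.15]      # cluster around angle 0
grid = np.linspace(0, 2 * np.pi, 360, endpoint=False)
print(frechet_mean(angles, list(grid), angular_dist))
```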
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.