Balance is key: Private median splits yield high-utility random trees
- URL: http://arxiv.org/abs/2006.08795v2
- Date: Sat, 20 Feb 2021 01:18:15 GMT
- Title: Balance is key: Private median splits yield high-utility random trees
- Authors: Shorya Consul, Sinead A. Williamson
- Abstract summary: We propose DiPriMe forests, a novel tree-based ensemble method for differentially private regression and classification.
We show theoretically and empirically that the resulting algorithm exhibits high utility, while ensuring differential privacy.
- Score: 4.90655427233754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Random forests are a popular method for classification and regression due to
their versatility. However, this flexibility can come at the cost of user
privacy, since training random forests requires multiple data queries, often on
small, identifiable subsets of the training data. Privatizing these queries
typically comes at a high utility cost, in large part because we are
privatizing queries on small subsets of the data, which are easily corrupted by
added noise. In this paper, we propose DiPriMe forests, a novel tree-based
ensemble method for differentially private regression and classification, which
is appropriate for real or categorical covariates. We generate splits using a
differentially private version of the median, which encourages balanced leaf
nodes. By avoiding low-occupancy leaf nodes, we avoid low signal-to-noise
ratios when privatizing the leaf-node sufficient statistics. We show
theoretically and empirically that the resulting algorithm exhibits high
utility, while ensuring differential privacy.
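The two mechanisms the abstract describes — choosing splits with a differentially private median and privatizing leaf sufficient statistics with additive noise — can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `private_median` uses the exponential mechanism over sorted values, `private_leaf_mean` uses the Laplace mechanism with the budget split evenly between count and sum, and all function names and parameter choices here are hypothetical.

```python
import math
import random

def laplace_noise(scale):
    # Sample from a Laplace(0, scale) distribution via inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_median(values, epsilon):
    """Exponential-mechanism median: sample a data point whose rank is
    close to n/2. Rank utility has sensitivity 1 (adding or removing one
    record shifts each rank by at most 1), so weights use exp(eps*u/2)."""
    xs = sorted(values)
    n = len(xs)
    utilities = [-abs(i - n / 2.0) for i in range(n)]
    weights = [math.exp(epsilon * u / 2.0) for u in utilities]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for x, w in zip(xs, weights):
        acc += w
        if acc >= r:
            return x
    return xs[-1]

def private_leaf_mean(values, epsilon, value_range):
    """Laplace-privatized leaf sufficient statistics (count and sum).
    Values are clipped to value_range so the sum has bounded sensitivity;
    half the budget goes to the count, half to the sum."""
    lo, hi = value_range
    clipped = [min(max(v, lo), hi) for v in values]
    sum_sensitivity = max(abs(lo), abs(hi))
    noisy_count = len(clipped) + laplace_noise(2.0 / epsilon)
    noisy_sum = sum(clipped) + laplace_noise(2.0 * sum_sensitivity / epsilon)
    return noisy_sum / max(noisy_count, 1.0)
```

Because the private median encourages balanced children, each leaf keeps enough points that the Laplace noise added to its count and sum stays small relative to the true statistics — the balance property the paper's title refers to.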
Related papers
- Optimal Unbiased Randomizers for Regression with Label Differential
Privacy [61.63619647307816]
We propose a new family of label randomizers for training regression models under the constraint of label differential privacy (DP)
We demonstrate that these randomizers achieve state-of-the-art privacy-utility trade-offs on several datasets.
arXiv Detail & Related papers (2023-12-09T19:58:34Z) - Little is Enough: Improving Privacy by Sharing Labels in Federated Semi-Supervised Learning [10.972006295280636]
In many critical applications, sensitive data is inherently distributed and cannot be centralized due to privacy concerns.
Most of these approaches either share local model parameters, soft predictions on a public dataset, or a combination of both.
This, however, still discloses private information and restricts local models to those that lend themselves to training via gradient-based methods.
We propose to share only hard labels on a public unlabeled dataset, and use a consensus over the shared labels as a pseudo-labeling to be used by clients.
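The consensus step described in the summary — clients share only hard labels on a public unlabeled set, and a vote over those labels yields pseudo-labels — reduces to a majority vote. A minimal sketch, assuming each client contributes one label per public example (the function name and data layout are illustrative, not from the paper):

```python
from collections import Counter

def consensus_pseudo_labels(client_label_lists):
    """Majority vote over hard labels. client_label_lists[c][i] is the
    label client c assigns to the i-th example of the shared public
    unlabeled dataset; the winning label becomes its pseudo-label."""
    n_examples = len(client_label_lists[0])
    pseudo = []
    for i in range(n_examples):
        votes = Counter(labels[i] for labels in client_label_lists)
        pseudo.append(votes.most_common(1)[0][0])
    return pseudo
```

Sharing only hard labels, rather than parameters or soft predictions, both limits the information disclosed and frees clients to use models that are not trained by gradient descent.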
arXiv Detail & Related papers (2023-10-09T13:16:10Z) - Differentially-Private Decision Trees and Provable Robustness to Data
Poisoning [8.649768969060647]
Decision trees are interpretable models that are well-suited to non-linear learning problems.
Current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit.
We propose PrivaTree based on private histograms that chooses good splits while consuming a small privacy budget.
arXiv Detail & Related papers (2023-05-24T17:56:18Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it can be difficult to apply when sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z) - Privacy-Preserving Training of Tree Ensembles over Continuous Data [9.887824375079553]
Most existing protocols for privacy-preserving training of decision trees over distributed data assume that the features are categorical.
Sorting is an expensive operation in MPC, hence finding secure protocols that avoid such an expensive step is a relevant problem in privacy-preserving machine learning.
We propose three more efficient alternatives for secure training of decision tree based models on data with continuous features.
arXiv Detail & Related papers (2021-06-05T01:28:59Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy
Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
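The shuffle model this entry analyzes is simple to sketch: each user applies a local randomizer (here binary randomized response, a standard choice used for illustration — the paper's analysis covers general local randomizers), and a trusted shuffler permutes the reports before release. Function names are assumptions:

```python
import math
import random

def randomized_response(bit, epsilon):
    """epsilon-DP local randomizer for one bit: report the true bit
    with probability e^eps / (1 + e^eps), otherwise flip it."""
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_keep else 1 - bit

def shuffled_reports(bits, epsilon):
    """Randomize every user's bit locally, then shuffle the reports.
    The shuffle breaks the link between users and reports, which is
    what amplifies the central DP guarantee beyond the local epsilon."""
    reports = [randomized_response(b, epsilon) for b in bits]
    random.shuffle(reports)
    return reports
```

The amplification result says the shuffled output satisfies a much stronger central differential privacy guarantee than the per-user local epsilon alone would suggest.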
arXiv Detail & Related papers (2020-12-23T17:07:26Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates can allow adversaries to infer private data.
Existing schemes add perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - Privacy Amplification via Random Check-Ins [38.72327434015975]
Differentially Private Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data.
In this paper, we focus on conducting iterative methods like DP-SGD in the setting of federated learning (FL) wherein the data is distributed among many devices (clients)
Our main contribution is the random check-in distributed protocol, which crucially relies only on randomized participation decisions made locally and independently by each client.
arXiv Detail & Related papers (2020-07-13T18:14:09Z) - Fréchet random forests for metric space valued regression with non-Euclidean predictors [0.0]
We introduce Fréchet trees and Fréchet random forests, which allow handling data for which input and output variables take values in general metric spaces.
A consistency theorem for the Fréchet regressogram predictor using data-driven partitions is given and applied to Fréchet purely uniformly random trees.
arXiv Detail & Related papers (2019-06-04T22:07:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.