A One-Pass Private Sketch for Most Machine Learning Tasks
- URL: http://arxiv.org/abs/2006.09352v1
- Date: Tue, 16 Jun 2020 17:47:48 GMT
- Title: A One-Pass Private Sketch for Most Machine Learning Tasks
- Authors: Benjamin Coleman and Anshumali Shrivastava
- Abstract summary: Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
- Score: 48.17461258268463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differential privacy (DP) is a compelling privacy definition that explains
the privacy-utility tradeoff via formal, provable guarantees. Inspired by
recent progress toward general-purpose data release algorithms, we propose a
private sketch, or small summary of the dataset, that supports a multitude of
machine learning tasks including regression, classification, density
estimation, near-neighbor search, and more. Our sketch consists of randomized
contingency tables that are indexed with locality-sensitive hashing and
constructed with an efficient one-pass algorithm. We prove competitive error
bounds for DP kernel density estimation. Existing methods for DP kernel density
estimation scale poorly, often exponentially slower with an increase in
dimensions. In contrast, our sketch can quickly run on large, high-dimensional
datasets in a single pass. Exhaustive experiments show that our generic sketch
delivers a similar privacy-utility tradeoff when compared to existing DP
methods at a fraction of the computation cost. We expect that our sketch will
enable differential privacy in distributed, large-scale machine learning
settings.
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Private Language Models via Truncated Laplacian Mechanism [18.77713904999236]
We propose a novel private embedding method called the high dimensional truncated Laplacian mechanism.
We show that our method has a lower variance compared to the previous private word embedding methods.
Remarkably, even in the high privacy regime, our approach only incurs a slight decrease in utility compared to the non-private scenario.
arXiv Detail & Related papers (2024-10-10T15:25:02Z) - Enhanced Privacy Bound for Shuffle Model with Personalized Privacy [32.08637708405314]
Differential Privacy (DP) is an enhanced privacy protocol which introduces an intermediate trusted server between local users and a central data curator.
It significantly amplifies the central DP guarantee by anonymizing and shuffling the local randomized data.
This work focuses on deriving the central privacy bound for a more practical setting where personalized local privacy is required by each user.
arXiv Detail & Related papers (2024-07-25T16:11:56Z) - Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding datasets in machine learning in industry giants like Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z) - Closed-Form Bounds for DP-SGD against Record-level Inference [18.85865832127335]
We focus on the popular DP-SGD algorithm, and derive simple closed-form bounds.
We obtain bounds for membership inference that match state-of-the-art techniques.
We present a novel data-dependent bound against attribute inference.
arXiv Detail & Related papers (2024-02-22T09:26:16Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Differentially Private Federated Learning via Inexact ADMM with Multiple
Local Updates [0.0]
We develop a DP inexact alternating direction method of multipliers algorithm with multiple local updates for federated learning.
We show that our algorithm provides $barepsilon$-DP for every iteration, where $barepsilon$ is a privacy budget controlled by the user.
We demonstrate that our algorithm reduces the testing error by at most $31%$ compared with the existing DP algorithm, while achieving the same level of data privacy.
arXiv Detail & Related papers (2022-02-18T19:58:47Z) - Don't Generate Me: Training Differentially Private Generative Models
with Sinkhorn Divergence [73.14373832423156]
We propose DP-Sinkhorn, a novel optimal transport-based generative method for learning data distributions from private data with differential privacy.
Unlike existing approaches for training differentially private generative models, we do not rely on adversarial objectives.
arXiv Detail & Related papers (2021-11-01T18:10:21Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.