The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning
- URL: http://arxiv.org/abs/2309.01243v3
- Date: Fri, 21 Jun 2024 16:54:57 GMT
- Title: The Normal Distributions Indistinguishability Spectrum and its Application to Privacy-Preserving Machine Learning
- Authors: Yun Lu, Malik Magdon-Ismail, Yu Wei, Vassilis Zikas
- Abstract summary: Differential Privacy is the most common method for machine learning (ML) on privacy-sensitive data.
In big data analytics, one often uses randomized sketching/aggregation algorithms to make processing high-dimensional data tractable.
- Score: 7.61316504036937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differential Privacy (DP) (and its variants) is the most common method for machine learning (ML) on privacy-sensitive data. In big data analytics, one often uses randomized sketching/aggregation algorithms to make processing high-dimensional data tractable. Intuitively, such ML algorithms should provide some inherent privacy, yet most existing DP mechanisms do not leverage or under-utilize this inherent randomness, resulting in potentially redundant noising. The motivating question of our work is: (How) can we improve the utility of DP mechanisms for randomized ML queries by leveraging the randomness of the query itself? Towards a (positive) answer, our key contribution is (proving) what we call the NDIS theorem, a theoretical result with several practical implications. In a nutshell, NDIS is a closed-form analytic computation for the (ε,δ)-indistinguishability spectrum (IS) of two arbitrary normal distributions N1 and N2, i.e., the optimal δ (for any given ε) such that N1 and N2 are (ε,δ)-close according to the DP distance. The importance of the NDIS theorem lies in that (1) it yields efficient estimators for the IS, and (2) it allows us to analyze DP mechanisms with normally distributed outputs, as well as more general mechanisms, by leveraging their behavior on large inputs. We apply the NDIS theorem to derive DP mechanisms for queries with normally distributed outputs, i.e., Gaussian Random Projections (RP), and for more general queries, i.e., Ordinary Least Squares (OLS). Compared to existing techniques, our new DP mechanisms achieve superior privacy/utility trade-offs by leveraging the randomness of the underlying algorithms. We then apply the NDIS theorem to a data-driven DP notion, in particular relative DP introduced by Lu et al. [S&P 2024]. Our method identifies the range of (ε,δ) for which no additional noising is needed.
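For intuition, the univariate equal-variance special case of this spectrum already has a classical closed form (Balle & Wang, ICML 2018). The sketch below, with hypothetical function names, computes the optimal δ for a given ε in that special case; the NDIS theorem generalizes this to arbitrary normal pairs N1, N2.

```python
from math import erf, exp, sqrt

def Phi(t: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def delta_for_epsilon(eps: float, mu: float, sigma: float) -> float:
    """Smallest delta such that N(0, sigma^2) and N(mu, sigma^2) are
    (eps, delta)-close in the DP (hockey-stick) distance; closed form
    from Balle & Wang (ICML 2018)."""
    a = mu / (2.0 * sigma)   # half the normalized mean separation
    b = eps * sigma / mu     # point where the privacy loss crosses eps
    return Phi(a - b) - exp(eps) * Phi(-a - b)

# Trace part of the indistinguishability spectrum of N(0,1) vs N(1,1):
for eps in (0.1, 0.5, 1.0, 2.0):
    print(f"eps={eps:>4}: delta={delta_for_epsilon(eps, mu=1.0, sigma=1.0):.4f}")
```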
Related papers
- How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analysis of DP-SGD under the two types of batch sampling (shuffling versus Poisson subsampling).
arXiv Detail & Related papers (2024-03-26T13:02:43Z) - Synthesizing Tight Privacy and Accuracy Bounds via Weighted Model Counting [5.552645730505715]
A core challenge is finding expressive, compact, and efficient encodings of the distributions of DP algorithms.
We address this challenge by developing a method for tight privacy and accuracy bound synthesis.
We develop a framework for leveraging inherent symmetries in DP algorithms.
arXiv Detail & Related papers (2024-02-26T19:29:46Z) - Private Fine-tuning of Large Language Models with Zeroth-order Optimization [51.19403058739522]
Differentially private gradient descent (DP-SGD) allows models to be trained in a privacy-preserving manner.
We introduce DP-ZO, a private fine-tuning framework for large language models by privatizing zeroth order optimization methods.
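A minimal sketch of the zeroth-order idea (illustrative, not the paper's exact DP-ZO algorithm; the loss interface, clipping threshold, and noise calibration below are assumptions): only a scalar two-point loss difference touches the private data, so privatizing that scalar suffices.

```python
import numpy as np

def dp_zo_step(theta, batch, loss, lr=0.01, beta=1e-3, clip=1.0, sigma=1.0,
               rng=np.random.default_rng(0)):
    """One privatized zeroth-order update (sketch). Only the per-example
    scalar finite-difference estimate depends on private data, so clipping
    and Gaussian-noising that scalar is all the privatization needed."""
    z = rng.standard_normal(theta.shape)   # shared random direction
    # Per-example two-point estimate of the directional derivative.
    scalars = [(loss(theta + beta * z, x) - loss(theta - beta * z, x))
               / (2.0 * beta) for x in batch]
    clipped = [max(-clip, min(clip, s)) for s in scalars]
    noisy = (sum(clipped) + sigma * clip * rng.standard_normal()) / len(batch)
    return theta - lr * noisy * z
```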
arXiv Detail & Related papers (2024-01-09T03:53:59Z) - Bounded and Unbiased Composite Differential Privacy [25.427802467876248]
The objective of differential privacy (DP) is to protect privacy by producing an output distribution that is indistinguishable between two neighboring databases.
Existing solutions attempt to bound the output range by employing post-processing or truncation techniques, which can introduce bias.
We propose a novel differentially private mechanism which uses a composite probability density function to generate bounded and unbiased outputs.
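A quick generic illustration of why truncation breaks unbiasedness (a demonstration, not the paper's composite mechanism): clipping a Laplace-noised answer back into its valid range shifts the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
true_answer = 0.9                          # true query value, known to lie in [0, 1]
noisy = true_answer + rng.laplace(scale=0.5, size=1_000_000)

print("mean of raw Laplace outputs:   ", noisy.mean())                     # ~0.90, unbiased
print("mean after truncation to [0,1]:", np.clip(noisy, 0.0, 1.0).mean())  # biased low, ~0.74
```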
arXiv Detail & Related papers (2023-11-04T04:43:47Z) - The Normalized Cross Density Functional: A Framework to Quantify Statistical Dependence for Random Processes [6.625320950808605]
We present a novel approach to measuring statistical dependence between two random processes (r.p.) using a positive-definite function called the Normalized Cross Density (NCD).
NCD is derived directly from the probability density functions of two r.p. and constructs a data-dependent Hilbert space, the Normalized Cross-Density Hilbert Space (NCD-HS).
We mathematically prove that FMCA (the paper's functional maximal correlation algorithm) learns the dominant eigenvalues and eigenfunctions of NCD directly from realizations.
arXiv Detail & Related papers (2022-12-09T02:12:41Z) - Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization [94.06564567766475]
DP-SGD and DP-NSGD mitigate the risk of large models memorizing sensitive training data.
We show that these two algorithms achieve similar best accuracy while DP-NSGD is comparatively easier to tune than DP-SGD.
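The two per-sample updates can be sketched as follows (illustrative parameter names; both bound per-sample sensitivity so that Gaussian noise can be calibrated afterwards):

```python
import numpy as np

def dp_sgd_clip(g, C=1.0):
    """DP-SGD: rescale a per-sample gradient to norm at most C."""
    return g * min(1.0, C / np.linalg.norm(g))

def dp_nsgd_normalize(g, r=0.01):
    """DP-NSGD: divide by (||g|| + r); the regularizer r trades off
    bias against the effective noise level, replacing the clipping
    threshold as the main tuning knob."""
    return g / (np.linalg.norm(g) + r)

g = np.array([3.0, 4.0])        # per-sample gradient with norm 5
print(dp_sgd_clip(g))           # [0.6, 0.8], norm exactly C
print(dp_nsgd_normalize(g))     # ~[0.599, 0.798], norm slightly below 1
```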
arXiv Detail & Related papers (2022-06-27T03:45:02Z) - Shuffle Gaussian Mechanism for Differential Privacy [2.7564955518050693]
We study the mechanism's Rényi differential privacy (RDP), showing that it is of the form:
$$ \epsilon(\lambda) \leq \frac{1}{\lambda-1}\log\left(\frac{e^{-\lambda/2\sigma^2}}{n^{\lambda}}\sum_{\substack{k_1+\dotsc+k_n=\lambda \\ k_1,\dotsc,k_n\geq 0}}\binom{\lambda}{k_1,\dotsc,k_n}e^{\sum_{i=1}^{n}k_i^2/2\sigma^2}\right) $$
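Assuming the reconstruction above is faithful, the bound can be checked numerically for small λ and n; with n = 1 it collapses to the plain Gaussian-mechanism RDP, λ/(2σ²).

```python
from math import exp, factorial, log

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def multinomial(ks):
    """Multinomial coefficient (sum(ks) choose k_1, ..., k_n)."""
    out = factorial(sum(ks))
    for k in ks:
        out //= factorial(k)
    return out

def shuffle_gaussian_rdp(lam, n, sigma):
    """Evaluate the bound above exactly (feasible only for small lam and n:
    the sum ranges over C(lam+n-1, n-1) compositions)."""
    total = sum(multinomial(ks) * exp(sum(k * k for k in ks) / (2 * sigma**2))
                for ks in compositions(lam, n))
    return log(exp(-lam / (2 * sigma**2)) / n**lam * total) / (lam - 1)

print(shuffle_gaussian_rdp(lam=4, n=1, sigma=2.0))  # 0.5 = lam / (2 sigma^2)
print(shuffle_gaussian_rdp(lam=4, n=8, sigma=2.0))  # smaller: shuffling amplifies privacy
```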
arXiv Detail & Related papers (2022-06-20T04:54:16Z) - Dimension Independent Generalization of DP-SGD for Overparameterized Smooth Convex Optimization [24.644583626705742]
This paper considers the generalization performance of differentially private convex learning.
We demonstrate that the convergence analysis of Langevin algorithms can be used to obtain new generalization bounds with differential privacy guarantees for DP-SGD.
arXiv Detail & Related papers (2022-06-03T22:03:05Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of randomized dynamic programming (RDP) algorithms for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, so it can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for protecting privacy when machine learning models are trained collaboratively among distributed clients.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.