DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for
Large-Scale Bayesian Inference
- URL: http://arxiv.org/abs/2303.06171v4
- Date: Thu, 12 Oct 2023 22:55:43 GMT
- Title: DP-Fast MH: Private, Fast, and Accurate Metropolis-Hastings for
Large-Scale Bayesian Inference
- Authors: Wanrong Zhang, Ruqi Zhang
- Abstract summary: We study the Metropolis-Hastings (MH) algorithm for large-scale Bayesian inference under differential privacy.
We reveal, for the first time, a three-way trade-off among privacy, scalability, and efficiency.
We empirically demonstrate the effectiveness and efficiency of our algorithm in various experiments.
- Score: 16.280801141284872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Bayesian inference provides a principled framework for learning from complex
data and reasoning under uncertainty. It has been widely applied in machine
learning tasks such as medical diagnosis, drug design, and policymaking. In
these common applications, data can be highly sensitive. Differential privacy
(DP) offers data analysis tools with powerful worst-case privacy guarantees and
has been developed as the leading approach in privacy-preserving data analysis.
In this paper, we study Metropolis-Hastings (MH), one of the most fundamental
MCMC methods, for large-scale Bayesian inference under differential privacy.
While most existing private MCMC algorithms sacrifice accuracy and efficiency
to obtain privacy, we provide the first exact and fast DP MH algorithm, using
only a minibatch of data in most iterations. We further reveal, for the first
time, a three-way trade-off among privacy, scalability (i.e. the batch size),
and efficiency (i.e. the convergence rate), theoretically characterizing how
privacy affects the utility and computational cost in Bayesian inference. We
empirically demonstrate the effectiveness and efficiency of our algorithm in
various experiments.
Related papers
- DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm emphDP-CDA.
Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure privacy guarantees.
Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z) - Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - TernaryVote: Differentially Private, Communication Efficient, and
Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
arXiv Detail & Related papers (2024-02-16T16:41:14Z) - Theoretically Principled Federated Learning for Balancing Privacy and
Utility [61.03993520243198]
We propose a general learning framework for the protection mechanisms that protects privacy via distorting model parameters.
It can achieve personalized utility-privacy trade-off for each model parameter, on each client, at each communication round in federated learning.
arXiv Detail & Related papers (2023-05-24T13:44:02Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Provable Membership Inference Privacy [31.08016816475564]
Differential privacy (DP) has emerged as one canonical standard for provable privacy.
We propose a novel privacy notion, membership inference privacy (MIP), to address these challenges.
We show MIP can be achieved using less amount of randomness compared to the amount required for guaranteeing DP, leading to a smaller drop in utility.
arXiv Detail & Related papers (2022-11-12T06:13:00Z) - On the Statistical Complexity of Estimation and Testing under Privacy Constraints [17.04261371990489]
We show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion.
We show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high.
Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence.
arXiv Detail & Related papers (2022-10-05T12:55:53Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z) - Production of Categorical Data Verifying Differential Privacy:
Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z) - Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices.
An increasing number of such databases consist of data from multiple sources, not all of which can be trusted.
This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z) - A One-Pass Private Sketch for Most Machine Learning Tasks [48.17461258268463]
Differential privacy (DP) is a compelling privacy definition that explains the privacy-utility tradeoff via formal, provable guarantees.
We propose a private sketch that supports a multitude of machine learning tasks including regression, classification, density estimation, and more.
Our sketch consists of randomized contingency tables that are indexed with locality-sensitive hashing and constructed with an efficient one-pass algorithm.
arXiv Detail & Related papers (2020-06-16T17:47:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.