Simulation-based Bayesian Inference from Privacy Protected Data
- URL: http://arxiv.org/abs/2310.12781v4
- Date: Sat, 29 Mar 2025 19:39:41 GMT
- Title: Simulation-based Bayesian Inference from Privacy Protected Data
- Authors: Yifei Xiong, Nianqiao Phyllis Ju, Sanguo Zhang,
- Abstract summary: We propose simulation-based inference methods from privacy-protected datasets.<n>We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many modern statistical analysis and machine learning applications require training models on sensitive user data. Under a formal definition of privacy protection, differentially private algorithms inject calibrated noise into the confidential data or during the data analysis process to produce privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid statistical inferences. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we adopt neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.
Related papers
- High-Dimensional Differentially Private Quantile Regression: Distributed Estimation and Statistical Inference [0.26784722398800515]
We propose a differentially private quantile regression method for high-dimensional data in a distributed setting.<n>We develop a differentially private estimation algorithm with iterative updates, ensuring near-optimal statistical accuracy and formal privacy guarantees.
arXiv Detail & Related papers (2025-08-07T09:47:44Z) - A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis [11.135689359531105]
This paper proposes a novel real-value negative survey model, termed RVNS, for the first time in the field of real-value sensitive information collection.<n>The RVNS model exempts users from the necessity of discretizing their data and only requires them to sample a set of data from a range that deviates from their actual sensitive details.
arXiv Detail & Related papers (2025-07-23T04:05:33Z) - On Privately Estimating a Single Parameter [47.499748486548484]
We investigate differentially private estimators for individual parameters within larger parametric models.
By leveraging these private certificates, we provide computationally and statistical efficient mechanisms that release private statistics that are, at least in the sample size, essentially unimprovable.
We investigate the practicality of the algorithms both in simulated data and in real-world data from the American Community Survey and US Census, highlighting scenarios in which the new procedures are successful and identifying areas for future work.
arXiv Detail & Related papers (2025-03-21T15:57:12Z) - Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose PseudoProbability Unlearning (PPU), a novel method that enables models to forget data to adhere to privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Differentially Private Covariate Balancing Causal Inference [8.133739801185271]
Differential privacy is the leading mathematical framework for privacy protection.
Our algorithm produces both point and interval estimators with statistical guarantees, such as consistency and rate optimality, under a given privacy budget.
arXiv Detail & Related papers (2024-10-18T18:02:13Z) - LLM-based Privacy Data Augmentation Guided by Knowledge Distillation
with a Distribution Tutor for Medical Text Classification [67.92145284679623]
We propose a DP-based tutor that models the noised private distribution and controls samples' generation with a low privacy cost.
We theoretically analyze our model's privacy protection and empirically verify our model.
arXiv Detail & Related papers (2024-02-26T11:52:55Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized
Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - Exact and Efficient Bayesian Inference for Privacy Risk Quantification (Extended Version) [0.0]
Privug is a method to quantify privacy risks of data analytics programs by analyzing their source code.
The inference engine is implemented for a subset of Python programs.
We evaluate the method by analyzing privacy risks in programs to release public statistics.
arXiv Detail & Related papers (2023-08-31T13:04:04Z) - Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z) - Epsilon*: Privacy Metric for Machine Learning Models [7.461284823977013]
Epsilon* is a new metric for measuring the privacy risk of a single model instance prior to, during, or after deployment of privacy mitigation strategies.
It requires only black-box access to model predictions, does not require training data re-sampling or model re-training, and can be used to measure the privacy risk of models not trained with differential privacy.
arXiv Detail & Related papers (2023-07-21T00:49:07Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - Simulation-based, Finite-sample Inference for Privatized Data [14.218697973204065]
We propose a simulation-based "repro sample" approach to produce statistically valid confidence intervals and hypothesis tests.
We show that this methodology is applicable to a wide variety of private inference problems.
arXiv Detail & Related papers (2023-03-09T15:19:31Z) - Membership Inference Attacks against Synthetic Data through Overfitting
Detection [84.02632160692995]
We argue for a realistic MIA setting that assumes the attacker has some knowledge of the underlying data distribution.
We propose DOMIAS, a density-based MIA model that aims to infer membership by targeting local overfitting of the generative model.
arXiv Detail & Related papers (2023-02-24T11:27:39Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models are struggling with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Sensitivity analysis in differentially private machine learning using
hybrid automatic differentiation [54.88777449903538]
We introduce a novel textithybrid automatic differentiation (AD) system for sensitivity analysis.
This enables modelling the sensitivity of arbitrary differentiable function compositions, such as the training of neural networks on private data.
Our approach can enable the principled reasoning about privacy loss in the setting of data processing.
arXiv Detail & Related papers (2021-07-09T07:19:23Z) - Private Prediction Sets [72.75711776601973]
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy.
We present a framework that treats these two desiderata jointly.
We evaluate the method on large-scale computer vision datasets.
arXiv Detail & Related papers (2021-02-11T18:59:11Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.