Noise-Calibrated Inference from Differentially Private Sufficient Statistics in Exponential Families
- URL: http://arxiv.org/abs/2603.02010v1
- Date: Mon, 02 Mar 2026 15:55:54 GMT
- Title: Noise-Calibrated Inference from Differentially Private Sufficient Statistics in Exponential Families
- Authors: Amir Asiaee, Samhita Pal
- Abstract summary: Many differentially private (DP) data release systems either output DP synthetic data and leave analysts to perform inference as usual, or output a DP point estimate without a principled way to quantify uncertainty. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many differentially private (DP) data release systems either output DP synthetic data and leave analysts to perform inference as usual, which can lead to severe miscalibration, or output a DP point estimate without a principled way to do uncertainty quantification. This paper develops a clean and tractable middle ground for exponential families: release only DP sufficient statistics, then perform noise-calibrated likelihood-based inference and optional parametric synthetic data generation as post-processing. Our contributions are: (1) a general recipe for approximate-DP release of clipped sufficient statistics under the Gaussian mechanism; (2) asymptotic normality, explicit variance inflation, and valid Wald-style confidence intervals for the plug-in DP MLE; (3) a noise-aware likelihood correction that is first-order equivalent to the plug-in but supports bootstrap-based intervals; and (4) a matching minimax lower bound showing the privacy distortion rate is unavoidable. The resulting theory yields concrete design rules and a practical pipeline for releasing DP synthetic data with principled uncertainty quantification, validated on three exponential families and real census data.
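The release-then-infer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the simplest exponential family (a Gaussian mean with known, public variance), replace-one adjacency, and the classic Gaussian-mechanism calibration, which is valid only for 0 < epsilon < 1. The function names `gaussian_sigma` and `dp_mean_with_wald_ci` and all parameter values are hypothetical. Clipping bias is ignored, which is reasonable only when the clipping bound rarely binds.

```python
import math
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale of the classic Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("classic calibration needs 0 < epsilon < 1 and 0 < delta < 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def dp_mean_with_wald_ci(x, clip, epsilon, delta, data_var, rng, z=1.96):
    """Release a clipped, noised mean and a Wald interval that accounts for the noise.

    data_var is ASSUMED known and public; estimating it from the private data
    would require its own DP release.
    """
    x = np.clip(np.asarray(x, dtype=float), -clip, clip)
    n = x.size
    # Replacing one record moves the clipped mean by at most 2 * clip / n.
    sensitivity = 2.0 * clip / n
    sigma_dp = gaussian_sigma(sensitivity, epsilon, delta)
    estimate = x.mean() + rng.normal(0.0, sigma_dp)
    # Variance inflation: sampling variance plus privacy-noise variance.
    std_err = math.sqrt(data_var / n + sigma_dp ** 2)
    return estimate, (estimate - z * std_err, estimate + z * std_err), sigma_dp


rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=1.0, size=5000)
est, (lo, hi), sigma_dp = dp_mean_with_wald_ci(
    x, clip=6.0, epsilon=0.5, delta=1e-5, data_var=1.0, rng=rng
)
print(f"DP estimate {est:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

An interval built from `data_var / n` alone would ignore the injected noise and be too narrow, which is the kind of miscalibration the abstract warns about.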
Related papers
- Differentially Private Truncation of Unbounded Data via Public Second Moments [4.662174186673445]
We propose Public-moment-guided Truncation (PMT), which transforms private data using the public second-moment matrix. PMT substantially improves the accuracy and stability of differential privacy models.
arXiv Detail & Related papers (2026-02-25T12:21:30Z) - Information-Theoretic Discrete Diffusion [8.018632880023336]
We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Results provide a time-integral decomposition of the log-likelihood of the data in terms of optimal score-based losses. Experiments on synthetic and real-world data confirm the accuracy, variance stability, and utility of our estimators.
arXiv Detail & Related papers (2025-10-28T05:59:05Z) - Spectral Graph Clustering under Differential Privacy: Balancing Privacy, Accuracy, and Efficiency [53.98433419539793]
We study the problem of spectral graph clustering under edge differential privacy (DP). Specifically, we develop three mechanisms: (i) graph perturbation via randomized edge flipping combined with adjacency matrix shuffling, which enforces edge privacy; (ii) private graph projection with additive Gaussian noise in a lower-dimensional space to reduce dimensionality and computational complexity; and (iii) a noisy power iteration method that distributes Gaussian noise across iterations to ensure edge DP while maintaining convergence.
arXiv Detail & Related papers (2025-10-08T15:30:27Z) - Private Statistical Estimation via Truncation [5.642973820558159]
We introduce a novel framework for differentially private statistical estimation via data truncation, addressing a key challenge in DP estimation when the data support is unbounded. By leveraging techniques from truncated statistics, we develop computationally efficient DP estimators for exponential family distributions.
arXiv Detail & Related papers (2025-05-18T20:38:38Z) - Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data. We propose a method called Stratified Prediction-Powered Inference (StratPPI). We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - Noise Variance Optimization in Differential Privacy: A Game-Theoretic Approach Through Per-Instance Differential Privacy [7.264378254137811]
Differential privacy (DP) can measure privacy loss by observing the changes in the distribution caused by the inclusion of individuals in the target dataset.
DP has been prominent in safeguarding machine learning datasets at industry giants such as Apple and Google.
We propose per-instance DP (pDP) as a constraint, measuring privacy loss for each data instance and optimizing noise tailored to individual instances.
arXiv Detail & Related papers (2024-04-24T06:51:16Z) - On the Privacy of Selection Mechanisms with Gaussian Noise [44.577599546904736]
We revisit the analysis of Report Noisy Max and Above Threshold with Gaussian noise.
We find that it is possible to provide pure ex-ante DP bounds for Report Noisy Max and pure ex-post DP bounds for Above Threshold.
arXiv Detail & Related papers (2024-02-09T02:11:25Z) - Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z) - Noise-Aware Statistical Inference with Differentially Private Synthetic Data [0.0]
We show that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities.
We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation, and synthetic data generation.
We develop a novel noise-aware synthetic data generation algorithm NAPSU-MQ using the principle of maximum entropy.
arXiv Detail & Related papers (2022-05-28T16:59:46Z) - Smoothed Differential Privacy [55.415581832037084]
Differential privacy (DP) is a widely-accepted and widely-applied notion of privacy based on worst-case analysis.
In this paper, we propose a natural extension of DP following the worst average-case idea behind the celebrated smoothed analysis.
We prove that any discrete mechanism with sampling procedures is more private than what DP predicts, while many continuous mechanisms with sampling procedures are still non-private under smoothed DP.
arXiv Detail & Related papers (2021-07-04T06:55:45Z) - On the Practicality of Differential Privacy in Federated Learning by Tuning Iteration Times [51.61278695776151]
Federated Learning (FL) is well known for its privacy protection when training machine learning models among distributed clients collaboratively.
Recent studies have pointed out that naive FL is susceptible to gradient leakage attacks.
Differential Privacy (DP) emerges as a promising countermeasure to defend against gradient leakage attacks.
arXiv Detail & Related papers (2021-01-11T19:43:12Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)