PRISM: Differentially Private Synthetic Data with Structure-Aware Budget Allocation for Prediction
- URL: http://arxiv.org/abs/2602.10228v1
- Date: Tue, 10 Feb 2026 19:17:55 GMT
- Title: PRISM: Differentially Private Synthetic Data with Structure-Aware Budget Allocation for Prediction
- Authors: Amir Asiaee, Chao Yan, Zachary B. Abrams, Bradley A. Malin
- Abstract summary: Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. Existing DP synthetic data methods treat all features symmetrically, spreading noise uniformly even when the data will serve a specific prediction task. We develop a prediction-centric approach operating in three regimes depending on available structural knowledge.
- Score: 9.016539021471845
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. However, achieving this protection typically requires adding noise, and noise can accumulate when many statistics are measured. Existing DP synthetic data methods treat all features symmetrically, spreading noise uniformly even when the data will serve a specific prediction task. We develop a prediction-centric approach operating in three regimes depending on available structural knowledge. In the causal regime, when the causal parents of $Y$ are known and distribution shift is expected, we target the parents for robustness. In the graphical regime, when a Bayesian network structure is available and the distribution is stable, the Markov blanket of $Y$ provides a sufficient feature set for optimal prediction. In the predictive regime, when no structural knowledge exists, we select features via differentially private methods without claiming to recover causal or graphical structure. We formalize this as PRISM, a mechanism that (i) identifies a predictive feature subset according to the appropriate regime, (ii) constructs targeted summary statistics, (iii) allocates budget to minimize an upper bound on prediction error, and (iv) synthesizes data via graphical-model inference. We prove end-to-end privacy guarantees and risk bounds. Empirically, task-aware allocation improves prediction accuracy compared to generic synthesizers. Under distribution shift, targeting causal parents achieves AUC $\approx 0.73$ while correlation-based selection collapses to chance ($\approx 0.49$).
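The abstract's four-step mechanism can be made concrete with a small sketch. The Python below is a hypothetical rendering of steps (ii) and (iii) only, under basic sequential composition: it measures the joint histogram of each pre-selected feature with $Y$ via the Laplace mechanism and splits the total budget $\epsilon$ across measurements. The function names, the uniform default split, and the toy data are assumptions for illustration; PRISM's actual allocation minimizes a prediction-risk upper bound, and its synthesis step (iv) runs graphical-model inference, both omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_counts(counts, eps):
    """Release a count histogram under eps-DP with the Laplace mechanism.
    Adding or removing one record changes one cell by 1, so the L1
    sensitivity is 1 and the noise scale is 1 / eps."""
    noisy = counts + rng.laplace(scale=1.0 / eps, size=counts.shape)
    return np.clip(noisy, 0.0, None)  # clipping is DP-safe post-processing

def prism_sketch(X, y, selected, eps_total, weights=None):
    """Hypothetical steps (ii)-(iii): measure the joint histogram of each
    selected feature with y, splitting eps_total across the measurements
    (uniformly unless task-aware weights are supplied)."""
    k = len(selected)
    weights = [1.0 / k] * k if weights is None else weights
    measurements = {}
    for j, w in zip(selected, weights):
        joint = np.zeros((X[:, j].max() + 1, y.max() + 1))
        np.add.at(joint, (X[:, j], y), 1)  # exact joint counts of (X_j, Y)
        measurements[j] = laplace_counts(joint, w * eps_total)
    return measurements  # basic composition: total privacy cost is eps_total

# Toy data: three binary features; feature 0 is the causal parent of y.
X = rng.integers(0, 2, size=(500, 3))
y = (X[:, 0] ^ (rng.random(500) < 0.1)).astype(int)
print(prism_sketch(X, y, selected=[0], eps_total=1.0)[0])
```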
Related papers
- Distribution-informed Online Conformal Prediction [53.674678995825666]
We propose Conformal Optimistic Prediction (COP), an online conformal prediction algorithm incorporating the underlying data pattern into the update rule. COP produces tighter prediction sets when a predictable pattern exists, while retaining valid coverage guarantees even when estimates are inaccurate. We prove that COP can achieve valid coverage and construct shorter prediction intervals than other baselines.
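The snippet does not spell out COP's update rule; for orientation, here is a minimal sketch of the adaptive conformal inference template that online conformal methods of this kind build on. The step size gamma and the score stream are assumptions, and COP's pattern-aware "optimistic" update is not reproduced.

```python
import numpy as np

def online_conformal(scores, alpha=0.1, gamma=0.05):
    """Adaptive conformal inference template: track a working miscoverage
    level alpha_t online, raising it after covered steps and lowering it
    after misses, so long-run coverage approaches 1 - alpha."""
    alpha_t, history, errs = alpha, [], []
    for s in scores:
        # threshold = empirical (1 - alpha_t) quantile of past scores
        level = min(1.0, max(0.0, 1.0 - alpha_t))
        q = np.quantile(history, level) if history else np.inf
        err = float(s > q)                # 1 if the new point is not covered
        alpha_t += gamma * (alpha - err)  # Gibbs & Candes-style update
        errs.append(err)
        history.append(s)
    return 1.0 - np.mean(errs)            # realized coverage

rng = np.random.default_rng(1)
print(f"coverage ~ {online_conformal(rng.exponential(size=2000)):.3f}")  # ~0.9
```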
arXiv Detail & Related papers (2025-12-08T17:51:49Z)
- Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce the Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z)
- Epistemic Uncertainty in Conformal Scores: A Unified Approach [2.449909275410288]
Conformal prediction methods create prediction bands with distribution-free guarantees but do not explicitly capture epistemic uncertainty. We introduce $\texttt{EPICSCORE}$, a model-agnostic approach that enhances any conformal score by explicitly integrating epistemic uncertainty. $\texttt{EPICSCORE}$ adaptively expands predictive intervals in regions with limited data while maintaining compact intervals where data is abundant.
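As a generic illustration of the idea (not EPICSCORE's actual construction), the sketch below rescales a residual conformal score by an ensemble-disagreement proxy for epistemic uncertainty, so a single calibrated quantile yields wider intervals where data is scarce; the names, the inflation form, and the simulated data are all assumptions.

```python
import numpy as np

def epistemic_scores(ens_preds, y_cal, lam=1.0):
    """Generic illustration: normalize the absolute residual by an
    ensemble-disagreement estimate of epistemic uncertainty, so one
    calibrated quantile expands intervals where members disagree."""
    mu = ens_preds.mean(axis=0)            # ensemble mean prediction
    sig = ens_preds.std(axis=0)            # disagreement = epistemic proxy
    return np.abs(y_cal - mu) / (1.0 + lam * sig), mu, sig

rng = np.random.default_rng(2)
ens = rng.normal(size=(5, 200))            # 5 hypothetical ensemble members
y = ens.mean(axis=0) + rng.normal(scale=0.3, size=200)
scores, mu, sig = epistemic_scores(ens, y)
q = np.quantile(scores, 0.9)               # split-conformal 90% quantile
band = q * (1.0 + sig)                     # wider where uncertainty is high
print(float(((y >= mu - band) & (y <= mu + band)).mean()))  # ~0.9
```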
arXiv Detail & Related papers (2025-02-10T19:42:54Z)
- Robust Conformal Prediction Using Privileged Information [17.886554223172517]
We develop a method to generate prediction sets with a guaranteed coverage rate that is robust to corruptions in the training data. Our approach builds on conformal prediction, a powerful framework to construct prediction sets that are valid under the i.i.d. assumption.
arXiv Detail & Related papers (2024-06-08T08:56:47Z)
- Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
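The summary leaves the rejection rule implicit; a classical baseline in this space thresholds an estimated density ratio $r(x) = p_{\text{ideal}}(x)/p_{\text{data}}(x)$ and abstains where it is low. The sketch below shows that baseline, not the paper's framework; the ratio estimates and coverage target are assumptions.

```python
import numpy as np

def reject_by_density_ratio(ratio, coverage=0.8):
    """Classical rejection baseline (illustrative, not the paper's exact
    rule): keep the fraction `coverage` of inputs whose estimated density
    ratio r(x) is highest, and abstain on the rest."""
    tau = np.quantile(ratio, 1.0 - coverage)  # threshold hitting coverage
    return ratio >= tau                        # True = predict, False = abstain

rng = np.random.default_rng(3)
ratio = rng.lognormal(size=1000)               # hypothetical ratio estimates
accept = reject_by_density_ratio(ratio, coverage=0.8)
print(accept.mean())                           # ~0.8 of inputs get predictions
```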
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
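Concretely, one bounded-support variant consistent with this summary is a rectified Gaussian: add Gaussian noise, then clamp the release to a known output range. The sketch below shows only that modification; the amplified pDP/FIL accounting the paper derives is not reproduced, and the parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def bounded_gaussian(value, sigma, lo, hi):
    """Gaussian mechanism with bounded support (illustrative rectified
    variant): add N(0, sigma^2) noise, then clamp the release to [lo, hi].
    The paper shows such bounded-support variants can amplify guarantees
    under data-dependent accounting (pDP / FIL)."""
    return np.clip(value + rng.normal(scale=sigma), lo, hi)

# Release a clipped gradient coordinate known to lie in [-1, 1].
print(bounded_gaussian(0.3, sigma=0.5, lo=-1.0, hi=1.0))
```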
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees.
Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks.
We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z)
- Differentially Private Statistical Inference through $\beta$-Divergence One Posterior Sampling [2.8544822698499255]
We propose a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process.
This provides private estimation that is generally applicable without requiring changes to the underlying model.
We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees.
arXiv Detail & Related papers (2023-07-11T12:00:15Z)
- General Gaussian Noise Mechanisms and Their Optimality for Unbiased Mean Estimation [58.03500081540042]
A classical approach to private mean estimation is to compute the true mean and add unbiased, but possibly correlated, Gaussian noise to it.
We show that for every input dataset, an unbiased mean estimator satisfying concentrated differential privacy introduces approximately at least as much error as the best such Gaussian noise mechanism.
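The classical approach in the first sentence is easy to make concrete. The sketch below releases a mean under $\rho$-zCDP with i.i.d. (uncorrelated) Gaussian noise after clipping to a known range; note that the clipping step introduces bias, whereas the paper analyzes exactly unbiased estimators, and the bound and $\rho$ here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

def dp_mean_gaussian(x, bound, rho):
    """Classical private mean: average values clipped to [-bound, bound],
    then add Gaussian noise calibrated for rho-zCDP. The clipped mean has
    L2 sensitivity 2*bound/n, and zCDP needs sigma = sens / sqrt(2*rho)."""
    n = len(x)
    clipped_mean = np.clip(x, -bound, bound).mean()
    sens = 2.0 * bound / n
    sigma = sens / np.sqrt(2.0 * rho)
    return clipped_mean + rng.normal(scale=sigma)

x = rng.normal(loc=0.7, size=10_000)
print(dp_mean_gaussian(x, bound=5.0, rho=0.5))  # ~0.7 plus small noise
```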
arXiv Detail & Related papers (2023-01-31T18:47:42Z)
- Multi-class Classifier based Failure Prediction with Artificial and Anonymous Training for Data Privacy [0.0]
A neural network based multi-class classifier is developed for failure prediction.
The proposed mechanism completely decouples the data set used for training process from the actual data which is kept private.
Results show high accuracy in failure prediction under different parameter configurations.
arXiv Detail & Related papers (2022-09-06T07:53:33Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds the threshold.
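The thresholding step is simple enough to sketch. Following the description above, the snippet picks the threshold on held-out source data so that the fraction of confidences above it matches source accuracy, then applies it to unlabeled target confidences; the simulated confidence distributions are assumptions, and the paper's variants (e.g., alternative score functions) are omitted.

```python
import numpy as np

def atc_predict_accuracy(conf_source, correct_source, conf_target):
    """ATC sketch: choose threshold t on labeled source data so that the
    fraction of source confidences above t equals source accuracy, then
    predict target accuracy as the fraction of target confidences above t."""
    src_acc = correct_source.mean()
    t = np.quantile(conf_source, 1.0 - src_acc)  # P(conf > t) == src_acc
    return (conf_target > t).mean()

rng = np.random.default_rng(6)
conf_s = rng.beta(5, 2, size=5000)               # hypothetical confidences
correct_s = rng.random(5000) < conf_s            # roughly calibrated source
conf_t = rng.beta(4, 3, size=5000)               # shifted target domain
print(atc_predict_accuracy(conf_s, correct_s, conf_t))
```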
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Private Prediction Sets [72.75711776601973]
Machine learning systems need reliable uncertainty quantification and protection of individuals' privacy.
We present a framework that treats these two desiderata jointly.
We evaluate the method on large-scale computer vision datasets.
arXiv Detail & Related papers (2021-02-11T18:59:11Z)