Related papers: Managing Correlations in Data and Privacy Demand

Managing Correlations in Data and Privacy Demand

URL: http://arxiv.org/abs/2509.02856v1
Date: Tue, 02 Sep 2025 22:03:13 GMT
Title: Managing Correlations in Data and Privacy Demand
Authors: Syomantak Chaudhuri, Thomas A. Courtade,
Abstract summary: We show that the standard HDP framework falls short when user data and privacy demands are allowed to be correlated.<n>We propose an alternate framework, Add-remove Heterogeneous Differential Privacy (AHDP), that jointly accounts for user data and privacy preference.
Score: 4.855689194518906
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Previous works in the differential privacy literature that allow users to choose their privacy levels typically operate under the heterogeneous differential privacy (HDP) framework with the simplifying assumption that user data and privacy levels are not correlated. Firstly, we demonstrate that the standard HDP framework falls short when user data and privacy demands are allowed to be correlated. Secondly, to address this shortcoming, we propose an alternate framework, Add-remove Heterogeneous Differential Privacy (AHDP), that jointly accounts for user data and privacy preference. We show that AHDP is robust to possible correlations between data and privacy. Thirdly, we formalize the guarantees of the proposed AHDP framework through an operational hypothesis testing perspective. The hypothesis testing setup may be of independent interest in analyzing other privacy frameworks as well. Fourthly, we show that there exists non-trivial AHDP mechanisms that notably do not require prior knowledge of the data-privacy correlations. We propose some such mechanisms and apply them to core statistical tasks such as mean estimation, frequency estimation, and linear regression. The proposed mechanisms are simple to implement with minimal assumptions and modeling requirements, making them attractive for real-world use. Finally, we empirically evaluate proposed AHDP mechanisms, highlighting their trade-offs using LLM-generated synthetic datasets, which we release for future research.

Related papers

Parallel Composition for Statistical Privacy [0.0]
A privacy mechanism is proposed that is based on subsampling and randomly partitioning the database to bound the dependency among queries.<n>These bounds show that in realistic application scenarios taking the entropy of distributions into account yields improvements of privacy and precision guarantees.
arXiv Detail & Related papers (2026-02-10T10:13:44Z)
Rethinking Anonymity Claims in Synthetic Data Generation: A Model-Centric Privacy Attack Perspective [18.404146545866812]
Training generative machine learning models to produce synthetic data has become a popular approach for enhancing privacy in data sharing.<n>As this typically involves processing sensitive personal information, releasing either the trained model or generated synthetic anonymity can still pose privacy risks.<n>We argue that meaningful assessments must account for the capabilities and properties of underlying generative model and be grounded in state-of-the-art privacy attacks.
arXiv Detail & Related papers (2026-01-30T00:57:41Z)
How to DP-fy Your Data: A Practical Guide to Generating Synthetic Data With Differential Privacy [52.00934156883483]
Differential Privacy (DP) is a framework for reasoning about and limiting information leakage.<n>Differentially Private Synthetic data refers to synthetic data that preserves the overall trends of source data.
arXiv Detail & Related papers (2025-12-02T21:14:39Z)
On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis.<n>We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z)
Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset. We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
Privacy Constrained Fairness Estimation for Decision Trees [2.9906966931843093]
Measuring the fairness of any AI model requires the sensitive attributes of the individuals in the dataset. We propose a novel method, dubbed Privacy-Aware Fairness Estimation of Rules (PAFER) We show that using the Laplacian mechanism, the method is able to estimate SP with low error while guaranteeing the privacy of the individuals in the dataset with high certainty.
arXiv Detail & Related papers (2023-12-13T14:54:48Z)
Simulation-based Bayesian Inference from Privacy Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.<n>We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
Differentially Private Linear Regression with Linked Data [3.9325957466009203]
Differential privacy, a mathematical notion from computer science, is a rising tool offering robust privacy guarantees. Recent work focuses on developing differentially private versions of individual statistical and machine learning tasks. We present two differentially private algorithms for linear regression with linked data.
arXiv Detail & Related papers (2023-08-01T21:00:19Z)
A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR) EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output. Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability. We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP) More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases. splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget. We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
Causally Constrained Data Synthesis for Private Data Release [36.80484740314504]
Using synthetic data which reflects certain statistical properties of the original data preserves the privacy of the original data. Prior works utilize differentially private data release mechanisms to provide formal privacy guarantees. We propose incorporating causal information into the training process to favorably modify the aforementioned trade-off.
arXiv Detail & Related papers (2021-05-27T13:46:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.