Within-Dataset Disclosure Risk for Differential Privacy
- URL: http://arxiv.org/abs/2310.13104v4
- Date: Mon, 03 Mar 2025 03:45:05 GMT
- Title: Within-Dataset Disclosure Risk for Differential Privacy
- Authors: Zhiru Zhu, Raul Castro Fernandez,
- Abstract summary: Differential privacy (DP) enables private data analysis.<n>In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy.<n>It is challenging for controllers to choose $epsilon$ because of the difficulty of interpreting the privacy implications of such a choice on the within-dataset individuals.
- Score: 5.288762073608111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differential privacy (DP) enables private data analysis. In a typical DP deployment, controllers manage individuals' sensitive data and are responsible for answering analysts' queries while protecting individuals' privacy. They do so by choosing the privacy parameter $\epsilon$, which controls the degree of privacy for all individuals in all possible datasets. However, it is challenging for controllers to choose $\epsilon$ because of the difficulty of interpreting the privacy implications of such a choice on the within-dataset individuals. To address this challenge, we first derive a relative disclosure risk indicator (RDR) that indicates the impact of choosing $\epsilon$ on the within-dataset individuals' disclosure risk. We then design an algorithm to find $\epsilon$ based on controllers' privacy preferences expressed as a function of the within-dataset individuals' RDRs, and an alternative algorithm that finds and releases $\epsilon$ while satisfying DP. Lastly, we propose a solution that bounds the total privacy leakage when using the algorithm to answer multiple queries without requiring controllers to set the total privacy budget. We evaluate our contributions through an IRB-approved user study that shows the RDR is useful for helping controllers choose $\epsilon$, and experimental evaluations showing our algorithms are efficient and scalable.
Related papers
- Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy [25.896416088293908]
retrieval-augmented generation (RAG) is particularly effective in assisting large language models (LLMs)
RAG outputs risk leaking sensitive information from the external data source.
We propose an algorithm that smartly spends privacy budget only for the tokens that require the sensitive information.
arXiv Detail & Related papers (2024-12-06T01:20:16Z) - Calibrating Practical Privacy Risks for Differentially Private Machine Learning [5.363664265121231]
We study the approaches that can lower the attacking success rate to allow for more flexible privacy budget settings in model training.
We find that by selectively suppressing privacy-sensitive features, we can achieve lower ASR values without compromising application-specific data utility.
arXiv Detail & Related papers (2024-10-30T03:52:01Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analysis when using the two types of batch sampling.
Our result shows that there can be a substantial gap between the privacy analysis when using the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - Conciliating Privacy and Utility in Data Releases via Individual Differential Privacy and Microaggregation [4.287502453001108]
$epsilon$-Differential privacy (DP) is a well-known privacy model that offers strong privacy guarantees.
We propose $epsilon$-individual differential privacy (iDP), which causes less data distortion while providing the same protection as DP to subjects.
We report on experiments that show how our approach can provide strong privacy (small $epsilon$) while yielding protected data that do not significantly degrade the accuracy of secondary data analysis.
arXiv Detail & Related papers (2023-12-21T10:23:18Z) - To share or not to share: What risks would laypeople accept to give sensitive data to differentially-private NLP systems? [14.586789605230672]
We argue that determining the $varepsilon$ value should not be solely in the hands of researchers or system developers.
We conduct a behavioral experiment (311 lay participants) to study the behavior of people in uncertain decision-making situations.
arXiv Detail & Related papers (2023-07-13T12:06:48Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an
Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR)
EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis
Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze it under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $epsilon$ can be $O(log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller that the best possible privacy parameter for the entire record of a person.
arXiv Detail & Related papers (2022-09-08T22:43:50Z) - DP$^2$-VAE: Differentially Private Pre-trained Variational Autoencoders [26.658723213776632]
We propose DP$2$-VAE, a training mechanism for variational autoencoders (VAE) with provable DP guarantees and improved utility via emphpre-training on private data.
We conduct extensive experiments on image datasets to illustrate our superiority over baselines under various privacy budgets and evaluation metrics.
arXiv Detail & Related papers (2022-08-05T23:57:34Z) - Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z) - Production of Categorical Data Verifying Differential Privacy:
Conception and Applications to Machine Learning [0.0]
Differential privacy is a formal definition that allows quantifying the privacy-utility trade-off.
With the local DP (LDP) model, users can sanitize their data locally before transmitting it to the server.
In all cases, we concluded that differentially private ML models achieve nearly the same utility metrics as non-private ones.
arXiv Detail & Related papers (2022-04-02T12:50:14Z) - Differentially Private Federated Learning on Heterogeneous Data [10.431137628048356]
Federated Learning (FL) is a paradigm for large-scale distributed learning.
It faces two key challenges: (i) efficient training from highly heterogeneous user data, and (ii) protecting the privacy of participating users.
We propose a novel FL approach to tackle these two challenges together by incorporating Differential Privacy (DP) constraints.
arXiv Detail & Related papers (2021-11-17T18:23:49Z) - Quantifying identifiability to choose and audit $\epsilon$ in
differentially private deep learning [15.294433619347082]
To use differential privacy in machine learning, data scientists must choose privacy parameters $(epsilon,delta)$.
We transform $(epsilon,delta)$ to a bound on the Bayesian posterior belief of the adversary assumed by differential privacy concerning the presence of any record in the training dataset.
We formulate an implementation of this differential privacy adversary that allows data scientists to audit model training and compute empirical identifiability scores and empirical $(epsilon,delta)$.
arXiv Detail & Related papers (2021-03-04T09:35:58Z) - Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy preserving exploration policies for episodic reinforcement learning (RL)
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP)
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z) - Individual Privacy Accounting via a Renyi Filter [33.65665839496798]
We give a method for tighter privacy loss accounting based on the value of a personalized privacy loss estimate for each individual.
Our filter is simpler and tighter than the known filter for $(epsilon,delta)$-differential privacy by Rogers et al.
arXiv Detail & Related papers (2020-08-25T17:49:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.