Summary Statistic Privacy in Data Sharing
- URL: http://arxiv.org/abs/2303.02014v2
- Date: Fri, 27 Oct 2023 21:33:06 GMT
- Title: Summary Statistic Privacy in Data Sharing
- Authors: Zinan Lin, Shuaiqi Wang, Vyas Sekar, Giulia Fanti
- Abstract summary: We study a setting where a data holder wishes to share data with a receiver, without revealing certain summary statistics of the data distribution.
We propose summary statistic privacy, a metric for quantifying the privacy risk of such a mechanism.
We show that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms.
- Score: 23.50797952699759
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study a setting where a data holder wishes to share data with a receiver,
without revealing certain summary statistics of the data distribution (e.g.,
mean, standard deviation). The data holder achieves this by passing the data through a
randomization mechanism. We propose summary statistic privacy, a metric for
quantifying the privacy risk of such a mechanism based on the worst-case
probability of an adversary guessing the distributional secret within some
threshold. Defining distortion as a worst-case Wasserstein-1 distance between
the real and released data, we prove lower bounds on the tradeoff between
privacy and distortion. We then propose a class of quantization mechanisms that
can be adapted to different data distributions. We show that the quantization
mechanism's privacy-distortion tradeoff matches our lower bounds under certain
regimes, up to small constant factors. Finally, we demonstrate on real-world
datasets that the proposed quantization mechanisms achieve better
privacy-distortion tradeoffs than alternative privacy mechanisms.
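To make the abstract's setup concrete, here is a minimal Python sketch of a mean-hiding quantization mechanism together with an empirical Wasserstein-1 distortion check. It is an illustration in the spirit of the paper, not its exact construction: the function name `quantize_mean`, the grid step `s`, and the cell-midpoint release rule are assumptions made for this example.

```python
# Illustrative only: NOT the paper's exact mechanism. We snap the sample's
# empirical mean onto a grid of width s, so every secret mean inside a grid
# cell produces the same released mean.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

def quantize_mean(data: np.ndarray, s: float) -> np.ndarray:
    """Shift `data` so its mean lands on the midpoint of its width-s grid cell."""
    mu = data.mean()
    mu_released = np.floor(mu / s) * s + s / 2.0  # midpoint of the cell containing mu
    return data + (mu_released - mu)

# The data holder's sample; the distribution's mean (3.7) is the secret.
real = rng.normal(loc=3.7, scale=1.0, size=10_000)
released = quantize_mean(real, s=2.0)

# Any true mean in [2, 4) is released as 3.0, so an adversary cannot localize
# the secret beyond a width-2 cell. Because the release is a pure shift of the
# sample, the Wasserstein-1 distortion is bounded by s/2.
print("released mean:", released.mean())                       # ~3.0
print("empirical W1 :", wasserstein_distance(real, released))  # <= s/2 = 1.0
```

Shrinking `s` reduces distortion but lets the adversary localize the secret more precisely; that tension is the privacy-distortion tradeoff the abstract quantifies.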
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z)
- Guarding Multiple Secrets: Enhanced Summary Statistic Privacy for Data Sharing [3.7274308010465775]
We propose a novel framework to define, analyze, and protect multi-secret summary statistics privacy in data sharing.
We measure the privacy risk of any data release mechanism by the worst-case probability of an attacker successfully inferring summary statistic secrets.
arXiv Detail & Related papers (2024-05-22T16:30:34Z)
- Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting (a hedged sketch of a bounded-support mechanism appears after this list).
arXiv Detail & Related papers (2024-03-07T21:22:07Z)
- Unified Mechanism-Specific Amplification by Subsampling and Group Privacy Amplification [54.1447806347273]
Amplification by subsampling is one of the main primitives in machine learning with differential privacy.
We propose the first general framework for deriving mechanism-specific guarantees.
We analyze how subsampling affects the privacy of groups of multiple users.
arXiv Detail & Related papers (2024-03-07T19:36:05Z)
- Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy ($f$-DP).
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
- DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases.
Splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget.
We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
- Post-processing of Differentially Private Data: A Fairness Perspective [53.29035917495491]
This paper shows that post-processing causes disparate impacts on individuals or groups.
It analyzes two critical settings: the release of differentially private datasets and the use of such private datasets for downstream decisions.
It proposes a novel post-processing mechanism that is (approximately) optimal under different fairness metrics.
arXiv Detail & Related papers (2022-01-24T02:45:03Z)
- Distribution-Invariant Differential Privacy [4.700764053354502]
We develop a distribution-invariant privatization (DIP) method to reconcile high statistical accuracy and strict differential privacy.
Under the same strictness of privacy protection, DIP achieves superior statistical accuracy in two simulations and on three real-world benchmarks.
arXiv Detail & Related papers (2021-11-08T22:26:50Z)
- Combining Public and Private Data [7.975795748574989]
We introduce a mixed estimator of the mean optimized to minimize variance.
We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users.
arXiv Detail & Related papers (2021-10-29T23:25:49Z)
- A Shuffling Framework for Local Differential Privacy [40.92785300658643]
LDP deployments are vulnerable to inference attacks, as an adversary can link users' noisy responses to their identities.
An alternative model, shuffle DP, prevents this by shuffling the noisy responses uniformly at random.
We show that systematic shuffling of the noisy responses can thwart specific inference attacks while retaining some meaningful data learnability (the baseline shuffle model is sketched after this list).
arXiv Detail & Related papers (2021-06-11T20:36:23Z)
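Below is a minimal sketch, in the spirit of "Privacy Amplification for the Gaussian Mechanism via Bounded Support" above, of a Gaussian mechanism whose release is rectified onto a fixed interval. The interval `[lo, hi]`, the noise scale `sigma`, and the function name are illustrative assumptions, not the paper's settings or API.

```python
# Hedged illustration: a Gaussian mechanism with bounded support, where the
# noisy release is clamped to a fixed interval so out-of-range mass piles up
# at the endpoints. Parameters here are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

def bounded_gaussian_release(value: float, sigma: float,
                             lo: float, hi: float) -> float:
    """Add Gaussian noise to `value`, then rectify the output onto [lo, hi]."""
    noisy = value + rng.normal(0.0, sigma)
    return float(np.clip(noisy, lo, hi))

# Releasing a statistic known a priori to lie in [0, 1]:
print(bounded_gaussian_release(0.42, sigma=0.5, lo=0.0, hi=1.0))
```

The summary's claim is that constraining the output support in this way can tighten data-dependent privacy accounting; see the paper itself for the actual analysis.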
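And here is the baseline shuffle model referenced in "A Shuffling Framework for Local Differential Privacy": each user randomizes locally, then a shuffler discards the response order. This sketch uses binary randomized response with a uniform shuffle; the paper's systematic shuffling refines this baseline, and all names and parameters here are illustrative.

```python
# Hedged illustration of the shuffle-DP baseline: users apply binary
# randomized response locally, then a trusted shuffler permutes the responses
# uniformly at random, severing the link between response and identity.
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(bit: int, p_truth: float) -> int:
    """Report the true bit with probability p_truth, else flip it."""
    return bit if rng.random() < p_truth else 1 - bit

def shuffle_release(bits: list[int], p_truth: float) -> np.ndarray:
    """Each user randomizes locally; the shuffler discards the ordering."""
    noisy = np.array([randomized_response(b, p_truth) for b in bits])
    return rng.permutation(noisy)  # uniform shuffle: order no longer identifies users

users = [1, 0, 1, 1, 0, 0, 1, 0]
print(shuffle_release(users, p_truth=0.75))
```

After shuffling, an attacker observes only the multiset of responses, so the linking attack described in that entry no longer applies directly.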
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.