Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified
and Cluster Sampling
- URL: http://arxiv.org/abs/2007.12674v2
- Date: Wed, 21 Jun 2023 22:54:04 GMT
- Title: Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified
and Cluster Sampling
- Authors: Mark Bun and Jörg Drechsler and Marco Gaboardi and Audra McMillan
and Jayshree Sarathy
- Abstract summary: In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes.
We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation.
- Score: 23.256638764430516
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling schemes are fundamental tools in statistics, survey design, and
algorithm design. A fundamental result in differential privacy is that a
differentially private mechanism run on a simple random sample of a population
provides stronger privacy guarantees than the same algorithm run on the entire
population. However, in practice, sampling designs are often more complex than
the simple, data-independent sampling schemes that are addressed in prior work.
In this work, we extend the study of privacy amplification results to more
complex, data-dependent sampling schemes. We find that not only do these
sampling schemes often fail to amplify privacy, they can actually result in
privacy degradation. We analyze the privacy implications of the pervasive
cluster sampling and stratified sampling paradigms, as well as provide some
insight into the study of more general sampling designs.
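The amplification result referenced in the abstract has a standard closed form: an ε-DP mechanism run on a Poisson subsample that includes each record independently with probability q satisfies roughly ln(1 + q(e^ε − 1))-DP. A minimal sketch of the calculation (the function name is illustrative, not from the paper):

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Privacy parameter after Poisson subsampling at rate q.

    Standard amplification-by-subsampling bound: an eps-DP
    mechanism run on a q-subsample satisfies
    ln(1 + q * (exp(eps) - 1))-DP (and delta scales to q*delta
    in the approximate-DP case).
    """
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

# Example: a 1.0-DP mechanism on a 5% simple random sample.
print(amplified_epsilon(1.0, 0.05))  # ~0.082, much stronger than 1.0
```

The paper's point is that this clean behavior is specific to simple, data-independent sampling; stratified and cluster designs can weaken or reverse it.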
Related papers
- Improving Statistical Privacy by Subsampling [0.0]
A privacy mechanism often used is to take samples of the data for answering a query.
This paper proves precise bounds on how much different methods of sampling increase privacy in the statistical setting.
For the DP setting, tradeoff functions have been proposed as a finer measure of privacy than (epsilon, delta)-pairs.
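For reference, a tradeoff function maps an attacker's type-I error budget α to the smallest achievable type-II error when distinguishing neighboring datasets; (ε, δ)-DP corresponds to the standard piecewise-linear curve below (a well-known identity from the f-DP literature, not a result of this paper):

```latex
% Tradeoff function equivalent to (epsilon, delta)-DP:
f_{\varepsilon,\delta}(\alpha) \;=\;
\max\bigl\{\, 0,\; 1-\delta-e^{\varepsilon}\alpha,\; e^{-\varepsilon}(1-\delta-\alpha) \,\bigr\}
```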
arXiv Detail & Related papers (2025-04-15T17:40:45Z) - Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
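As a rough illustration of the general recipe (not this paper's exact construction), one can build random Fourier features, fit ridge regression in the feature space, and privatize the learned weights by output perturbation; the sensitivity bound below is an assumed placeholder, not a derived constant.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_random_feature_regression(X, y, n_features=200, lam=1.0,
                                 eps=1.0, delta=1e-5):
    """Ridge regression on random Fourier features, privatized by
    output perturbation (Gaussian noise on the weight vector).

    The sensitivity value is an illustrative assumption; a real
    analysis must derive it from the loss, the regularizer, and
    bounds on the data.
    """
    n, d = X.shape
    W = rng.normal(size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    Phi = np.sqrt(2.0 / n_features) * np.cos(X @ W + b)
    theta = np.linalg.solve(Phi.T @ Phi + lam * n * np.eye(n_features),
                            Phi.T @ y)
    sensitivity = 2.0 / (n * lam)          # assumed placeholder bound
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return theta + rng.normal(scale=sigma, size=theta.shape), (W, b)

# Usage: noisy_theta, (W, b) = dp_random_feature_regression(X, y)
```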
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
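The mixing idea can be pictured as a mixup-style interpolation between paired minority and majority points; this is a generic sketch of that idea, not the paper's exact generation procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_synthetic(X_min, X_maj, n_new, a=3.0, b=1.0):
    """Synthesize minority-leaning samples as convex combinations of
    randomly paired minority and majority points. Beta(3, 1) weights
    keep each synthetic point closer to the minority class."""
    i = rng.integers(0, len(X_min), size=n_new)
    j = rng.integers(0, len(X_maj), size=n_new)
    lam = rng.beta(a, b, size=(n_new, 1))
    return lam * X_min[i] + (1.0 - lam) * X_maj[j]

# Usage: X_syn = mix_synthetic(X_minority, X_majority, n_new=500)
```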
arXiv Detail & Related papers (2023-08-28T18:48:34Z) - Personalized Privacy Amplification via Importance Sampling [2.1485350418225244]
We study the privacy-enhancing properties of importance sampling.
We evaluate the privacy, efficiency, and accuracy of importance sampling on the example of k-means clustering.
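Schematically, importance sampling here means Poisson sampling with per-record inclusion probabilities plus inverse-probability weights, so weighted estimates stay unbiased while each record's privacy is amplified according to its own inclusion probability. The probabilities below are a hypothetical choice, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_sample(X, q):
    """Include record i independently with probability q[i]; return
    the sample and 1/q weights so weighted sums stay unbiased."""
    keep = rng.random(len(X)) < q
    return X[keep], 1.0 / q[keep]

X = rng.normal(size=(1000, 2))
# Hypothetical data-dependent probabilities, e.g. larger for points
# far from the origin in a k-means-style objective:
q = np.clip(np.linalg.norm(X, axis=1) / 4.0, 0.01, 1.0)
sample, w = importance_sample(X, q)
# Unbiased estimate of the column means of X:
est = (w[:, None] * sample).sum(axis=0) / len(X)
```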
arXiv Detail & Related papers (2023-07-05T17:09:10Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy than without it.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Private Set Generation with Discriminative Information [63.851085173614]
Differentially private data generation is a promising solution to the data privacy challenge.
Existing private generative models struggle with the utility of synthetic samples.
We introduce a simple yet effective method that greatly improves the sample utility of state-of-the-art approaches.
arXiv Detail & Related papers (2022-11-07T10:02:55Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
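One way to picture the "easy instance" setting (a generic sketch under strong assumptions, not the paper's algorithm): run non-private Lloyd iterations, then release each center through the Gaussian mechanism, with a sensitivity bound that is only justified when every cluster is large and well contained.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_kmeans_centers(X, k, radius, min_cluster, eps=1.0,
                         delta=1e-5, iters=20):
    """Non-private Lloyd's algorithm followed by Gaussian noise on
    each center. Assumes an "easy" instance: every cluster holds at
    least `min_cluster` points within a ball of radius `radius`, so
    one record moves a center by at most 2*radius/min_cluster.
    (Illustrative only; it ignores how a changed record can flip
    assignments across iterations.)"""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = ((X[:, None, :] - centers) ** 2).sum(-1).argmin(1)
        for c in range(k):
            pts = X[labels == c]
            if len(pts):
                centers[c] = pts.mean(axis=0)
    sens = 2.0 * radius / min_cluster
    sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return centers + rng.normal(scale=sigma, size=centers.shape)
```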
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Uniformity Testing in the Shuffle Model: Simpler, Better, Faster [0.0]
Uniformity testing, or testing whether independent observations are uniformly distributed, is a central question in distribution testing.
In this work, we considerably simplify the analysis of the known uniformity testing algorithm in the shuffle model.
arXiv Detail & Related papers (2021-08-20T03:43:12Z) - Renyi Differential Privacy of the Subsampled Shuffle Model in
Distributed Learning [7.197592390105457]
We study privacy in a distributed learning framework, where clients collaboratively build a learning model iteratively through interactions with a server from whom we need privacy.
Motivated by optimization and the federated learning (FL) paradigm, we focus on the case where a small fraction of data samples are randomly sub-sampled in each round.
To obtain even stronger local privacy guarantees, we study this in the shuffle privacy model, where each client randomizes its response using a local differentially private (LDP) mechanism.
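A toy version of the pipeline described here: each round Poisson-subsamples the clients, every sampled client reports one bit through ε0-LDP randomized response, and the server sees only the shuffled (order-free) reports. All names and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_response(bit, eps0):
    """eps0-LDP randomized response for a single bit."""
    p_truth = np.exp(eps0) / (np.exp(eps0) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit

def subsampled_shuffle_round(bits, q=0.1, eps0=1.0):
    """One round: subsample clients at rate q, randomize each bit
    locally, and return only the anonymous sum (after shuffling,
    the order of reports carries no information)."""
    reports = [randomized_response(b, eps0)
               for b in bits if rng.random() < q]
    return sum(reports), len(reports)

# Debias the sum to estimate the fraction of ones:
bits = rng.integers(0, 2, size=10_000)
s, m = subsampled_shuffle_round(bits)
p = np.exp(1.0) / (np.exp(1.0) + 1.0)
est = (s / m - (1.0 - p)) / (2.0 * p - 1.0)
```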
arXiv Detail & Related papers (2021-07-19T11:43:24Z) - Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy
Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
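For intuition, the bound in this line of work has the following informal shape (constants omitted; consult the paper for the exact statement): shuffling n reports produced by an ε0-LDP randomizer yields (ε, δ)-central DP with

```latex
% Informal shape of amplification by shuffling (constants omitted):
\varepsilon \;=\; O\!\left( \bigl(1 - e^{-\varepsilon_0}\bigr)
\sqrt{\frac{e^{\varepsilon_0} \log(1/\delta)}{n}} \right)
```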
arXiv Detail & Related papers (2020-12-23T17:07:26Z) - Oblivious Sampling Algorithms for Private Data Analysis [10.990447273771592]
We study secure and privacy-preserving data analysis based on queries executed on samples from a dataset.
Trusted execution environments (TEEs) can be used to protect the content of the data during query computation.
Supporting differential-private (DP) queries in TEEs provides record privacy when query output is revealed.
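Setting aside the enclave machinery, the DP layer for a sampled counting query can be as simple as Laplace noise calibrated to sensitivity 1; a minimal sketch (names are illustrative, and the oblivious-sampling part is out of scope here):

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_count_on_sample(records, predicate, q=0.1, eps=0.5):
    """Count records matching `predicate` on a Bernoulli sample and
    release the count via the Laplace mechanism. A count has
    sensitivity 1, and sampling further amplifies privacy."""
    sample = [r for r in records if rng.random() < q]
    true_count = sum(bool(predicate(r)) for r in sample)
    return true_count + rng.laplace(scale=1.0 / eps)

# Usage: dp_count_on_sample(ages, lambda a: a >= 65)
```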
arXiv Detail & Related papers (2020-09-28T23:45:30Z) - RDP-GAN: A Rényi-Differential Privacy based Generative Adversarial
Network [75.81653258081435]
Generative adversarial network (GAN) has attracted increasing attention recently owing to its impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied to sensitive or private training examples, such as medical or financial records, they may still divulge individuals' sensitive and private information.
We propose a Rényi-differentially private GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noise to the value of the loss function during training.
arXiv Detail & Related papers (2020-07-04T09:51:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.