Oblivious Sampling Algorithms for Private Data Analysis
- URL: http://arxiv.org/abs/2009.13689v1
- Date: Mon, 28 Sep 2020 23:45:30 GMT
- Title: Oblivious Sampling Algorithms for Private Data Analysis
- Authors: Sajin Sasy and Olga Ohrimenko
- Abstract summary: We study secure and privacy-preserving data analysis based on queries executed on samples from a dataset.
Trusted execution environments (TEEs) can be used to protect the content of the data during query computation.
Supporting differentially private (DP) queries in TEEs provides record privacy when the query output is revealed.
- Score: 10.990447273771592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study secure and privacy-preserving data analysis based on queries
executed on samples from a dataset. Trusted execution environments (TEEs) can
be used to protect the content of the data during query computation, while
supporting differentially private (DP) queries in TEEs provides record privacy
when the query output is revealed. Support for sample-based queries is attractive
due to \emph{privacy amplification}, since a query is answered using only a small
subset of the dataset rather than the whole dataset. However, extracting data
samples with TEEs while proving strong DP guarantees is not trivial, as the
secrecy of the sample indices has to be preserved. To this end, we design
efficient secure variants of common sampling algorithms. Experimentally, we show
that the accuracy of differentially private models trained with shuffling and
with sampling is the same on MNIST and CIFAR-10, while sampling provides
stronger privacy guarantees than shuffling.
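The obstacle the paper tackles is that a TEE's memory access pattern can reveal which records were sampled, which would void the privacy-amplification argument. A minimal sketch, assuming a shuffle-then-prefix approach to sampling without replacement (hypothetical names, not the paper's construction):

```python
import os
import struct

def sample_without_replacement(records, m):
    """Tag every record with a fresh random 64-bit key, sort by the
    tags, and keep the first m records: a uniform sample without
    replacement.  In a TEE, an oblivious variant would swap the sort
    for a data-oblivious sorting network (e.g., bitonic sort) so the
    memory access pattern does not leak the sampled indices."""
    tagged = [(struct.unpack("<Q", os.urandom(8))[0], r) for r in records]
    tagged.sort(key=lambda t: t[0])  # NOT oblivious: access pattern leaks
    return [r for _, r in tagged[:m]]

# Example: sample 3 of 10 records.
print(sample_without_replacement(list(range(10)), 3))
```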
Related papers
- Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs. We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties. Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results. We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
- Fiducial Matching: Differentially Private Inference for Categorical Data [0.0]
Statistical inference is still an open area of investigation in a differentially private (DP) setting. We propose a simulation-based matching approach, solved through tools from the fiducial framework. We focus on the analysis of categorical (nominal) data that is common in national surveys.
arXiv Detail & Related papers (2025-07-15T21:56:15Z)
- Improving Statistical Privacy by Subsampling [0.0]
A privacy mechanism often used is to take samples of the data for answering a query. This paper proves precise bounds on how much different methods of sampling increase privacy in the statistical setting. For the DP setting, tradeoff functions have been proposed as a finer measure of privacy than (epsilon, delta)-pairs.
arXiv Detail & Related papers (2025-04-15T17:40:45Z)
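For intuition on the entry above: the classical (epsilon, delta)-style amplification bound for Poisson subsampling, which tradeoff functions refine, is easy to compute. A minimal sketch, assuming the standard bound from the subsampling literature:

```python
import math

def amplified_epsilon(eps: float, q: float) -> float:
    """Classical privacy-amplification bound for Poisson subsampling:
    an eps-DP mechanism run on a subsample that includes each record
    independently with probability q is log(1 + q*(e^eps - 1))-DP."""
    return math.log(1.0 + q * (math.exp(eps) - 1.0))

# Example: eps = 1.0 amplified by a 1% sampling rate.
print(amplified_epsilon(1.0, 0.01))  # ~0.017
```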
- How Private are DP-SGD Implementations? [61.19794019914523]
We show that there can be a substantial gap between the privacy analyses obtained under the two types of batch sampling.
arXiv Detail & Related papers (2024-03-26T13:02:43Z)
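To make the two batch-sampling styles concrete: DP-SGD privacy analyses typically assume Poisson subsampling, while implementations often shuffle the data into fixed-size batches. A toy sketch with hypothetical helper names:

```python
import random

def poisson_batches(n, q, steps, seed=0):
    """Poisson sampling: at each step, every index joins the batch
    independently with probability q, so batch sizes are random."""
    rng = random.Random(seed)
    for _ in range(steps):
        yield [i for i in range(n) if rng.random() < q]

def shuffled_batches(n, batch_size, seed=0):
    """Shuffling: permute the indices once per epoch and cut them
    into fixed-size batches; every index appears exactly once."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    for start in range(0, n, batch_size):
        yield idx[start:start + batch_size]
```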
- Benchmarking Private Population Data Release Mechanisms: Synthetic Data vs. TopDown [50.40020716418472]
This study conducts a comparison between the TopDown algorithm and private synthetic data generation to determine how accuracy is affected by query complexity.
Our results show that for in-distribution queries, the TopDown algorithm achieves significantly better privacy-fidelity tradeoffs than any of the synthetic data methods we evaluated.
arXiv Detail & Related papers (2024-01-31T17:38:34Z)
- Enhancing Trade-offs in Privacy, Utility, and Computational Efficiency through MUltistage Sampling Technique (MUST) [3.0939420223851446]
We propose a class of subsampling methods for privacy amplification (PA).
We conduct comprehensive analyses of the PA effects and utility for several 2-stage MUST procedures.
We provide the privacy loss composition analysis over repeated applications of MUST.
arXiv Detail & Related papers (2023-12-20T19:38:29Z)
- DP-PQD: Privately Detecting Per-Query Gaps In Synthetic Data Generated By Black-Box Mechanisms [17.562365686511818]
We present a novel framework named DP-PQD (differentially-private per-query decider) to detect if the query answers on the private and synthetic datasets are within a user-specified threshold of each other.
We give a suite of private algorithms for per-query deciders for count, sum, and median queries, analyze their properties, and evaluate them experimentally.
arXiv Detail & Related papers (2023-09-15T17:38:59Z)
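One way to picture a per-query decider, as a hedged sketch in the spirit of the framework (the paper's actual algorithms for count, sum, and median are more refined; names and noise scale here are hypothetical):

```python
import numpy as np

def noisy_gap_decider(q_private, q_synth, threshold, eps, sensitivity,
                      rng=None):
    """Privately decide whether a query answer on the private dataset
    is within `threshold` of the answer on the synthetic dataset, by
    adding Laplace noise (scale sensitivity/eps) to the absolute gap
    before comparing."""
    rng = rng or np.random.default_rng()
    noisy_gap = abs(q_private - q_synth) + rng.laplace(scale=sensitivity / eps)
    return noisy_gap <= threshold

# Example: count query answering 1042 (private) vs 1018 (synthetic).
print(noisy_gap_decider(1042, 1018, threshold=50, eps=1.0, sensitivity=1.0))
```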
- Personalized Privacy Amplification via Importance Sampling [3.0636509793595548]
In this paper, we examine the privacy properties of importance sampling, focusing on an individualized privacy analysis.
We find that, in importance sampling, privacy is well aligned with utility but at odds with sample size.
We propose two approaches for constructing sampling distributions: one that optimizes the privacy-efficiency trade-off, and one based on a utility guarantee in the form of coresets.
arXiv Detail & Related papers (2023-07-05T17:09:10Z)
- Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel FedC algorithm with differential privacy guarantees, referred to as DP-Fed, in which partial client participation is also considered.
Various properties of the proposed DP-Fed, covering both privacy protection and convergence, are obtained through theoretical analyses, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
- Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent [69.14164921515949]
We characterize privacy guarantees for individual examples when releasing models trained by DP-SGD.
We find that most examples enjoy stronger privacy guarantees than the worst-case bound.
This implies groups that are underserved in terms of model utility simultaneously experience weaker privacy guarantees.
arXiv Detail & Related papers (2022-06-06T13:49:37Z)
- Privacy Amplification via Random Participation in Federated Learning [3.8580784887142774]
In a federated setting, we consider random participation of the clients in addition to subsampling their local datasets.
We show that when the size of the local datasets is small, the privacy guarantees via random participation are close to those of the centralized setting.
arXiv Detail & Related papers (2022-05-03T15:11:34Z)
- Uniformity Testing in the Shuffle Model: Simpler, Better, Faster [0.0]
Uniformity testing, or testing whether independent observations are uniformly distributed, is a prototypical question in distribution testing.
In this work, we considerably simplify the analysis of the known uniformity testing algorithm in the shuffle model.
arXiv Detail & Related papers (2021-08-20T03:43:12Z)
- Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z)
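Up to constants, the amplification guarantee in this entry has the asymptotic form below for $n$ users each running an $\varepsilon_0$-DP local randomizer (a paraphrase of the stated result, not the exact theorem):

\[
\varepsilon \;=\; O\!\left( \left(1 - e^{-\varepsilon_0}\right) \sqrt{\frac{e^{\varepsilon_0} \log(1/\delta)}{n}} \right)
\]

so that the shuffled messages satisfy $(\varepsilon, \delta)$-differential privacy.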
- Controlling Privacy Loss in Sampling Schemes: an Analysis of Stratified and Cluster Sampling [23.256638764430516]
In this work, we extend the study of privacy amplification results to more complex, data-dependent sampling schemes.
We find that not only do these sampling schemes often fail to amplify privacy, they can actually result in privacy degradation.
arXiv Detail & Related papers (2020-07-24T17:43:08Z)
- XOR Mixup: Privacy-Preserving Data Augmentation for One-Shot Federated Learning [49.130350799077114]
We develop a privacy-preserving XOR-based mixup data augmentation technique, coined XorMixup.
The core idea is to collect other devices' encoded data samples that are decoded only using each device's own data samples.
XorMixFL achieves up to 17.6% higher accuracy than Vanilla FL under a non-IID MNIST dataset.
arXiv Detail & Related papers (2020-06-09T09:43:41Z)
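To make the XOR Mixup entry concrete, a toy sketch of the XOR masking/unmasking step (hypothetical shapes and names; the actual XorMixFL protocol mixes multiple samples and is more involved):

```python
import numpy as np

rng = np.random.default_rng(0)
x_mine = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)   # device's own sample
x_other = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)  # sample to protect

# Upload the XOR of the two samples: without x_mine it looks like noise.
encoded = np.bitwise_xor(x_other, x_mine)

# Only a holder of x_mine can strip the mask and recover x_other.
decoded = np.bitwise_xor(encoded, x_mine)
assert np.array_equal(decoded, x_other)
```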