Improving the Variance of Differentially Private Randomized Experiments through Clustering
- URL: http://arxiv.org/abs/2308.00957v3
- Date: Wed, 28 May 2025 00:53:29 GMT
- Title: Improving the Variance of Differentially Private Randomized Experiments through Clustering
- Authors: Adel Javanmard, Vahab Mirrokni, Jean Pouget-Abadie
- Abstract summary: We propose a new differentially private mechanism, "Cluster-DP". We demonstrate that selecting higher-quality clusters, according to a quality metric we introduce, can decrease the variance penalty without compromising privacy guarantees.
- Score: 16.166525280886578
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating causal effects from randomized experiments is only possible if participants are willing to disclose their potentially sensitive responses. Differential privacy, a widely used framework for ensuring an algorithm's privacy guarantees, can encourage participants to share their responses without the risk of de-anonymization. However, many mechanisms achieve differential privacy by adding noise to the original dataset, which reduces the precision of causal effect estimation. This introduces a fundamental trade-off between privacy and variance when performing causal analyses on differentially private data. In this work, we propose a new differentially private mechanism, "Cluster-DP", which leverages a given cluster structure in the data to improve the privacy-variance trade-off. While our results apply to any clustering, we demonstrate that selecting higher-quality clusters, according to a quality metric we introduce, can decrease the variance penalty without compromising privacy guarantees. Finally, we evaluate the theoretical and empirical performance of our Cluster-DP algorithm on both real and simulated data, comparing it to common baselines, including two special cases of our algorithm: its unclustered version and a uniform-prior version.
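The abstract does not spell out the mechanism, so the following is a minimal, hypothetical sketch of how a cluster structure can enter a randomized-response-style privatization, not the authors' actual Cluster-DP algorithm: each unit reports its true categorical response with some probability and otherwise reports a draw from its cluster's response histogram, so that more homogeneous (higher-quality) clusters add less variance. The function name, the p_true parameter, and the unprivatized histogram are all illustrative assumptions.

```python
import numpy as np

def cluster_randomized_response(y, clusters, p_true, rng=None):
    """Illustrative sketch, NOT the paper's Cluster-DP mechanism: report the
    true categorical response with probability p_true, otherwise report a
    draw from the unit's within-cluster response histogram."""
    rng = rng or np.random.default_rng()
    y = np.asarray(y)
    out = y.copy()
    labels = np.unique(y)
    for c in np.unique(clusters):
        idx = np.flatnonzero(np.asarray(clusters) == c)
        # Empirical response histogram within the cluster; a fully DP
        # version would privatize these probabilities as well.
        probs = np.array([(y[idx] == label).mean() for label in labels])
        for i in idx:
            if rng.random() > p_true:
                out[i] = rng.choice(labels, p=probs)
    return out
```

The intuition for the variance claim is that when clusters are homogeneous, the fallback draw is close to the truth, so a downstream difference-in-means estimator pays a smaller noise penalty at the same privacy level.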
Related papers
- From Randomized Response to Randomized Index: Answering Subset Counting Queries with Local Differential Privacy [27.59934932590226]
Local Differential Privacy (LDP) is the predominant privacy model for safeguarding individual data privacy. We propose an alternative approach: instead of perturbing values, we apply randomization to indexes of values. Inspired by the deniability of randomized indexes, we present CRIAD for answering subset counting queries on set-value data.
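For background, the "randomized response" starting point of that title is the canonical LDP mechanism and can be sketched as below; this is textbook material, not the paper's CRIAD scheme, which randomizes indexes rather than values.

```python
import math
import random

def randomized_response(bit, epsilon, rng=random):
    """Classical binary randomized response: report the true bit with
    probability e^eps / (e^eps + 1), otherwise flip it; this satisfies
    epsilon-LDP."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_true else 1 - bit

def debiased_proportion(reports, epsilon):
    """Unbiased estimate of the true proportion of ones from noisy reports:
    E[report] = (2p - 1) * mu + (1 - p), solved for mu."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)
```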
arXiv Detail & Related papers (2025-04-24T13:08:11Z) - Federated Learning with Differential Privacy: An Utility-Enhanced Approach [12.614480013684759]
Federated learning has emerged as an attractive approach to protect data privacy by eliminating the need for sharing clients' data. Recent studies have shown that federated learning alone does not guarantee privacy, as private data may still be inferred from the parameters uploaded to the central server. We present a modification to these vanilla differentially private algorithms based on a Haar wavelet transformation step and a novel noise injection scheme that significantly lowers the bound of the noise variance.
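As a loose sketch of the idea named above, and not the paper's actual scheme or its variance bound, one can apply a single-level Haar step to a gradient vector, inject Gaussian noise in the wavelet domain, and invert the transform; the even-length assumption and all names are placeholders.

```python
import numpy as np

def haar_noise_injection(grad, noise_scale, rng=None):
    """Hypothetical sketch: single-level Haar transform of an even-length
    gradient, Gaussian noise on the wavelet coefficients, inverse transform."""
    rng = rng or np.random.default_rng()
    grad = np.asarray(grad, dtype=float)
    a = (grad[0::2] + grad[1::2]) / np.sqrt(2.0)  # approximation coefficients
    d = (grad[0::2] - grad[1::2]) / np.sqrt(2.0)  # detail coefficients
    a = a + rng.normal(0.0, noise_scale, a.shape)
    d = d + rng.normal(0.0, noise_scale, d.shape)
    out = np.empty_like(grad)
    out[0::2] = (a + d) / np.sqrt(2.0)            # inverse Haar step
    out[1::2] = (a - d) / np.sqrt(2.0)
    return out
```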
arXiv Detail & Related papers (2025-03-27T04:48:29Z) - Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines.
We show that our method preserves privacy and derive a generalization error bound for the method.
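The summary gives no construction details, so the following is only a hypothetical sketch in the spirit of the title: random Fourier features for a Gaussian kernel combined with noise on the regression sufficient statistics. The noise_scale is a placeholder rather than a calibrated privacy guarantee, and the paper's method may differ substantially.

```python
import numpy as np

def dp_random_feature_regression(X, y, m, bandwidth, noise_scale, rng=None):
    """Hypothetical sketch: random Fourier features Z approximating a
    Gaussian kernel, then ridge regression on noised sufficient statistics
    Z^T Z and Z^T y. noise_scale is an uncalibrated placeholder."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    W = rng.normal(0.0, 1.0 / bandwidth, size=(d, m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    Z = np.sqrt(2.0 / m) * np.cos(X @ W + b)  # bounded features
    A = Z.T @ Z + rng.normal(0.0, noise_scale, size=(m, m))
    v = Z.T @ y + rng.normal(0.0, noise_scale, size=m)
    # Small ridge term keeps the noisy linear system well-posed.
    return np.linalg.solve(A + 1e-3 * np.eye(m), v)
```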
arXiv Detail & Related papers (2024-12-06T05:31:08Z) - Differentially Private Covariate Balancing Causal Inference [8.133739801185271]
Differential privacy is the leading mathematical framework for privacy protection.
Our algorithm produces both point and interval estimators with statistical guarantees, such as consistency and rate optimality, under a given privacy budget.
arXiv Detail & Related papers (2024-10-18T18:02:13Z) - Private Estimation when Data and Privacy Demands are Correlated [5.755004576310333]
Differential Privacy is the current gold-standard for ensuring privacy for statistical queries.
We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data.
We establish theoretical performance guarantees for our proposed algorithms, under both PAC error and mean-squared error.
arXiv Detail & Related papers (2024-07-15T22:46:02Z) - Optimal Federated Learning for Nonparametric Regression with Heterogeneous Distributed Differential Privacy Constraints [5.3595271893779906]
We study federated learning for nonparametric regression in the context of distributed samples across different servers.
Findings shed light on the tradeoff between statistical accuracy and privacy preservation.
arXiv Detail & Related papers (2024-06-10T19:34:07Z) - TernaryVote: Differentially Private, Communication Efficient, and Byzantine Resilient Distributed Optimization on Heterogeneous Data [50.797729676285876]
We propose TernaryVote, which combines a ternary compressor and the majority vote mechanism to realize differential privacy, gradient compression, and Byzantine resilience simultaneously.
We theoretically quantify the privacy guarantee through the lens of the emerging f-differential privacy (DP) and the Byzantine resilience of the proposed algorithm.
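A hedged reading of how those two ingredients might compose is sketched below; the stochastic ternarizer and the sign-of-sum vote are plausible stand-ins rather than the paper's exact compressor, and the f-DP accounting mentioned above is not reproduced.

```python
import numpy as np

def ternarize(grad, rng):
    """Stochastic ternary compression: coordinate i becomes sign(g_i) with
    probability |g_i| / max|g| and 0 otherwise. (Illustrative stand-in.)"""
    grad = np.asarray(grad, dtype=float)
    scale = np.max(np.abs(grad))
    if scale == 0.0:
        return np.zeros(grad.shape, dtype=int)
    keep = rng.random(grad.shape) < np.abs(grad) / scale
    return (np.sign(grad) * keep).astype(int)

def majority_vote(messages):
    """Coordinate-wise majority vote over workers' ternary messages; the
    sign of the sum gives a Byzantine-resilient update direction."""
    return np.sign(np.sum(messages, axis=0)).astype(int)
```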
arXiv Detail & Related papers (2024-02-16T16:41:14Z) - Federated Experiment Design under Distributed Differential Privacy [31.06808163362162]
We focus on the rigorous protection of users' privacy while minimizing the trust placed in service providers.
Although a vital component in modern A/B testing, private distributed experimentation has not previously been studied.
We show how these mechanisms can be scaled up to handle the very large number of participants commonly found in practice.
arXiv Detail & Related papers (2023-11-07T22:38:56Z) - Training Private Models That Know What They Don't Know [40.19666295972155]
We find that several popular selective prediction approaches are ineffective in a differentially private setting.
We propose a novel evaluation mechanism which isolates selective prediction performance across model utility levels.
arXiv Detail & Related papers (2023-05-28T12:20:07Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel FedC algorithm with differential privacy guarantees, referred to as DP-FedC, in which partial client participation and multiple local updates are also considered.
Various attributes of the proposed DP-FedC are characterized through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - On the Statistical Complexity of Estimation and Testing under Privacy Constraints [17.04261371990489]
We show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion.
We show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high.
Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence.
arXiv Detail & Related papers (2022-10-05T12:55:53Z) - Private Domain Adaptation from a Public Source [48.83724068578305]
We design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data.
Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms.
arXiv Detail & Related papers (2022-08-12T06:52:55Z) - Mixed Differential Privacy in Computer Vision [133.68363478737058]
AdaMix is an adaptive differentially private algorithm for training deep neural network classifiers using both private and public image data.
A few-shot or even zero-shot learning baseline that ignores private data can outperform fine-tuning on a large private dataset.
arXiv Detail & Related papers (2022-03-22T06:15:43Z) - Differentially Private Estimation of Heterogeneous Causal Effects [9.355532300027727]
We introduce a general meta-algorithm for estimating conditional average treatment effects (CATE) with differential privacy guarantees.
Our meta-algorithm can work with simple, single-stage CATE estimators such as S-learner and more complex multi-stage estimators such as DR and R-learner.
arXiv Detail & Related papers (2022-02-22T17:21:18Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
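Those two ingredients compose into the standard DP-SGD update, sketched here for reference (this is the textbook recipe, not this paper's experimental setup):

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_mult, rng):
    """One standard DP-SGD step: clip each per-example gradient to L2 norm
    clip_norm, average, add Gaussian noise with standard deviation
    noise_mult * clip_norm / batch_size, then take a gradient step."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_mult * clip_norm / len(clipped),
                       size=mean_grad.shape)
    return params - lr * (mean_grad + noise)
```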
arXiv Detail & Related papers (2020-12-14T18:59:24Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
The local exchange of estimates allows the inference of private data.
Existing schemes inject perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - RDP-GAN: A Rényi-Differential Privacy based Generative Adversarial Network [75.81653258081435]
Generative adversarial network (GAN) has attracted increasing attention recently owing to its impressive ability to generate realistic samples with high privacy protection.
However, when GANs are applied to sensitive or private training examples, such as medical or financial records, they may still divulge individuals' sensitive and private information.
We propose a Rényi-differentially private GAN (RDP-GAN), which achieves differential privacy (DP) in a GAN by carefully adding random noise to the value of the loss function during training.
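Taken at face value, the core operation is a noisy release of the scalar loss; a minimal sketch under that reading is below. The Laplace choice and the sensitivity argument are illustrative assumptions; the paper's accounting is via Rényi DP composition.

```python
import numpy as np

def perturb_loss(loss_value, sensitivity, epsilon, rng=None):
    """Illustrative sketch of the RDP-GAN idea: add Laplace noise, scaled to
    the loss's sensitivity, to the scalar loss before it drives the
    discriminator update. Cumulative privacy cost (tracked with Renyi DP
    composition in the paper) is not modeled here."""
    rng = rng or np.random.default_rng()
    return loss_value + rng.laplace(scale=sensitivity / epsilon)
```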
arXiv Detail & Related papers (2020-07-04T09:51:02Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)