Related papers: Empirical Mean and Frequency Estimation Under Heterogeneous Privacy: A Worst-Case Analysis

Empirical Mean and Frequency Estimation Under Heterogeneous Privacy: A Worst-Case Analysis

URL: http://arxiv.org/abs/2407.11274v1
Date: Mon, 15 Jul 2024 22:46:02 GMT
Title: Empirical Mean and Frequency Estimation Under Heterogeneous Privacy: A Worst-Case Analysis
Authors: Syomantak Chaudhuri, Thomas A. Courtade,
Abstract summary: Differential Privacy (DP) is the current gold-standard for measuring privacy. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, subject to heterogeneous privacy constraints. We prove some optimality results, under both PAC error and mean-squared error, for our proposed algorithms and demonstrate superior performance over other baseline techniques experimentally.
Score: 5.755004576310333
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Differential Privacy (DP) is the current gold-standard for measuring privacy. Estimation problems under DP constraints appearing in the literature have largely focused on providing equal privacy to all users. We consider the problems of empirical mean estimation for univariate data and frequency estimation for categorical data, two pillars of data analysis in the industry, subject to heterogeneous privacy constraints. Each user, contributing a sample to the dataset, is allowed to have a different privacy demand. The dataset itself is assumed to be worst-case and we study both the problems in two different formulations -- the correlated and the uncorrelated setting. In the former setting, the privacy demand and the user data can be arbitrarily correlated while in the latter setting, there is no correlation between the dataset and the privacy demand. We prove some optimality results, under both PAC error and mean-squared error, for our proposed algorithms and demonstrate superior performance over other baseline techniques experimentally.

Related papers

Benchmarking Fraud Detectors on Private Graph Data [70.4654745317714]
Currently, many types of fraud are managed in part by automated detection algorithms that operate over graphs.<n>We consider the scenario where a data holder wishes to outsource development of fraud detectors to third parties.<n>Third parties submit their fraud detectors to the data holder, who evaluates these algorithms on a private dataset and then publicly communicates the results.<n>We propose a realistic privacy attack on this system that allows an adversary to de-anonymize individuals' data based only on the evaluation results.
arXiv Detail & Related papers (2025-07-30T03:20:15Z)
Optimal Allocation of Privacy Budget on Hierarchical Data Release [48.96399034594329]
This paper addresses the problem of optimal privacy budget allocation for hierarchical data release.<n>It aims to maximize data utility subject to a total privacy budget while considering the inherent trade-offs between data granularity and privacy loss.
arXiv Detail & Related papers (2025-05-16T05:25:11Z)
Differentially Private Random Feature Model [52.468511541184895]
We produce a differentially private random feature model for privacy-preserving kernel machines. We show that our method preserves privacy and derive a generalization error bound for the method.
arXiv Detail & Related papers (2024-12-06T05:31:08Z)
DP-CDA: An Algorithm for Enhanced Privacy Preservation in Dataset Synthesis Through Randomized Mixing [0.8739101659113155]
We introduce an effective data publishing algorithm emphDP-CDA. Our proposed algorithm generates synthetic datasets by randomly mixing data in a class-specific manner, and inducing carefully-tuned randomness to ensure privacy guarantees. Our results indicate that synthetic datasets produced using the DP-CDA can achieve superior utility compared to those generated by traditional data publishing algorithms, even when subject to the same privacy requirements.
arXiv Detail & Related papers (2024-11-25T06:14:06Z)
Differentially Private Covariate Balancing Causal Inference [8.133739801185271]
Differential privacy is the leading mathematical framework for privacy protection. Our algorithm produces both point and interval estimators with statistical guarantees, such as consistency and rate optimality, under a given privacy budget.
arXiv Detail & Related papers (2024-10-18T18:02:13Z)
Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets. We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z)
Conditional Density Estimations from Privacy-Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
Causal Inference with Differentially Private (Clustered) Outcomes [16.166525280886578]
Estimating causal effects from randomized experiments is only feasible if participants agree to reveal their responses. We suggest a new differential privacy mechanism, Cluster-DP, which leverages any given cluster structure. We show that, depending on an intuitive measure of cluster quality, we can improve the variance loss while maintaining our privacy guarantees.
arXiv Detail & Related papers (2023-08-02T05:51:57Z)
Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points. It cannot be assumed that all users sample from the same underlying distribution. We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z)
On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis. We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy. The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z)
Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server. We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered. Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss. We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
On the Statistical Complexity of Estimation and Testing under Privacy Constraints [17.04261371990489]
We show how to characterize the power of a statistical test under differential privacy in a plug-and-play fashion. We show that maintaining privacy results in a noticeable reduction in performance only when the level of privacy protection is very high. Finally, we demonstrate that the DP-SGLD algorithm, a private convex solver, can be employed for maximum likelihood estimation with a high degree of confidence.
arXiv Detail & Related papers (2022-10-05T12:55:53Z)
DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization [58.155151571362914]
We propose a differentially private high-dimensional data publication mechanism (DP2-Pub) that runs in two phases. splitting attributes into several low-dimensional clusters with high intra-cluster cohesion and low inter-cluster coupling helps obtain a reasonable privacy budget. We also extend our DP2-Pub mechanism to the scenario with a semi-honest server which satisfies local differential privacy.
arXiv Detail & Related papers (2022-08-24T17:52:43Z)
Private Domain Adaptation from a Public Source [48.83724068578305]
We design differentially private discrepancy-based algorithms for adaptation from a source domain with public labeled data to a target domain with unlabeled private data. Our solutions are based on private variants of Frank-Wolfe and Mirror-Descent algorithms.
arXiv Detail & Related papers (2022-08-12T06:52:55Z)
Combining Public and Private Data [7.975795748574989]
We introduce a mixed estimator of the mean optimized to minimize variance. We argue that our mechanism is preferable to techniques that preserve the privacy of individuals by subsampling data proportionally to the privacy needs of users.
arXiv Detail & Related papers (2021-10-29T23:25:49Z)
Non-parametric Differentially Private Confidence Intervals for the Median [3.205141100055992]
This paper proposes and evaluates several strategies to compute valid differentially private confidence intervals for the median. We also illustrate that addressing both sources of uncertainty--the error from sampling and the error from protecting the output--should be preferred over simpler approaches that incorporate the uncertainty in a sequential fashion.
arXiv Detail & Related papers (2021-06-18T19:45:37Z)
Decision Making with Differential Privacy under a Fairness Lens [65.16089054531395]
The U.S. Census Bureau releases data sets and statistics about groups of individuals that are used as input to a number of critical decision processes. To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data. This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective.
arXiv Detail & Related papers (2021-05-16T21:04:19Z)
Robust and Differentially Private Mean Estimation [40.323756738056616]
Differential privacy has emerged as a standard requirement in a variety of applications ranging from the U.S. Census to data collected in commercial devices. An increasing number of such databases consist of data from multiple sources, not all of which can be trusted. This leaves existing private analyses vulnerable to attacks by an adversary who injects corrupted data.
arXiv Detail & Related papers (2021-02-18T05:02:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.