Algorithms for bounding contribution for histogram estimation under
user-level privacy
- URL: http://arxiv.org/abs/2206.03008v2
- Date: Fri, 30 Jun 2023 14:21:27 GMT
- Title: Algorithms for bounding contribution for histogram estimation under
user-level privacy
- Authors: Yuhan Liu, Ananda Theertha Suresh, Wennan Zhu, Peter Kairouz, Marco
Gruteser
- Abstract summary: We study the problem of histogram estimation under user-level differential privacy.
The goal is to preserve the privacy of all entries of any single user.
We propose algorithms to choose the best user contribution bound for histogram estimation.
- Score: 37.406400590568644
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of histogram estimation under user-level differential
privacy, where the goal is to preserve the privacy of all entries of any single
user. We consider the heterogeneous scenario where the quantity of data can be
different for each user. In this scenario, the amount of noise injected into
the histogram to obtain differential privacy is proportional to the maximum
user contribution, which can be amplified by a few outliers. One approach to
circumvent this would be to bound (or limit) the contribution of each user to
the histogram. However, if users are limited to small contributions, a
significant amount of data will be discarded. In this work, we propose
algorithms to choose the best user contribution bound for histogram estimation
under both bounded and unbounded domain settings. When the size of the domain
is bounded, we propose a user contribution bounding strategy that almost
achieves a two-approximation with respect to the best contribution bound in
hindsight. For unbounded domain histogram estimation, we propose an algorithm
that achieves a logarithmic approximation with respect to the best contribution bound
in hindsight. This result holds without any distribution assumptions on the
data. Experiments on both real and synthetic datasets verify our theoretical
findings and demonstrate the effectiveness of our algorithms. We also show that
clipping bias introduced by bounding user contribution may be reduced under
mild distribution assumptions, which can be of independent interest.
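To make the setting concrete, here is a minimal sketch (not the paper's algorithm for choosing the bound) of bounded-contribution histogram estimation: each user's contribution is capped at a bound tau and Laplace noise with scale tau/epsilon is added, so noise grows with the cap while a small cap discards data. The function name, the uniform-subsampling step, and the parameters are illustrative assumptions.

```python
import numpy as np

def bounded_histogram(user_data, domain_size, tau, epsilon, rng=None):
    """Sketch: cap each user's contribution at `tau` items, then release a
    Laplace-noised histogram. Under user-level DP, removing one user changes
    at most `tau` counts by 1 each, so the L1 sensitivity is `tau`.
    (Illustrative only; the paper's method for choosing `tau` is not shown.)"""
    rng = rng or np.random.default_rng()
    hist = np.zeros(domain_size)
    for items in user_data:                      # items: list of ints in [0, domain_size)
        if len(items) > tau:                     # keep at most tau entries per user
            items = rng.choice(items, size=tau, replace=False)
        for x in items:
            hist[x] += 1
    noise = rng.laplace(scale=tau / epsilon, size=domain_size)
    return hist + noise

# Example: 3 users with different amounts of data, domain of size 5
users = [[0, 0, 1], [2], [3, 3, 3, 3, 4, 4]]
print(bounded_histogram(users, domain_size=5, tau=2, epsilon=1.0))
```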
Related papers
- Scalable contribution bounding to achieve privacy [62.6490768306231]
In modern datasets, enforcing user-level privacy requires capping each user's total contribution.
Existing algorithms for this task are computationally intensive and do not scale to the massive datasets prevalent today.
Our approach models the complex ownership structure as a hypergraph, where users are vertices and records are hyperedges.
A record is added to the final dataset only if all its owners unanimously agree, thereby ensuring that no user's predefined contribution limit is violated.
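A minimal sketch of the unanimity rule described above, assuming records are processed in a fixed order and all users share the same cap; the names are illustrative and the scalable hypergraph machinery is not reproduced.

```python
def bound_contributions(records, cap):
    """Sketch of the unanimity rule: a record (a set of owner ids) is kept only
    if every owner still has remaining budget; keeping it charges all owners.
    `records` is a list of owner-id sets; `cap` is the per-user contribution limit."""
    remaining = {}                      # user id -> remaining budget
    kept = []
    for owners in records:
        if all(remaining.get(u, cap) > 0 for u in owners):
            for u in owners:
                remaining[u] = remaining.get(u, cap) - 1
            kept.append(owners)
    return kept

# Example: record {1, 2} is dropped once user 1 has exhausted a cap of 1
print(bound_contributions([{1}, {1, 2}, {2}], cap=1))   # -> [{1}, {2}]
```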
arXiv Detail & Related papers (2025-07-31T11:14:17Z) - Practical and Accurate Local Edge Differentially Private Graph Algorithms [11.593320678280824]
We introduce novel LDP algorithms for two fundamental graph statistics: k-core decomposition and triangle counting.
Our approach leverages input-dependent private graph properties, specifically the degeneracy and maximum degree of the graph, to improve theoretical utility.
Our k-core decomposition achieves errors within 3x of exact values, far outperforming the 131x error in the baseline of Dhulipala et al.
arXiv Detail & Related papers (2025-06-25T20:54:07Z) - On the Price of Differential Privacy for Hierarchical Clustering [21.64775530337936]
We present a novel algorithm in the weight privacy model that achieves significantly better approximation guarantees than the known impossibility results for the edge-level DP setting allow.
Our results show that our algorithm performs well in terms of extra cost and has good scalability to large graphs.
arXiv Detail & Related papers (2025-04-22T04:39:40Z) - Graph-Dependent Regret Bounds in Multi-Armed Bandits with Interference [18.101059380671582]
We study multi-armed bandits under network interference.
This induces an exponentially large action space.
We propose a novel algorithm that uses the local graph structure to minimize regret.
arXiv Detail & Related papers (2025-03-10T17:25:24Z) - Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
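As a rough illustration of the robust-statistics primitives mentioned (not the paper's DP-SCO algorithm), the sketch below computes a coordinate-wise trimmed mean across users; the trimming fraction and all names are assumptions.

```python
import numpy as np

def trimmed_mean(per_user_values, trim_frac=0.1):
    """Sketch: coordinate-wise trimmed mean. Sorts each coordinate across users
    and drops the top and bottom `trim_frac` fraction before averaging, which
    bounds the influence of outlying users (illustrative, not the paper's method)."""
    x = np.sort(np.asarray(per_user_values, dtype=float), axis=0)
    k = int(trim_frac * x.shape[0])
    return x[k:x.shape[0] - k].mean(axis=0)

# Example: 10 users, one extreme outlier barely moves the estimate
vals = [[1.0], [1.1], [0.9], [1.0], [1.2], [0.8], [1.0], [1.1], [0.9], [100.0]]
print(trimmed_mean(vals, trim_frac=0.1))
```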
arXiv Detail & Related papers (2025-02-13T02:05:45Z) - Scalable Private Partition Selection via Adaptive Weighting [66.09199304818928]
In a private set union, users hold subsets of items from an unbounded universe.
The goal is to output as many items as possible from the union of the users' sets while maintaining user-level differential privacy.
We propose an algorithm for this problem, MaximumDegree (MAD), which adaptively reroutes weight from items with weight far above the threshold needed for privacy to items with smaller weight.
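For context, a minimal sketch of the weight-and-threshold baseline that adaptive-weighting methods such as MAD refine: each user spreads one unit of weight over their items, Laplace noise is added, and items whose noisy weight clears a threshold are released. The threshold is taken as an input here (its value comes from the privacy analysis), the rerouting of excess weight is not reproduced, and all names are illustrative.

```python
import numpy as np

def weight_and_threshold(user_sets, epsilon, threshold, rng=None):
    """Sketch of the basic weight-and-threshold approach to private partition
    selection: uniform per-user weighting (each user contributes total weight 1),
    Laplace noise per item, release items above `threshold`.
    (MAD's adaptive rerouting of far-above-threshold weight is not shown.)"""
    rng = rng or np.random.default_rng()
    weights = {}
    for items in user_sets:
        if not items:
            continue
        share = 1.0 / len(items)            # split one unit of weight per user
        for it in items:
            weights[it] = weights.get(it, 0.0) + share
    return {it for it, w in weights.items()
            if w + rng.laplace(scale=1.0 / epsilon) > threshold}

# Example (threshold chosen arbitrarily for illustration)
print(weight_and_threshold([{"a", "b"}, {"a"}, {"a", "c"}], epsilon=1.0, threshold=3.0))
```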
arXiv Detail & Related papers (2025-02-13T01:27:11Z) - A Generalized Shuffle Framework for Privacy Amplification: Strengthening Privacy Guarantees and Enhancing Utility [4.7712438974100255]
We show how to shuffle in the $(\epsilon_i,\delta_i)$-PLDP setting with personalized privacy parameters.
We prove that the shuffled $(\epsilon_i,\delta_i)$-PLDP process approximately preserves $\mu$-Gaussian Differential Privacy with $\mu = \sqrt{\frac{2}{\sum_{i=1}^{n} \frac{1-\delta_i}{1+e^{\epsilon_i}} - \max_{i}\frac{1-\delta_i}{1+e^{\epsilon_i}}}}$.
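As a quick numerical illustration of the stated guarantee (assuming the formula as reconstructed above), the snippet below evaluates $\mu$ for a set of personalized privacy parameters; the function name is illustrative.

```python
import math

def gdp_mu(epsilons, deltas):
    """Evaluate mu for mu-GDP after shuffling (epsilon_i, delta_i)-PLDP reports,
    following the formula quoted above: mu = sqrt(2 / (sum_i a_i - max_i a_i))
    with a_i = (1 - delta_i) / (1 + e^{epsilon_i})."""
    a = [(1.0 - d) / (1.0 + math.exp(e)) for e, d in zip(epsilons, deltas)]
    return math.sqrt(2.0 / (sum(a) - max(a)))

# Example: 1000 users with heterogeneous local privacy parameters
n = 1000
print(gdp_mu([1.0 + 0.001 * i for i in range(n)], [1e-6] * n))
```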
arXiv Detail & Related papers (2023-12-22T02:31:46Z) - User-Level Differential Privacy With Few Examples Per User [73.81862394073308]
We consider the example-scarce regime, where each user has only a few examples, and obtain the following results.
For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm.
We present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting.
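For reference, a minimal sketch of the standard item-level exponential mechanism that the paper adapts; the user-level adaptation itself is not reproduced, and all names and example values are illustrative.

```python
import numpy as np

def exponential_mechanism(candidates, scores, epsilon, sensitivity, rng=None):
    """Sketch of the item-level exponential mechanism [McSherry, Talwar FOCS 2007]:
    sample a candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    logits = epsilon * scores / (2.0 * sensitivity)
    probs = np.exp(logits - logits.max())       # numerically stabilized softmax
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Example: pick a bin whose (sensitivity-1) score is its count
print(exponential_mechanism(["a", "b", "c"], scores=[10, 7, 1], epsilon=1.0, sensitivity=1.0))
```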
arXiv Detail & Related papers (2023-09-21T21:51:55Z) - A Unifying Privacy Analysis Framework for Unknown Domain Algorithms in Differential Privacy [1.5773159234875098]
We revisit some of the existing differentially private algorithms for releasing histograms over unknown domains.
The main practical advantage of releasing histograms over an unknown domain is that the algorithm does not need to fill in missing labels.
We present a unified framework for the privacy analyses of several existing algorithms.
arXiv Detail & Related papers (2023-09-17T05:47:33Z) - Mean Estimation with User-level Privacy under Data Heterogeneity [54.07947274508013]
Different users may possess vastly different numbers of data points.
It cannot be assumed that all users sample from the same underlying distribution.
We propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data.
arXiv Detail & Related papers (2023-07-28T23:02:39Z) - Unbounded Differentially Private Quantile and Maximum Estimation [2.855485723554975]
We show that a simple invocation of $\texttt{AboveThreshold}$ can give more accurate and robust estimates on the highest quantiles.
We show that an improved analysis of $\texttt{AboveThreshold}$ can improve the privacy guarantees for the widely used Sparse Vector Technique.
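For context, a minimal sketch of the classic AboveThreshold mechanism referenced above, with the standard Laplace noise scales for reporting a single above-threshold query; the quantile application and the improved analysis are not reproduced, and the example values are illustrative.

```python
import numpy as np

def above_threshold(queries, data, threshold, epsilon, rng=None):
    """Sketch of AboveThreshold: noise the threshold, noise each sensitivity-1
    query, and stop at the first query whose noisy answer clears the noisy
    threshold. Scales 2/eps and 4/eps are the classic choices for one report."""
    rng = rng or np.random.default_rng()
    noisy_t = threshold + rng.laplace(scale=2.0 / epsilon)
    for i, q in enumerate(queries):
        if q(data) + rng.laplace(scale=4.0 / epsilon) >= noisy_t:
            return i                             # index of first above-threshold query
    return None

# Example: noisily find the first candidate c with at least 6 of 7 points <= c
# (toy data; at this scale the noise dominates, shown only to illustrate the flow)
data = [1, 3, 3, 7, 9, 12, 15]
queries = [lambda d, c=c: sum(x <= c for x in d) for c in range(20)]
print(above_threshold(queries, data, threshold=6, epsilon=1.0))
```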
arXiv Detail & Related papers (2023-05-02T03:23:07Z) - Collaborative likelihood-ratio estimation over graphs [55.98760097296213]
We develop this idea in a concrete non-parametric method that we call Graph-based Relative Unconstrained Least-squares Importance Fitting (GRULSIF).
We derive convergence rates for our collaborative approach that highlights the role played by variables such as the number of available observations per node, the size of the graph, and how accurately the graph structure encodes the similarity between tasks.
arXiv Detail & Related papers (2022-05-28T15:37:03Z) - Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z) - Hiding Among the Clones: A Simple and Nearly Optimal Analysis of Privacy
Amplification by Shuffling [49.43288037509783]
We show that random shuffling amplifies differential privacy guarantees of locally randomized data.
Our result is based on a new approach that is simpler than previous work and extends to approximate differential privacy with nearly the same guarantees.
arXiv Detail & Related papers (2020-12-23T17:07:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.