Related papers: Differentially Private Set Union

Differentially Private Set Union

URL: http://arxiv.org/abs/2002.09745v2
Date: Wed, 6 Apr 2022 23:04:55 GMT
Title: Differentially Private Set Union
Authors: Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi and Sergey Yekhanin
Abstract summary: We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. We want an ($epsilon$,$delta$)-differentially private algorithm which outputs a subset $S subset cup_i W_i$ such that the size of $S$ is as large as possible.
Score: 23.001440922454407
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting ensuring privacy is significantly delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.

Related papers

Differentially Private Space-Efficient Algorithms for Counting Distinct Elements in the Turnstile Model [61.40326886123332]
We give the first sublinear space differentially private algorithms for the fundamental problem of counting distinct elements in the turnstile streaming model.<n>Our result significantly improves upon the space requirements of the state-of-the-art algorithms for this problem, which is linear.<n>When a bound $W$ on the number of times an item appears in the stream is known, our algorithm provides $tildeO_eta(sqrtW)$ additive error using $tildeO_eta(sqrtW)$ space.
arXiv Detail & Related papers (2025-05-29T17:21:20Z)
Optimized Tradeoffs for Private Prediction with Majority Ensembling [59.99331405291337]
We introduce the Data-dependent Randomized Response Majority (DaRRM) algorithm. DaRRM is parameterized by a data-dependent noise function $gamma$, and enables efficient utility optimization over the class of all private algorithms. We show that DaRRM provably enjoys a privacy gain of a factor of 2 over common baselines, with fixed utility.
arXiv Detail & Related papers (2024-11-27T00:48:48Z)
Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $Ain mathcalS$. We first design novel efficient algorithms to privately release pair-wise cosine similarities. We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z)
User-Level Differential Privacy With Few Examples Per User [73.81862394073308]
We consider the example-scarce regime, where each user has only a few examples, and obtain the following results. For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm. We present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting.
arXiv Detail & Related papers (2023-09-21T21:51:55Z)
Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox. Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z)
A bounded-noise mechanism for differential privacy [3.9596068699962323]
We output an approximation of the average $frac1nsum_i=1n vecx(i)$ of vectors $vecx(i) in [0,1]k$, while preserving the privacy with respect to any $vecx(i)$. We present an $(epsilon,delta)$-private mechanism with optimal $ell_infty$ error for most values of $delta$.
arXiv Detail & Related papers (2020-12-07T16:03:21Z)
The Sparse Vector Technique, Revisited [67.57692396665915]
We revisit one of the most basic and widely applicable techniques in the literature of differential privacy. This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be. We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries.
arXiv Detail & Related papers (2020-10-02T10:50:52Z)
On Distributed Differential Privacy and Counting Distinct Elements [52.701425652208734]
We study the setup where each of $n$ users holds an element from a discrete set. The goal is to count the number of distinct elements across all users.
arXiv Detail & Related papers (2020-09-21T04:13:34Z)
Learning discrete distributions: user vs item-level privacy [47.05234904407048]
Recently many practical applications such as federated learning require preserving privacy for all items of a single user. We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy. We propose a mechanism such that the number of users scales as $tildemathcalO(k/(malpha2) + k/sqrtmepsilonalpha)$ and hence the privacy penalty is $tildeTheta(sqrtm)$ times smaller compared to the standard mechanisms.
arXiv Detail & Related papers (2020-07-27T16:15:14Z)
Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution. We show that the constraint of local differential privacy incurs an exponential increase in cost. Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
Differentially Private Release and Learning of Threshold Functions [27.612916837481485]
We prove new upper and lower bounds on the sample complexity of $(epsilon, delta)$ differentially private algorithms. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y le x$, and evaluates to $0$ otherwise.
arXiv Detail & Related papers (2015-04-28T16:15:01Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.