Differentially Private Set Union
- URL: http://arxiv.org/abs/2002.09745v2
- Date: Wed, 6 Apr 2022 23:04:55 GMT
- Title: Differentially Private Set Union
- Authors: Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen,
Milad Shokouhi and Sergey Yekhanin
- Abstract summary: We study the basic operation of set union in the global model of differential privacy.
In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users.
We want an ($epsilon$,$delta$)-differentially private algorithm which outputs a subset $S subset cup_i W_i$ such that the size of $S$ is as large as possible.
- Score: 23.001440922454407
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the basic operation of set union in the global model of differential
privacy. In this problem, we are given a universe $U$ of items, possibly of
infinite size, and a database $D$ of users. Each user $i$ contributes a subset
$W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially
private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the
size of $S$ is as large as possible. The problem arises in countless real world
applications; it is particularly ubiquitous in natural language processing
(NLP) applications as vocabulary extraction. For example, discovering words,
sentences, $n$-grams etc., from private text data belonging to users is an
instance of the set union problem.
Known algorithms for this problem proceed by collecting a subset of items
from each user, taking the union of such subsets, and disclosing the items
whose noisy counts fall above a certain threshold. Crucially, in the above
process, the contribution of each individual user is always independent of the
items held by other users, resulting in a wasteful aggregation process, where
some item counts happen to be way above the threshold. We deviate from the
above paradigm by allowing users to contribute their items in a
$\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new
setting ensuring privacy is significantly delicate. We prove that any policy
which has certain $\textit{contractive}$ properties would result in a
differentially private algorithm. We design two new algorithms, one using
Laplace noise and other Gaussian noise, as specific instances of policies
satisfying the contractive properties. Our experiments show that the new
algorithms significantly outperform previously known mechanisms for the
problem.
Related papers
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbations framework for differential privacy where noise is added to the input $Ain mathcalS$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
arXiv Detail & Related papers (2024-06-07T12:07:16Z) - User-Level Differential Privacy With Few Examples Per User [73.81862394073308]
We consider the example-scarce regime, where each user has only a few examples, and obtain the following results.
For approximate-DP, we give a generic transformation of any item-level DP algorithm to a user-level DP algorithm.
We present a simple technique for adapting the exponential mechanism [McSherry, Talwar FOCS 2007] to the user-level setting.
arXiv Detail & Related papers (2023-09-21T21:51:55Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - A bounded-noise mechanism for differential privacy [3.9596068699962323]
We output an approximation of the average $frac1nsum_i=1n vecx(i)$ of vectors $vecx(i) in [0,1]k$, while preserving the privacy with respect to any $vecx(i)$.
We present an $(epsilon,delta)$-private mechanism with optimal $ell_infty$ error for most values of $delta$.
arXiv Detail & Related papers (2020-12-07T16:03:21Z) - The Sparse Vector Technique, Revisited [67.57692396665915]
We revisit one of the most basic and widely applicable techniques in the literature of differential privacy.
This simple algorithm privately tests whether the value of a given query on a database is close to what we expect it to be.
We suggest an alternative, equally simple, algorithm that can continue testing queries as long as any single individual does not contribute to the answer of too many queries.
arXiv Detail & Related papers (2020-10-02T10:50:52Z) - On Distributed Differential Privacy and Counting Distinct Elements [52.701425652208734]
We study the setup where each of $n$ users holds an element from a discrete set.
The goal is to count the number of distinct elements across all users.
arXiv Detail & Related papers (2020-09-21T04:13:34Z) - Learning discrete distributions: user vs item-level privacy [47.05234904407048]
Recently many practical applications such as federated learning require preserving privacy for all items of a single user.
We study the fundamental problem of learning discrete distributions over $k$ symbols with user-level differential privacy.
We propose a mechanism such that the number of users scales as $tildemathcalO(k/(malpha2) + k/sqrtmepsilonalpha)$ and hence the privacy penalty is $tildeTheta(sqrtm)$ times smaller compared to the standard mechanisms.
arXiv Detail & Related papers (2020-07-27T16:15:14Z) - Locally Private Hypothesis Selection [96.06118559817057]
We output a distribution from $mathcalQ$ whose total variation distance to $p$ is comparable to the best such distribution.
We show that the constraint of local differential privacy incurs an exponential increase in cost.
Our algorithms result in exponential improvements on the round complexity of previous methods.
arXiv Detail & Related papers (2020-02-21T18:30:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.