Scalable Differentially Private Clustering via Hierarchically Separated
Trees
- URL: http://arxiv.org/abs/2206.08646v1
- Date: Fri, 17 Jun 2022 09:24:41 GMT
- Title: Scalable Differentially Private Clustering via Hierarchically Separated
Trees
- Authors: Vincent Cohen-Addad, Alessandro Epasto, Silvio Lattanzi, Vahab
Mirrokni, Andres Munoz, David Saulpic, Chris Schwiegelshohn, Sergei
Vassilvitskii
- Abstract summary: We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
- Score: 82.69664595378869
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the private $k$-median and $k$-means clustering problem in $d$
dimensional Euclidean space. By leveraging tree embeddings, we give an
efficient, easy-to-implement algorithm that is empirically competitive with
state-of-the-art non-private methods. We prove that our method computes a
solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n /
\epsilon^2)$, where $\epsilon$ is the privacy guarantee. (The dimension term,
$d$, can be replaced with $O(\log k)$ using standard dimension reduction
techniques.) Although the worst-case guarantee is worse than that of
state-of-the-art private clustering methods, the algorithm we propose is
practical, runs in near-linear $\tilde{O}(nkd)$ time, and scales to tens of
millions of
points. We also show that our method is amenable to parallelization in
large-scale distributed computing environments. In particular we show that our
private algorithms can be implemented in a logarithmic number of MPC rounds in
the sublinear memory regime. Finally, we complement our theoretical analysis
with an empirical evaluation demonstrating the algorithm's efficiency and
accuracy in comparison to other private clustering baselines.
Related papers
- Private Geometric Median [10.359525525715421]
We study differentially private (DP) algorithms for computing the geometric median (GM) of a dataset.
Our main contribution is a pair of DP algorithms for the task of private GM with an excess error guarantee that scales with the effective diameter of the datapoints.
arXiv Detail & Related papers (2024-06-11T16:13:09Z)
- Perturb-and-Project: Differentially Private Similarities and Marginals [73.98880839337873]
We revisit the input perturbation framework for differential privacy, where noise is added to the input $A \in \mathcal{S}$.
We first design novel efficient algorithms to privately release pair-wise cosine similarities.
We derive a novel algorithm to compute $k$-way marginal queries over $n$ features.
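The input-perturbation idea can be sketched for cosine similarities: normalize, perturb the input once, and answer every pairwise query from the noisy copy, so the whole release costs a single perturbation. The sketch below is illustrative only; the noise calibration required for an actual $(\epsilon, \delta)$ guarantee is omitted, and the function name is ours.

```python
import math
import random


def private_cosine_similarities(vectors, sigma, rng=None):
    """Release all pairwise cosine similarities from one noisy copy.

    Each row is normalized to unit length, Gaussian noise of scale
    sigma is added once to the input, and all similarities are then
    computed from the perturbed rows.
    """
    rng = rng or random.Random(0)
    noisy = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        noisy.append([x / norm + rng.gauss(0.0, sigma) for x in v])
    sims = {}
    for i in range(len(noisy)):
        for j in range(i + 1, len(noisy)):
            a, b = noisy[i], noisy[j]
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            sims[(i, j)] = dot / (na * nb)
    return sims
```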
arXiv Detail & Related papers (2024-06-07T12:07:16Z)
- A Scalable Algorithm for Individually Fair K-means Clustering [77.93955971520549]
We present a scalable algorithm for the individually fair ($p$, $k$)-clustering problem introduced by Jung et al. and Mahabadi et al.
A clustering is then called individually fair if it has centers within distance $\delta(x)$ of $x$ for each $x \in P$.
We show empirically that not only is our algorithm much faster than prior work, but it also produces lower-cost solutions.
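The fairness condition is simple enough to check directly. The sketch below assumes the Jung et al. style definition, in which $\delta(x)$ is the distance from $x$ to its $\lceil n/k \rceil$-th nearest point (counting $x$ itself); the helper names are ours, not from the paper.

```python
import math


def fairness_radius(points, x, k):
    """delta(x): distance from x to its ceil(n/k)-th nearest point,
    counting x itself (Jung et al. style definition, assumed here)."""
    n = len(points)
    dists = sorted(math.dist(x, p) for p in points)
    return dists[math.ceil(n / k) - 1]


def is_individually_fair(points, centers, k):
    # Every point must have some center within its own radius delta(x).
    return all(
        min(math.dist(x, c) for c in centers) <= fairness_radius(points, x, k)
        for x in points
    )
```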
arXiv Detail & Related papers (2024-02-09T19:01:48Z)
- Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: any $\epsilon$-DP algorithm must exhibit $O(|V|^2/\epsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z)
- Private estimation algorithms for stochastic block models and mixture models [63.07482515700984]
We give general tools for designing efficient private estimation algorithms.
We obtain the first efficient $(\epsilon, \delta)$-differentially private algorithm for both weak recovery and exact recovery.
arXiv Detail & Related papers (2023-01-11T09:12:28Z)
- Systematically improving existing k-means initialization algorithms at nearly no cost, by pairwise-nearest-neighbor smoothing [1.2570180539670577]
We present a meta-method for initializing the $k$-means clustering algorithm called PNN-smoothing.
It consists of splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor method.
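The meta-method just described can be sketched directly: split, cluster each subset with plain Lloyd's iterations, then repeatedly fuse the two closest centers until $k$ remain. This is an illustrative reimplementation under our own simplifications (weighted center fusion, fixed iteration count), not the authors' code.

```python
import math
import random


def _kmeans(points, k, rng, iters=10):
    # Plain Lloyd's iterations from a random start; returns k centers.
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers


def pnn_smoothing_init(points, k, J, seed=0):
    """PNN-smoothing style initialization (sketch).

    Split the data into J random subsets, run k-means on each, then
    merge the J*k resulting centers down to k by repeatedly fusing the
    two closest centers, weighted by how many centers each has absorbed.
    """
    rng = random.Random(seed)
    pts = points[:]
    rng.shuffle(pts)
    subsets = [pts[j::J] for j in range(J)]
    merged = [(c, 1) for s in subsets for c in _kmeans(s, k, rng)]
    while len(merged) > k:
        i, j = min(
            ((a, b) for a in range(len(merged)) for b in range(a + 1, len(merged))),
            key=lambda ij: math.dist(merged[ij[0]][0], merged[ij[1]][0]),
        )
        (ca, wa), (cb, wb) = merged[i], merged[j]
        fused = tuple((x * wa + y * wb) / (wa + wb) for x, y in zip(ca, cb))
        merged = [m for t, m in enumerate(merged) if t not in (i, j)] + [(fused, wa + wb)]
    return [c for c, _ in merged]
```

The fused centers would then seed a standard $k$-means run on the full dataset.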
arXiv Detail & Related papers (2022-02-08T15:56:30Z)
- Learning with User-Level Privacy [61.62978104304273]
We analyze algorithms to solve a range of learning tasks under user-level differential privacy constraints.
Rather than guaranteeing only the privacy of individual samples, user-level DP protects a user's entire contribution.
We derive an algorithm that privately answers a sequence of $K$ adaptively chosen queries with privacy cost proportional to $\tau$, and apply it to solve the learning tasks we consider.
arXiv Detail & Related papers (2021-02-23T18:25:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.