Differential Privacy for Clustering Under Continual Observation
- URL: http://arxiv.org/abs/2307.03430v2
- Date: Thu, 27 Jul 2023 14:36:29 GMT
- Title: Differential Privacy for Clustering Under Continual Observation
- Authors: Max Dupré la Tour, Monika Henzinger, David Saulpic
- Abstract summary: We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points.
We give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation.
- Score: 5.220940151628734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of privately clustering a dataset in $\mathbb{R}^d$
that undergoes both insertions and deletions of points. Specifically, we give an
$\varepsilon$-differentially private clustering mechanism for the $k$-means
objective under continual observation. This is the first approximation
algorithm for that problem whose additive error depends only
logarithmically on the number $T$ of updates. The multiplicative error is
almost the same as in the non-private setting. To do so, we show how to perform dimension
reduction under continual observation and combine it with a differentially
private greedy approximation algorithm for $k$-means. We also partially extend
our results to the $k$-median problem.
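The abstract's key quantitative claim, an additive error that is only logarithmic in the number $T$ of updates, is characteristic of continual-observation mechanisms built on dyadic decompositions. As general background, the sketch below is the classic binary-tree counting mechanism (Dwork et al. 2010; Chan, Shi, and Song 2011), not the paper's clustering algorithm: it releases every running sum of a 0/1 stream under $\varepsilon$-DP with additive error polylogarithmic in $T$.

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

class BinaryMechanism:
    """Binary-tree mechanism: releases every running sum of a 0/1 stream
    of length <= T under eps-differential privacy, with additive error
    O(log^{1.5}(T) / eps) with high probability."""

    def __init__(self, T, eps):
        self.levels = max(1, math.ceil(math.log2(T))) + 1
        self.eps = eps
        self.noisy = [0.0] * self.levels  # released (noisy) partial sums
        self.exact = [0.0] * self.levels  # exact partial sums, pre-noise
        self.t = 0

    def update(self, x):
        """Ingest the next stream element and return the noisy running sum."""
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1  # lowest set bit of t
        # Merge all completed lower-level partial sums into level i.
        self.exact[i] = x + sum(self.exact[:i])
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        # Each element affects at most `levels` partial sums, so Laplace
        # noise of scale levels/eps per partial sum yields eps-DP overall.
        self.noisy[i] = self.exact[i] + laplace_noise(self.levels / self.eps)
        # The running sum at time t is the sum of the partial sums
        # indexed by the set bits of t.
        return sum(self.noisy[j]
                   for j in range(self.levels) if (self.t >> j) & 1)
```

At each step, exactly one noisy partial sum is (re)released, and any running sum decomposes into at most $\log T$ of them; this is where the logarithmic dependence on $T$ comes from.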
Related papers
- Near-Optimal Differentially Private k-Core Decomposition [2.859324824091086]
We show that an $\varepsilon$-edge differentially private algorithm for $k$-core decomposition outputs the core numbers with no multiplicative error and $O(\log(n)/\varepsilon)$ additive error.
This improves upon previous work by a factor of 2 in the multiplicative error, while giving near-optimal additive error.
arXiv Detail & Related papers (2023-12-12T20:09:07Z) - Simple, Scalable and Effective Clustering via One-Dimensional Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $\widetilde{O}(k^4)$ on any input dataset for the $k$-means objective.
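The summary above gives only the running time and approximation ratio; the underlying idea is to reduce clustering to one dimension. The following is a heavily simplified toy illustration of that idea (project onto a random direction, then split the sorted projections at the largest gaps), not the paper's actual algorithm:

```python
import random

def project_and_split(points, k, seed=0):
    """Toy 1D-projection clustering sketch: project each point onto a
    random Gaussian direction, sort the projections, and cut at the
    k-1 largest gaps to form k contiguous clusters."""
    rng = random.Random(seed)
    d = len(points[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Projection of each point onto direction w.
    proj = [sum(p[j] * w[j] for j in range(d)) for p in points]
    order = sorted(range(len(points)), key=lambda i: proj[i])
    # Positions (in sorted order) of the k-1 largest consecutive gaps.
    gaps = sorted(range(len(order) - 1),
                  key=lambda i: proj[order[i + 1]] - proj[order[i]],
                  reverse=True)[:k - 1]
    cuts = sorted(g + 1 for g in gaps)
    labels = [0] * len(points)
    start, cluster = 0, 0
    for end in cuts + [len(order)]:
        for idx in order[start:end]:
            labels[idx] = cluster
        cluster += 1
        start = end
    return labels
```

Well-separated groups stay separated under a random projection with good probability, which is why a one-dimensional pass can recover meaningful cluster structure cheaply.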
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: that any $\varepsilon$-DP algorithm must exhibit $O(|V|^2/\varepsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Improved Learning-augmented Algorithms for k-means and k-medians Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for $\alpha$ up to $1/2$, improving on the bound of $\alpha$ at most $1/7$ in previous work.
arXiv Detail & Related papers (2022-10-31T03:00:11Z) - Anonymized Histograms in Intermediate Privacy Models [54.32252900997422]
We provide an algorithm with a nearly matching error guarantee of $\widetilde{O}_\varepsilon(\sqrt{n})$ in the shuffle DP and pan-private models.
Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram.
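The "very simple" algorithm described above can be pictured as follows. This is a hedged sketch under assumptions of ours: noise each count with discrete Laplace (two-sided geometric) noise, then post-process by clamping to non-negative values and sorting, one natural post-processing for an anonymized (unordered) histogram. The paper's exact post-processing may differ.

```python
import math
import random

def discrete_laplace(p, rng):
    """Two-sided geometric / discrete Laplace sample: P[Z = z] is
    proportional to p^{|z|} for 0 <= p < 1, sampled as a difference
    of two geometric variables."""
    def geometric(rng):
        # Number of "failures" (each with probability p) before a success.
        k = 0
        while rng.random() < p:
            k += 1
        return k
    return geometric(rng) - geometric(rng)

def anonymized_histogram(counts, eps, seed=0):
    """Add discrete Laplace noise of parameter p = exp(-eps) to each
    count (eps-DP for unit-sensitivity counts), then post-process:
    clamp negatives to zero and sort, since an anonymized histogram
    is a multiset of counts with no bucket labels."""
    rng = random.Random(seed)
    p = math.exp(-eps)
    noisy = [c + discrete_laplace(p, rng) for c in counts]
    return sorted((max(0, c) for c in noisy), reverse=True)
```

Because clamping and sorting are data-independent post-processing of a private release, they cost no additional privacy budget.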
arXiv Detail & Related papers (2022-10-27T05:11:00Z) - Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Differentially private $k$-means clustering via exponential mechanism and max cover [6.736814259597673]
We introduce a new $(\epsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem.
arXiv Detail & Related papers (2020-09-02T17:52:54Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.