Differential Privacy for Clustering Under Continual Observation
- URL: http://arxiv.org/abs/2307.03430v2
- Date: Thu, 27 Jul 2023 14:36:29 GMT
- Title: Differential Privacy for Clustering Under Continual Observation
- Authors: Max Dupré la Tour, Monika Henzinger, David Saulpic
- Abstract summary: We consider the problem of privately clustering a dataset in $\mathbb{R}^d$ that undergoes both insertions and deletions of points.
We give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation.
- Score: 5.220940151628734
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of privately clustering a dataset in $\mathbb{R}^d$
that undergoes both insertions and deletions of points. Specifically, we give an
$\varepsilon$-differentially private clustering mechanism for the $k$-means
objective under continual observation. This is the first approximation
algorithm for that problem whose additive error depends only
logarithmically on the number $T$ of updates. The multiplicative error is
almost the same as in the non-private setting. To do so, we show how to perform dimension
reduction under continual observation and combine it with a differentially
private greedy approximation algorithm for $k$-means. We also partially extend
our results to the $k$-median problem.
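The abstract's key quantitative claim, an additive error that is only logarithmic in the number $T$ of updates, is characteristic of continual-observation mechanisms built on dyadic decompositions. As general background, the sketch below is the classic binary-tree counting mechanism (Dwork et al. 2010; Chan, Shi, and Song 2011), not the paper's clustering algorithm: it releases every running sum of a 0/1 stream under $\varepsilon$-DP with additive error polylogarithmic in $T$.

```python
import math
import random

def laplace_noise(scale):
    # Sample from Laplace(0, scale) via the inverse CDF.
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

class BinaryMechanism:
    """Binary-tree mechanism: releases every running sum of a 0/1 stream
    of length <= T under eps-differential privacy, with additive error
    O(log^{1.5}(T) / eps) with high probability."""

    def __init__(self, T, eps):
        self.levels = max(1, math.ceil(math.log2(T))) + 1
        self.eps = eps
        self.noisy = [0.0] * self.levels  # released (noisy) partial sums
        self.exact = [0.0] * self.levels  # exact partial sums, pre-noise
        self.t = 0

    def update(self, x):
        """Ingest the next stream element and return the noisy running sum."""
        self.t += 1
        i = (self.t & -self.t).bit_length() - 1  # lowest set bit of t
        # Merge all completed lower-level partial sums into level i.
        self.exact[i] = x + sum(self.exact[:i])
        for j in range(i):
            self.exact[j] = 0.0
            self.noisy[j] = 0.0
        # Each element affects at most `levels` partial sums, so Laplace
        # noise of scale levels/eps per partial sum yields eps-DP overall.
        self.noisy[i] = self.exact[i] + laplace_noise(self.levels / self.eps)
        # The running sum at time t is the sum of the partial sums
        # indexed by the set bits of t.
        return sum(self.noisy[j]
                   for j in range(self.levels) if (self.t >> j) & 1)
```

At each step, exactly one noisy partial sum is (re)released, and any running sum decomposes into at most $\log T$ of them; this is where the logarithmic dependence on $T$ comes from.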
Related papers
- Near-Optimal Differentially Private k-Core Decomposition [2.859324824091086]
We show that an $\varepsilon$-edge differentially private algorithm for $k$-core decomposition outputs the core numbers with no multiplicative error and $O(\log(n)/\varepsilon)$ additive error.
This improves upon previous work by a factor of 2 in the multiplicative error, while giving near-optimal additive error.
arXiv Detail & Related papers (2023-12-12T20:09:07Z) - Simple, Scalable and Effective Clustering via One-Dimensional Projections [10.807367640692021]
Clustering is a fundamental problem in unsupervised machine learning with many applications in data analysis.
We introduce a simple randomized clustering algorithm that provably runs in expected time $O(\mathrm{nnz}(X) + n\log n)$ for arbitrary $k$.
We prove that our algorithm achieves approximation ratio $\widetilde{O}(k^4)$ on any input dataset for the $k$-means objective.
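The summary above gives only the running time and approximation ratio; the underlying idea is to reduce clustering to one dimension. The following is a heavily simplified toy illustration of that idea (project onto a random direction, then split the sorted projections at the largest gaps), not the paper's actual algorithm:

```python
import random

def project_and_split(points, k, seed=0):
    """Toy 1D-projection clustering sketch: project each point onto a
    random Gaussian direction, sort the projections, and cut at the
    k-1 largest gaps to form k contiguous clusters."""
    rng = random.Random(seed)
    d = len(points[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]
    # Projection of each point onto direction w.
    proj = [sum(p[j] * w[j] for j in range(d)) for p in points]
    order = sorted(range(len(points)), key=lambda i: proj[i])
    # Positions (in sorted order) of the k-1 largest consecutive gaps.
    gaps = sorted(range(len(order) - 1),
                  key=lambda i: proj[order[i + 1]] - proj[order[i]],
                  reverse=True)[:k - 1]
    cuts = sorted(g + 1 for g in gaps)
    labels = [0] * len(points)
    start, cluster = 0, 0
    for end in cuts + [len(order)]:
        for idx in order[start:end]:
            labels[idx] = cluster
        cluster += 1
        start = end
    return labels
```

Well-separated groups stay separated under a random projection with good probability, which is why a one-dimensional pass can recover meaningful cluster structure cheaply.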
arXiv Detail & Related papers (2023-10-25T16:37:45Z) - Differentially Private Clustering in Data Streams [65.78882209673885]
We present a differentially private streaming clustering framework which only requires an offline DP coreset or clustering algorithm as a blackbox.
Our framework is also differentially private under the continual release setting, i.e., the union of outputs of our algorithms at every timestamp is always differentially private.
arXiv Detail & Related papers (2023-07-14T16:11:22Z) - Data Structures for Density Estimation [66.36971978162461]
Given a sublinear (in $n$) number of samples from $p$, our main result is the first data structure that identifies $v_i$ in time sublinear in $k$.
We also give an improved version of the algorithm of Acharya et al. that reports $v_i$ in time linear in $k$.
arXiv Detail & Related papers (2023-06-20T06:13:56Z) - Differentially-Private Hierarchical Clustering with Provable Approximation Guarantees [79.59010418610625]
We study differentially private approximation algorithms for hierarchical clustering.
We show strong lower bounds for the problem: that any $\varepsilon$-DP algorithm must exhibit $O(|V|^2/\varepsilon)$-additive error for an input dataset.
We propose a private $1+o(1)$ approximation algorithm which also recovers the blocks exactly.
arXiv Detail & Related papers (2023-01-31T19:14:30Z) - Improved Learning-augmented Algorithms for k-means and k-medians Clustering [8.04779839951237]
We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space.
We propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost.
Our algorithm works even when the predictions are not very accurate, i.e., our bound holds for $\alpha$ up to $1/2$, improving on the bound of $\alpha$ at most $1/7$ in previous work.
arXiv Detail & Related papers (2022-10-31T03:00:11Z) - Anonymized Histograms in Intermediate Privacy Models [54.32252900997422]
We provide an algorithm with a nearly matching error guarantee of $\widetilde{O}_\varepsilon(\sqrt{n})$ in the shuffle DP and pan-private models.
Our algorithm is very simple: it just post-processes the discrete Laplace-noised histogram.
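The "very simple" algorithm described above can be pictured as follows. This is a hedged sketch under assumptions of ours: noise each count with discrete Laplace (two-sided geometric) noise, then post-process by clamping to non-negative values and sorting, one natural post-processing for an anonymized (unordered) histogram. The paper's exact post-processing may differ.

```python
import math
import random

def discrete_laplace(p, rng):
    """Two-sided geometric / discrete Laplace sample: P[Z = z] is
    proportional to p^{|z|} for 0 <= p < 1, sampled as a difference
    of two geometric variables."""
    def geometric(rng):
        # Number of "failures" (each with probability p) before a success.
        k = 0
        while rng.random() < p:
            k += 1
        return k
    return geometric(rng) - geometric(rng)

def anonymized_histogram(counts, eps, seed=0):
    """Add discrete Laplace noise of parameter p = exp(-eps) to each
    count (eps-DP for unit-sensitivity counts), then post-process:
    clamp negatives to zero and sort, since an anonymized histogram
    is a multiset of counts with no bucket labels."""
    rng = random.Random(seed)
    p = math.exp(-eps)
    noisy = [c + discrete_laplace(p, rng) for c in counts]
    return sorted((max(0, c) for c in noisy), reverse=True)
```

Because clamping and sorting are data-independent post-processing of a private release, they cost no additional privacy budget.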
arXiv Detail & Related papers (2022-10-27T05:11:00Z) - Scalable Differentially Private Clustering via Hierarchically Separated Trees [82.69664595378869]
We show that our method computes a solution with cost at most $O(d^{3/2}\log n)\cdot OPT + O(k d^2 \log^2 n / \epsilon^2)$, where $\epsilon$ is the privacy guarantee.
Although the worst-case guarantee is worse than that of state-of-the-art private clustering methods, the algorithm we propose is practical.
arXiv Detail & Related papers (2022-06-17T09:24:41Z) - Differentially private $k$-means clustering via exponential mechanism and max cover [6.736814259597673]
We introduce a new $(\epsilon_p, \delta_p)$-differentially private algorithm for the $k$-means clustering problem.
arXiv Detail & Related papers (2020-09-02T17:52:54Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.