k-Means SubClustering: A Differentially Private Algorithm with Improved
Clustering Quality
- URL: http://arxiv.org/abs/2301.02896v1
- Date: Sat, 7 Jan 2023 17:07:12 GMT
- Title: k-Means SubClustering: A Differentially Private Algorithm with Improved
Clustering Quality
- Authors: Devvrat Joshi, Janvi Thakkar
- Abstract summary: Many differentially private iterative algorithms have been proposed in interactive settings to protect an individual's privacy from inference attacks.
In this work, we extend the previous work on 'Differentially Private k-Means Clustering With Convergence Guarantee' by taking it as our baseline.
The novelty of our approach is to sub-cluster the clusters and then select the centroid which has a higher probability of moving in the direction of the future centroid.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's data-driven world, the sensitivity of information has been a
significant concern. With this data and additional information on the person's
background, one can easily infer an individual's private data. Many
differentially private iterative algorithms have been proposed in interactive
settings to protect an individual's privacy from these inference attacks. The
existing approaches compute differentially private (DP) centroids by running
the iterative Lloyd's algorithm and perturbing the centroids with
various DP mechanisms. These DP mechanisms do not guarantee convergence of
differentially private iterative algorithms and degrade the quality of the
clusters. Thus, in this work, we further extend the previous work on
'Differentially Private k-Means Clustering With Convergence Guarantee' by
taking it as our baseline. The novelty of our approach is to sub-cluster the
clusters and then select the centroid which has a higher probability of moving
in the direction of the future centroid. At every Lloyd's step, the centroids
are injected with the noise using the exponential DP mechanism. The results of
the experiments indicate that our approach outperforms the current
state-of-the-art method, i.e., the baseline algorithm, in terms of clustering
quality while maintaining the same differential privacy requirements. The
clustering quality improved significantly, by 4.13 and 2.83 times over the
baseline, for the Wine and Breast_Cancer datasets, respectively.
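
The listing itself contains no code; the following is a minimal, illustrative sketch of one noisy Lloyd's step in the spirit of the abstract: each cluster is sub-clustered, sub-cluster centroids are scored by how far they move toward the cluster's future centroid, and one is selected with the exponential mechanism. The function names, scoring rule, sensitivity, and per-step budget are assumptions for illustration, not the authors' implementation.

    import numpy as np

    def sub_cluster_centroids(points, n_sub, rng):
        # Split one cluster's points into n_sub sub-clusters with a few plain
        # k-means refinement steps and return the sub-cluster centroids.
        if len(points) <= n_sub:
            return points.astype(float)
        centers = points[rng.choice(len(points), n_sub, replace=False)].astype(float)
        for _ in range(5):
            labels = np.argmin(((points[:, None] - centers) ** 2).sum(-1), axis=1)
            for j in range(n_sub):
                members = points[labels == j]
                if len(members):
                    centers[j] = members.mean(axis=0)
        return centers

    def exponential_mechanism(candidates, scores, eps, sensitivity, rng):
        # Sample one candidate with probability proportional to
        # exp(eps * score / (2 * sensitivity)).
        logits = eps * np.asarray(scores, dtype=float) / (2.0 * sensitivity)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return candidates[rng.choice(len(candidates), p=probs)]

    def dp_lloyd_step(data, centroids, eps_step, n_sub=4, sensitivity=1.0, rng=None):
        # One noisy Lloyd's step: assign points, sub-cluster each cluster, score
        # each sub-centroid by its progress toward the cluster's future centroid,
        # and pick the next centroid with the exponential mechanism.
        # NOTE: the score function, sensitivity, and per-step budget eps_step are
        # illustrative placeholders, not the paper's exact privacy accounting.
        if rng is None:
            rng = np.random.default_rng(0)
        labels = np.argmin(((data[:, None] - centroids) ** 2).sum(-1), axis=1)
        new_centroids = centroids.astype(float)
        for k in range(len(centroids)):
            members = data[labels == k]
            if len(members) == 0:
                continue
            future = members.mean(axis=0)        # non-private Lloyd update target
            candidates = sub_cluster_centroids(members, n_sub, rng)
            scores = [np.linalg.norm(centroids[k] - future) - np.linalg.norm(c - future)
                      for c in candidates]
            new_centroids[k] = exponential_mechanism(candidates, scores, eps_step,
                                                     sensitivity, rng)
        return new_centroids

In the full algorithm such a step would be iterated with a per-iteration privacy budget, following the convergence-guaranteed baseline that the paper extends.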
Related papers
- Linear-Time User-Level DP-SCO via Robust Statistics [55.350093142673316]
User-level differentially private convex optimization (DP-SCO) has garnered significant attention due to the importance of safeguarding user privacy in machine learning applications.
Current methods, such as those based on differentially private gradient descent (DP-SGD), often struggle with high noise accumulation and suboptimal utility.
We introduce a novel linear-time algorithm that leverages robust statistics, specifically the median and trimmed mean, to overcome these challenges.
arXiv Detail & Related papers (2025-02-13T02:05:45Z)
- Differentially Private Random Block Coordinate Descent [51.62669821275571]
We propose a differentially private random coordinate descent method that selects multiple coordinates with varying probabilities in each iteration using sketch matrices.
Our algorithm generalizes both DP-CD and the classical DP-SGD (Differentially Private Stochastic Gradient Descent), while preserving the same utility guarantees.
arXiv Detail & Related papers (2024-12-22T15:06:56Z)
- Differentially Private Clustered Federated Learning [4.768272342753616]
Federated learning (FL) often incorporates differential privacy (DP) to provide rigorous data privacy guarantees.
Previous works attempted to address high structured data heterogeneity in vanilla FL settings by clustering clients (a.k.a. clustered FL).
We propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly.
arXiv Detail & Related papers (2024-05-29T17:03:31Z)
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z)
- Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel FedC algorithm with differential privacy, referred to as DP-Fed, in which partial client participation and multiple local update steps are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
- Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing.
Since the data involved are often sensitive, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z)
- Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, standard uniform random selection fails both to ensure diversity and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z)
- Differentially Private Clustering via Maximum Coverage [7.059472280274009]
We study the problem of clustering in metric spaces while preserving the privacy of individual data.
We present differentially private algorithms with constant multiplicative error and lower additive error.
arXiv Detail & Related papers (2020-08-27T22:11:18Z)
- Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z)
- Differentially Private k-Means Clustering with Guaranteed Convergence [5.335316436366718]
Iterative clustering algorithms help us learn the insights behind the data.
However, they may allow adversaries with some background knowledge to infer individuals' private information.
To protect individual privacy against such an inference attack, preserving differential privacy (DP) for the iterative clustering algorithms has been extensively studied.
arXiv Detail & Related papers (2020-02-03T22:53:47Z)