k-Means SubClustering: A Differentially Private Algorithm with Improved
Clustering Quality
- URL: http://arxiv.org/abs/2301.02896v1
- Date: Sat, 7 Jan 2023 17:07:12 GMT
- Title: k-Means SubClustering: A Differentially Private Algorithm with Improved
Clustering Quality
- Authors: Devvrat Joshi, Janvi Thakkar
- Abstract summary: Many differentially private iterative algorithms have been proposed in interactive settings to protect an individual's privacy from inference attacks.
In this work, we extend the previous work on 'Differentially Private k-Means Clustering With Convergence Guarantee' by taking it as our baseline.
The novelty of our approach is to sub-cluster the clusters and then select the centroid which has a higher probability of moving in the direction of the future centroid.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's data-driven world, the sensitivity of information has been a
significant concern. With this data and additional information on the person's
background, one can easily infer an individual's private data. Many
differentially private iterative algorithms have been proposed in interactive
settings to protect an individual's privacy from these inference attacks. The
existing approaches adapt the method to compute differentially private(DP)
centroids by iterative Llyod's algorithm and perturbing the centroid with
various DP mechanisms. These DP mechanisms do not guarantee convergence of
differentially private iterative algorithms and degrade the quality of the
cluster. Thus, in this work, we further extend the previous work on
'Differentially Private k-Means Clustering With Convergence Guarantee' by
taking it as our baseline. The novelty of our approach is to sub-cluster the
clusters and then select the centroid which has a higher probability of moving
in the direction of the future centroid. At every Lloyd's step, the centroids
are injected with the noise using the exponential DP mechanism. The results of
the experiments indicate that our approach outperforms the current
state-of-the-art method, i.e., the baseline algorithm, in terms of clustering
quality while maintaining the same differential privacy requirements. The
clustering quality significantly improved by 4.13 and 2.83 times than baseline
for the Wine and Breast_Cancer dataset, respectively.
Related papers
- Fuzzy K-Means Clustering without Cluster Centroids [21.256564324236333]
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis.
This paper proposes a novel Fuzzy textitK-Means clustering algorithm that entirely eliminates the reliance on cluster centroids.
arXiv Detail & Related papers (2024-04-07T12:25:03Z) - On the privacy of federated Clustering: A Cryptographic View [2.209921757303168]
Many privacy-preserving clustering algorithms leverage cryptographic techniques like homomorphic encryption or secure multiparty computation to guarantee full privacy.
This paper delves into this intricate trade-off, questioning the necessity of continuous encryption in iterative algorithms.
We show that existing lattice-based HSSP attacks fail in reconstructing the private data given the knowledge of intermediate centroids, thus it is secure to reveal them for the sake of efficiency.
arXiv Detail & Related papers (2023-12-13T09:04:14Z) - Privacy-preserving Continual Federated Clustering via Adaptive Resonance
Theory [11.190614418770558]
In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied.
This paper proposes a privacy-preserving continual federated clustering algorithm.
Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance.
arXiv Detail & Related papers (2023-09-07T05:45:47Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
clustering clusters (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clients under the orchestration of a server.
We propose a novel FedC algorithm using differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - Decentralized Stochastic Optimization with Inherent Privacy Protection [103.62463469366557]
Decentralized optimization is the basic building block of modern collaborative machine learning, distributed estimation and control, and large-scale sensing.
Since involved data, privacy protection has become an increasingly pressing need in the implementation of decentralized optimization algorithms.
arXiv Detail & Related papers (2022-05-08T14:38:23Z) - Gradient Based Clustering [72.15857783681658]
We propose a general approach for distance based clustering, using the gradient of the cost function that measures clustering quality.
The approach is an iterative two step procedure (alternating between cluster assignment and cluster center updates) and is applicable to a wide range of functions.
arXiv Detail & Related papers (2022-02-01T19:31:15Z) - Determinantal consensus clustering [77.34726150561087]
We propose the use of determinantal point processes or DPP for the random restart of clustering algorithms.
DPPs favor diversity of the center points within subsets.
We show through simulations that, contrary to DPP, this technique fails both to ensure diversity, and to obtain a good coverage of all data facets.
arXiv Detail & Related papers (2021-02-07T23:48:24Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of data based on private data.
perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - Differentially Private Clustering via Maximum Coverage [7.059472280274009]
We study the problem of clustering in metric spaces while preserving the privacy of individual data.
We present differential algorithms with constant multiplicative error and lower additive error.
arXiv Detail & Related papers (2020-08-27T22:11:18Z) - Differentially Private Clustering: Tight Approximation Ratios [57.89473217052714]
We give efficient differentially private algorithms for basic clustering problems.
Our results imply an improved algorithm for the Sample and Aggregate privacy framework.
One of the tools used in our 1-Cluster algorithm can be employed to get a faster quantum algorithm for ClosestPair in a moderate number of dimensions.
arXiv Detail & Related papers (2020-08-18T16:22:06Z) - Differentially Private k-Means Clustering with Guaranteed Convergence [5.335316436366718]
Iterative clustering algorithms help us to learn the insights behind the data.
It may allow adversaries to infer the privacy of individuals with some background knowledge.
To protect individual privacy against such an inference attack, preserving differential privacy (DP) for the iterative clustering algorithms has been extensively studied.
arXiv Detail & Related papers (2020-02-03T22:53:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.