DPM: Clustering Sensitive Data through Separation
- URL: http://arxiv.org/abs/2307.02969v3
- Date: Tue, 20 Aug 2024 08:46:40 GMT
- Title: DPM: Clustering Sensitive Data through Separation
- Authors: Johannes Liebenow, Yara Schütt, Tanya Braun, Marcel Gehrke, Florian Thaeter, Esfandiar Mohammadi,
- Abstract summary: We present a privacy-preserving clustering algorithm called DPM that separates a data set into clusters based on a geometrical clustering approach.
We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm.
- Score: 2.2179058122448922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clustering is an important tool for data exploration where the goal is to subdivide a data set into disjoint clusters that fit well into the underlying data structure. When dealing with sensitive data, privacy-preserving algorithms aim to approximate the non-private baseline while minimising the leakage of sensitive information. State-of-the-art privacy-preserving clustering algorithms tend to output clusters that are good in terms of the standard metrics, inertia, silhouette score, and clustering accuracy, however, the clustering result strongly deviates from the non-private KMeans baseline. In this work, we present a privacy-preserving clustering algorithm called DPM that recursively separates a data set into clusters based on a geometrical clustering approach. In addition, DPM estimates most of the data-dependent hyper-parameters in a privacy-preserving way. We prove that DPM preserves Differential Privacy and analyse the utility guarantees of DPM. Finally, we conduct an extensive empirical evaluation for synthetic and real-life data sets. We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm without requiring the number of classes.
Related papers
- A new type of federated clustering: A non-model-sharing approach [6.5347053452025206]
This study proposes data collaboration clustering (DC-Clustering)<n>DC-Clustering supports clustering over complex data partitioning scenarios where horizontal and vertical splits coexist.<n>Results show that our method achieves clustering performance comparable to centralized clustering where all data are pooled.
arXiv Detail & Related papers (2025-06-11T23:57:26Z) - Differentially Private Federated $k$-Means Clustering with Server-Side Data [19.962475029447127]
FedDP-KMeans is an algorithm for $k$-means clustering that is fully-federated as well as differentially private.<n>Our algorithm achieves excellent results on synthetic and real-world benchmark tasks.
arXiv Detail & Related papers (2025-06-04T14:53:25Z) - K*-Means: A Parameter-free Clustering Algorithm [55.20132267309382]
k*-means is a novel clustering algorithm that eliminates the need to set k or any other parameters.<n>It uses the minimum description length principle to automatically determine the optimal number of clusters, k*, by splitting and merging clusters.<n>We prove that k*-means is guaranteed to converge and demonstrate experimentally that it significantly outperforms existing methods in scenarios where k is unknown.
arXiv Detail & Related papers (2025-05-17T08:41:07Z) - Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning [53.527506374566485]
We propose a novel Adaptive and Robust DBSCAN with Multi-agent Reinforcement Learning cluster framework, namely AR-DBSCAN.<n>We show that AR-DBSCAN not only improves clustering accuracy by up to 144.1% and 175.3% in the NMI and ARI metrics, respectively, but also is capable of robustly finding dominant parameters.
arXiv Detail & Related papers (2025-05-07T11:37:23Z) - Order is All You Need for Categorical Data Clustering [31.851890008893847]
This paper introduces a new finding that the order relation among attribute values is the decisive factor in clustering accuracy.
We propose a new learning paradigm that allows joint learning of clusters and the orders.
The algorithm achieves superior clustering accuracy with a convergence guarantee.
arXiv Detail & Related papers (2024-11-19T08:23:25Z) - Clustering Based on Density Propagation and Subcluster Merging [92.15924057172195]
We propose a density-based node clustering approach that automatically determines the number of clusters and can be applied in both data space and graph space.
Unlike traditional density-based clustering methods, which necessitate calculating the distance between any two nodes, our proposed technique determines density through a propagation process.
arXiv Detail & Related papers (2024-11-04T04:09:36Z) - Self-Supervised Graph Embedding Clustering [70.36328717683297]
K-means one-step dimensionality reduction clustering method has made some progress in addressing the curse of dimensionality in clustering tasks.
We propose a unified framework that integrates manifold learning with K-means, resulting in the self-supervised graph embedding framework.
arXiv Detail & Related papers (2024-09-24T08:59:51Z) - Differentially Private Clustered Federated Learning [4.768272342753616]
Federated learning (FL) often incorporates differential privacy (DP) to provide rigorous data privacy guarantees.
Previous works attempted to address high structured data heterogeneity in vanilla FL settings through clustering clients (a.k.a clustered FL)
We propose an algorithm for differentially private clustered FL, which is robust to the DP noise in the system and identifies the underlying clients' clusters correctly.
arXiv Detail & Related papers (2024-05-29T17:03:31Z) - AugDMC: Data Augmentation Guided Deep Multiple Clustering [2.479720095773358]
AugDMC is a novel data Augmentation guided Deep Multiple Clustering method.
It exploits data augmentations to automatically extract features related to a certain aspect of the data.
A stable optimization strategy is proposed to alleviate the unstable problem from different augmentations.
arXiv Detail & Related papers (2023-06-22T16:31:46Z) - DMS: Differentiable Mean Shift for Dataset Agnostic Task Specific
Clustering Using Side Information [0.0]
We present a novel approach, in which we learn to cluster data directly from side information.
We do not need to know the number of clusters, their centers or any kind of distance metric for similarity.
Our method is able to divide the same data points in various ways dependant on the needs of a specific task.
arXiv Detail & Related papers (2023-05-29T13:45:49Z) - ClusterNet: A Perception-Based Clustering Model for Scattered Data [16.326062082938215]
Cluster separation in scatterplots is a task that is typically tackled by widely used clustering techniques.
We propose a learning strategy which directly operates on scattered data.
We train ClusterNet, a point-based deep learning model, trained to reflect human perception of cluster separability.
arXiv Detail & Related papers (2023-04-27T13:41:12Z) - CADIS: Handling Cluster-skewed Non-IID Data in Federated Learning with
Clustered Aggregation and Knowledge DIStilled Regularization [3.3711670942444014]
Federated learning enables edge devices to train a global model collaboratively without exposing their data.
We tackle a new type of Non-IID data, called cluster-skewed non-IID, discovered in actual data sets.
We propose an aggregation scheme that guarantees equality between clusters.
arXiv Detail & Related papers (2023-02-21T02:53:37Z) - A Prototype-Oriented Clustering for Domain Shift with Source Privacy [66.67700676888629]
We introduce Prototype-oriented Clustering with Distillation (PCD) to improve the performance and applicability of existing methods.
PCD first constructs a source clustering model by aligning the distributions of prototypes and data.
It then distills the knowledge to the target model through cluster labels provided by the source model while simultaneously clustering the target data.
arXiv Detail & Related papers (2023-02-08T00:15:35Z) - Privacy-Preserving Federated Deep Clustering based on GAN [12.256298398007848]
We present a novel approach to Federated Deep Clustering based on Generative Adversarial Networks (GANs)
Each client trains a local generative adversarial network (GAN) locally and uploads the synthetic data to the server.
The server applies a deep clustering network on the synthetic data to establish $k$ cluster centroids, which are then downloaded to the clients for cluster assignment.
arXiv Detail & Related papers (2022-11-30T13:20:11Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy"
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Robust Trimmed k-means [70.88503833248159]
We propose Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points.
We show RTKM performs competitively with other methods on single membership data with outliers and multi-membership data without outliers.
arXiv Detail & Related papers (2021-08-16T15:49:40Z) - Scalable Hierarchical Agglomerative Clustering [65.66407726145619]
Existing scalable hierarchical clustering methods sacrifice quality for speed.
We present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points.
arXiv Detail & Related papers (2020-10-22T15:58:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.