Differentially Private Explanations for Clusters
- URL: http://arxiv.org/abs/2506.05900v1
- Date: Fri, 06 Jun 2025 09:14:45 GMT
- Title: Differentially Private Explanations for Clusters
- Authors: Amir Gilad, Tova Milo, Kathy Razmadze, Ron Zadicario
- Abstract summary: We present DPClustX, a framework that provides explanations for black-box clustering results while satisfying differential privacy. We perform an experimental analysis of DPClustX on real data, showing that it provides insightful and accurate explanations even under tight privacy constraints.
- Score: 16.217435153209752
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The dire need to protect sensitive data has led to various flavors of privacy definitions. Among these, differential privacy (DP) is considered one of the most rigorous and secure notions of privacy, enabling data analysis while preserving the privacy of data contributors. One of the fundamental tasks of data analysis is clustering, which is meant to unravel hidden patterns within complex datasets. However, interpreting clustering results poses significant challenges, and often necessitates an extensive analytical process. Interpreting clustering results under DP is even more challenging, as analysts are provided with noisy responses to queries, and longer, manual exploration sessions require additional noise to meet privacy constraints. While increasing attention has been given to clustering explanation frameworks that aim at assisting analysts by automatically uncovering the characteristics of each cluster, such frameworks may also disclose sensitive information within the dataset, leading to a breach in privacy. To address these challenges, we present DPClustX, a framework that provides explanations for black-box clustering results while satisfying DP. DPClustX takes as input the sensitive dataset alongside privately computed clustering labels, and outputs a global explanation, emphasizing prominent characteristics of each cluster while guaranteeing DP. We perform an extensive experimental analysis of DPClustX on real data, showing that it provides insightful and accurate explanations even under tight privacy constraints.
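The abstract does not spell out DPClustX's mechanism, but the basic DP recipe such explanation frameworks build on can be sketched: release Laplace-noised per-cluster counts and attribute sums, then post-process them into means. Everything below (the function name, the [lo, hi] clipping bounds, the half/half budget split) is an illustrative assumption, not the paper's algorithm:

```python
import numpy as np

def dp_cluster_profile(X, labels, epsilon, lo, hi, rng=None):
    """Noisy per-cluster attribute means via the Laplace mechanism.

    X       : (n, d) array with every attribute clipped to [lo, hi]
    labels  : (n,) privately computed cluster labels; reusing them is
              free post-processing, as in the DPClustX setting
    epsilon : total privacy budget for the explanation
    """
    rng = rng or np.random.default_rng()
    _, d = X.shape
    profile = {}
    # Clusters partition the records, so parallel composition applies
    # under add/remove-one neighboring: every cluster can spend the
    # same budget, split once between counts and sums.
    eps_cnt, eps_sum = epsilon / 2, epsilon / 2
    for c in np.unique(labels):
        members = X[labels == c]
        # A single record changes one cluster count by at most 1 ...
        noisy_n = max(1.0, len(members) + rng.laplace(0.0, 1.0 / eps_cnt))
        # ... and one d-dimensional sum by at most d * (hi - lo) in L1.
        noisy_sum = members.sum(axis=0) + rng.laplace(
            0.0, d * (hi - lo) / eps_sum, size=d)
        profile[int(c)] = np.clip(noisy_sum / noisy_n, lo, hi)
    return profile
```

Because the cluster labels are assumed to be computed privately, reusing them costs no additional budget; only the count and sum queries touch the raw attributes.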
Related papers
- Optimal Allocation of Privacy Budget on Hierarchical Data Release [48.96399034594329]
This paper addresses the problem of optimal privacy budget allocation for hierarchical data release. It aims to maximize data utility subject to a total privacy budget while considering the inherent trade-offs between data granularity and privacy loss.
arXiv Detail & Related papers (2025-05-16T05:25:11Z)
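A toy version of the allocation problem above: sequential composition lets a fixed total budget be split across the hierarchy levels, with less budget meaning noisier answers at that level. The proportional weights here are a hypothetical stand-in for the paper's utility-optimal allocation:

```python
import numpy as np

def allocate_and_release(level_histograms, weights, total_epsilon, rng=None):
    """Split a DP budget over hierarchy levels, then noise each level.

    level_histograms : list of count vectors, coarse to fine; each record
                       falls in exactly one bin per level (sensitivity 1)
    weights          : hypothetical per-level importance; the paper instead
                       derives the utility-optimal split
    """
    rng = rng or np.random.default_rng()
    weights = np.asarray(weights, dtype=float)
    # Sequential composition: per-level budgets must sum to the total.
    eps_levels = total_epsilon * weights / weights.sum()
    return [
        np.asarray(h, dtype=float) + rng.laplace(0.0, 1.0 / e, size=len(h))
        for h, e in zip(level_histograms, eps_levels)
    ]
```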
- Private Approximate Query over Horizontal Data Federation [0.0]
Existing approaches rely on cryptography, which improves privacy but at the expense of query response time.
We propose a new approach that considers a data distribution-aware online sampling technique to accelerate the execution of range queries.
Our solution can provide up to 8 times faster processing than the basic non-secure solution.
arXiv Detail & Related papers (2024-06-17T11:19:58Z)
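The core estimator behind the sampling-based range queries above is easy to show. This sketch uses plain uniform sampling with inverse-rate scale-up, whereas the paper's sampler is distribution-aware and runs over a federation; the function name and 5% default rate are assumptions:

```python
import numpy as np

def approx_range_count(values, lo, hi, sample_rate=0.05, rng=None):
    """Estimate how many values fall in [lo, hi] from a uniform sample."""
    rng = rng or np.random.default_rng()
    values = np.asarray(values)
    # Bernoulli sampling: each row kept with probability sample_rate.
    sample = values[rng.random(len(values)) < sample_rate]
    hits = np.count_nonzero((sample >= lo) & (sample <= hi))
    # Horvitz-Thompson scale-up: unbiased for the true range count.
    return hits / sample_rate
```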
- Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z)
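A one-step calculation shows why gradient norms govern a KL privacy bound of the kind above: for noisy gradient descent with equal noise scale on two neighboring datasets, each step compares two Gaussians with the same covariance, and the closed-form KL is driven by the squared gradient difference. This is an illustrative derivation aid, not the paper's proof:

```python
import numpy as np

def kl_one_noisy_gd_step(grad_a, grad_b, lr, sigma):
    """KL between one noisy-GD step on neighboring datasets.

    Updates are w - lr * g + N(0, sigma^2 I); for Gaussians with equal
    covariance, KL = ||lr * (grad_a - grad_b)||^2 / (2 * sigma^2), so
    the accumulated KL bound is controlled by squared gradient norms.
    """
    diff = lr * (np.asarray(grad_a) - np.asarray(grad_b))
    return float(diff @ diff) / (2.0 * sigma ** 2)
```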
- DPM: Clustering Sensitive Data through Separation [2.2179058122448922]
We present a privacy-preserving clustering algorithm called DPM that separates a data set into clusters based on a geometrical clustering approach.
We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm.
arXiv Detail & Related papers (2023-07-06T13:12:19Z)
- On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z)
- Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into a finite number of clusters under the orchestration of a server.
We propose a novel differentially private FedC algorithm, referred to as DP-Fed, which also accounts for partial client participation.
Various properties of the proposed DP-Fed are established through theoretical analyses of its privacy protection, especially for the case of non-i.i.d. (not independently and identically distributed) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z)
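The shape of a DP federated clustering algorithm like the one above can be sketched as one round of federated k-means in which each client noises its local sufficient statistics before upload. The budget split, clipping bound, and function names are assumptions, not the paper's DP-Fed construction, which additionally handles partial participation and non-i.i.d. convergence:

```python
import numpy as np

def client_round(X, centers, eps_round, bound, rng):
    """One client's noisy k-means statistics (a sketch, not DP-Fed).

    X : this client's local data, rows pre-clipped to L1 norm <= bound.
    Returns Laplace-noised per-cluster sums and counts for the server.
    """
    k, d = centers.shape
    assign = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1),
                       axis=1)
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    for j in range(k):
        sums[j] = X[assign == j].sum(axis=0)
        counts[j] = np.count_nonzero(assign == j)
    # Each record lands in exactly one cluster (parallel composition),
    # so the round budget is split once between sums and counts.
    sums += rng.laplace(0.0, bound / (eps_round / 2), size=(k, d))
    counts += rng.laplace(0.0, 1.0 / (eps_round / 2), size=k)
    return sums, counts

def server_step(updates):
    """Recompute centers from all clients' noisy statistics."""
    tot_sums = sum(u[0] for u in updates)
    tot_counts = np.maximum(1.0, sum(u[1] for u in updates))
    return tot_sums / tot_counts[:, None]
```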
- Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that achieve utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z)
- Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows the inference of private data.
Existing schemes add perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme that constructs perturbations according to a particular nullspace condition, allowing them to remain invisible in the network average.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
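The nullspace idea above admits a compact illustration: draw noise and project out its across-agent mean, so each agent's shared estimate is masked while the perturbations cancel exactly in the network average. The paper's graph-homomorphic construction achieves the analogous effect in a decentralized way; this centralized draw is for intuition only:

```python
import numpy as np

def averaging_nullspace_noise(n_agents, dim, scale, rng=None):
    """Perturbations in the nullspace of the averaging operator.

    Subtracting the across-agent mean leaves each agent's noise term
    nonzero (masking its estimate) while guaranteeing the rows sum to
    zero, so a full network average sees no perturbation at all.
    """
    rng = rng or np.random.default_rng()
    noise = rng.normal(0.0, scale, size=(n_agents, dim))
    return noise - noise.mean(axis=0, keepdims=True)

# The aggregate is exactly unperturbed:
assert np.allclose(averaging_nullspace_noise(5, 3, 1.0).sum(axis=0), 0.0)
```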
- imdpGAN: Generating Private and Specific Data with Generative Adversarial Networks [19.377726080729293]
imdpGAN is an end-to-end framework that simultaneously achieves privacy protection and learns latent representations.
We show that imdpGAN preserves the privacy of the individual data point, and learns latent codes to control the specificity of the generated samples.
arXiv Detail & Related papers (2020-09-29T08:03:32Z)
- Attribute Privacy: Framework and Mechanisms [26.233612860653025]
We study attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis.
We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected.
We provide two efficient mechanisms and one inefficient mechanism that satisfy attribute privacy for these settings.
arXiv Detail & Related papers (2020-09-08T22:38:57Z)
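Attribute privacy targets something different from record-level DP: the secret is a global property of the dataset, such as the fraction of records with some sensitive trait. As a deliberately crude sketch (the paper's mechanisms calibrate noise to what the released analysis reveals about the property, which flat noise only gestures at), one can noise the global statistic directly; the predicate and noise scale below are assumptions:

```python
import numpy as np

def release_global_fraction(records, predicate, noise_scale, rng=None):
    """Release a noised dataset-level fraction (illustrative only)."""
    rng = rng or np.random.default_rng()
    # The protected secret is a property of the *whole* dataset, not
    # of any single record, which is what attribute privacy captures.
    fraction = np.mean([bool(predicate(r)) for r in records])
    return float(fraction + rng.normal(0.0, noise_scale))
```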