Differentially Private Explanations for Clusters
- URL: http://arxiv.org/abs/2506.05900v1
- Date: Fri, 06 Jun 2025 09:14:45 GMT
- Title: Differentially Private Explanations for Clusters
- Authors: Amir Gilad, Tova Milo, Kathy Razmadze, Ron Zadicario
- Abstract summary: We present DPClustX, a framework that provides explanations for black-box clustering results while satisfying differential privacy. We perform an experimental analysis of DPClustX on real data, showing that it provides insightful and accurate explanations even under tight privacy constraints.
- Score: 16.217435153209752
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The dire need to protect sensitive data has led to various flavors of privacy definitions. Among these, differential privacy (DP) is considered one of the most rigorous and secure notions of privacy, enabling data analysis while preserving the privacy of data contributors. One of the fundamental tasks of data analysis is clustering, which is meant to unravel hidden patterns within complex datasets. However, interpreting clustering results poses significant challenges and often necessitates an extensive analytical process. Interpreting clustering results under DP is even more challenging, as analysts are provided with noisy responses to queries, and longer, manual exploration sessions require additional noise to meet privacy constraints. While increasing attention has been given to clustering explanation frameworks that aim to assist analysts by automatically uncovering the characteristics of each cluster, such frameworks may also disclose sensitive information within the dataset, leading to a breach of privacy. To address these challenges, we present DPClustX, a framework that provides explanations for black-box clustering results while satisfying DP. DPClustX takes as input the sensitive dataset alongside privately computed clustering labels, and outputs a global explanation, emphasizing prominent characteristics of each cluster while guaranteeing DP. We perform an extensive experimental analysis of DPClustX on real data, showing that it provides insightful and accurate explanations even under tight privacy constraints.
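The abstract's point that longer exploration sessions require additional noise can be illustrated with a minimal sketch. This is not the DPClustX algorithm; it is a generic Laplace-mechanism histogram under basic sequential composition, and the helper names (`laplace_noise`, `per_query_epsilon`, `noisy_cluster_histogram`) are hypothetical. It assumes add/remove-one-record neighboring datasets, a fixed data-independent attribute domain, and cluster labels that were already computed privately.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def per_query_epsilon(total_epsilon, num_queries):
    # Basic sequential composition: an even split of the total budget.
    # More queries in a session means a smaller epsilon per query.
    return total_epsilon / num_queries

def noisy_cluster_histogram(values, labels, cluster, epsilon, domain, rng):
    # Count each attribute value inside one cluster. `domain` must be
    # fixed in advance so the set of bins does not depend on the data.
    counts = {v: 0 for v in domain}
    for v, c in zip(values, labels):
        if c == cluster and v in counts:
            counts[v] += 1
    # Adding or removing one record changes a single bin by 1, so the
    # L1 sensitivity is 1 and the Laplace scale is 1 / epsilon.
    scale = 1.0 / epsilon
    return {v: n + laplace_noise(scale, rng) for v, n in counts.items()}

if __name__ == "__main__":
    rng = random.Random(0)
    values = ["a", "b", "a", "c", "a", "b"]
    labels = [0, 0, 1, 1, 0, 1]
    for k in (1, 10):
        eps = per_query_epsilon(1.0, k)
        hist = noisy_cluster_histogram(values, labels, 0, eps, ["a", "b", "c"], rng)
        print(k, "queries, noise scale", 1.0 / eps, hist)
```

With a total budget of 1.0, a single query uses noise scale 1, while each of ten queries uses noise scale 10, which is why manual multi-query exploration degrades quickly under DP.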
Related papers
- Towards Federated Clustering: A Client-wise Private Graph Aggregation Framework [57.04850867402913]
Federated clustering addresses the challenge of extracting patterns from decentralized, unlabeled data. We propose Structural Privacy-Preserving Federated Graph Clustering (SPP-FGC), a novel algorithm that innovatively leverages local structural graphs as the primary medium for privacy-preserving knowledge sharing. Our framework achieves state-of-the-art performance, improving clustering accuracy by up to 10% (NMI) over federated baselines while maintaining provable privacy guarantees.
arXiv Detail & Related papers (2025-11-14T03:05:22Z) - Subgraph Federated Learning via Spectral Methods [52.40322201034717]
FedLap is a novel framework that captures inter-node dependencies while ensuring privacy and scalability. We provide a formal analysis of the privacy of FedLap, demonstrating that it preserves privacy.
arXiv Detail & Related papers (2025-10-29T16:22:32Z) - SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards [4.332839547082766]
This paper presents Systematic Pattern Analysis (SPATA) to provide more detailed and transparent data cards. SPATA computes the projection of each data instance into a discrete space where they can be analyzed and compared, without risking data leakage.
arXiv Detail & Related papers (2025-09-30T17:59:45Z) - SynBench: A Benchmark for Differentially Private Text Generation [35.908455649647784]
Data-driven decision support in high-stakes domains like healthcare and finance faces significant barriers to data sharing. Recent generative AI models, such as large language models, have shown impressive performance in open-domain tasks. But their adoption in sensitive environments remains limited by unpredictable behaviors and insufficient privacy-preserving datasets.
arXiv Detail & Related papers (2025-09-18T03:57:50Z) - Metric Embedding Initialization-Based Differentially Private and Explainable Graph Clustering [0.0]
Graph clustering aims to process graph-structured data while protecting individual privacy. We construct a differentially private and interpretable graph clustering approach based on metric embedding. Our proposed framework outperforms existing methods in various clustering metrics while strictly ensuring privacy.
arXiv Detail & Related papers (2025-09-07T21:28:23Z) - Optimal Allocation of Privacy Budget on Hierarchical Data Release [48.96399034594329]
This paper addresses the problem of optimal privacy budget allocation for hierarchical data release. It aims to maximize data utility subject to a total privacy budget while considering the inherent trade-offs between data granularity and privacy loss.
arXiv Detail & Related papers (2025-05-16T05:25:11Z) - Private Approximate Query over Horizontal Data Federation [0.0]
Existing approaches rely on cryptography, which improves privacy, but at the expense of query response time.
We propose a new approach that considers a data distribution-aware online sampling technique to accelerate the execution of range queries.
Our solution is capable of providing up to 8 times faster processing than the basic non-secure solution.
arXiv Detail & Related papers (2024-06-17T11:19:58Z) - Initialization Matters: Privacy-Utility Analysis of Overparameterized Neural Networks [72.51255282371805]
We prove a privacy bound for the KL divergence between model distributions on worst-case neighboring datasets.
We find that this KL privacy bound is largely determined by the expected squared gradient norm relative to model parameters during training.
arXiv Detail & Related papers (2023-10-31T16:13:22Z) - DPM: Clustering Sensitive Data through Separation [2.2179058122448922]
We present a privacy-preserving clustering algorithm called DPM that separates a data set into clusters based on a geometrical clustering approach.
We show that DPM achieves state-of-the-art utility on the standard clustering metrics and yields a clustering result much closer to that of the popular non-private KMeans algorithm.
arXiv Detail & Related papers (2023-07-06T13:12:19Z) - On Differential Privacy and Adaptive Data Analysis with Bounded Space [76.10334958368618]
We study the space complexity of the two related fields of differential privacy and adaptive data analysis.
We show that there exists a problem P that requires exponentially more space to be solved efficiently with differential privacy.
The line of work on adaptive data analysis focuses on understanding the number of samples needed for answering a sequence of adaptive queries.
arXiv Detail & Related papers (2023-02-11T14:45:31Z) - Differentially Private Federated Clustering over Non-IID Data [59.611244450530315]
The federated clustering (FedC) problem aims to accurately partition unlabeled data samples distributed over massive clients into finite clusters under the orchestration of a server.
We propose a novel FedC algorithm using a differential privacy convergence technique, referred to as DP-Fed, in which partial participation and multiple clients are also considered.
Various attributes of the proposed DP-Fed are obtained through theoretical analyses of privacy protection, especially for the case of non-identically and independently distributed (non-i.i.d.) data.
arXiv Detail & Related papers (2023-01-03T05:38:43Z) - Differentially-Private Clustering of Easy Instances [67.04951703461657]
In differentially private clustering, the goal is to identify $k$ cluster centers without disclosing information on individual data points.
We provide implementable differentially private clustering algorithms that provide utility when the data is "easy".
We propose a framework that allows us to apply non-private clustering algorithms to the easy instances and privately combine the results.
arXiv Detail & Related papers (2021-12-29T08:13:56Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of data based on private data.
Existing schemes choose perturbations independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z) - imdpGAN: Generating Private and Specific Data with Generative Adversarial Networks [19.377726080729293]
imdpGAN is an end-to-end framework that simultaneously achieves privacy protection and learns latent representations.
We show that imdpGAN preserves the privacy of the individual data point, and learns latent codes to control the specificity of the generated samples.
arXiv Detail & Related papers (2020-09-29T08:03:32Z) - Attribute Privacy: Framework and Mechanisms [26.233612860653025]
We study attribute privacy, where a data owner is concerned about revealing sensitive properties of a whole dataset during analysis.
We propose definitions to capture attribute privacy in two relevant cases where global attributes may need to be protected.
We provide two efficient mechanisms and one inefficient mechanism that satisfy attribute privacy for these settings.
arXiv Detail & Related papers (2020-09-08T22:38:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.