Related papers: CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries

CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries

URL: http://arxiv.org/abs/2601.18834v1
Date: Sun, 25 Jan 2026 20:30:37 GMT
Title: CanaryBench: Stress Testing Privacy Leakage in Cluster-Level Conversation Summaries
Authors: Deep Mehta,
Abstract summary: We introduce CanaryBench, a simple and reproducible stress test for privacy leakage in cluster-level conversation summaries.<n>CanaryBench generates synthetic conversations with planted secret strings ("canaries") that simulate sensitive identifiers.<n>We observe canary leakage in 50 of 52 canary-containing clusters, along with nonzero-based PII indicator counts.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Aggregate analytics over conversational data are increasingly used for safety monitoring, governance, and product analysis in large language model systems. A common practice is to embed conversations, cluster them, and publish short textual summaries describing each cluster. While raw conversations may never be exposed, these derived summaries can still pose privacy risks if they contain personally identifying information (PII) or uniquely traceable strings copied from individual conversations. We introduce CanaryBench, a simple and reproducible stress test for privacy leakage in cluster-level conversation summaries. CanaryBench generates synthetic conversations with planted secret strings ("canaries") that simulate sensitive identifiers. Because canaries are known a priori, any appearance of these strings in published summaries constitutes a measurable leak. Using TF-IDF embeddings and k-means clustering on 3,000 synthetic conversations (24 topics) with a canary injection rate of 0.60, we evaluate an intentionally extractive example snippet summarizer that models quote-like reporting. In this configuration, we observe canary leakage in 50 of 52 canary-containing clusters (cluster-level leakage rate 0.961538), along with nonzero regex-based PII indicator counts. A minimal defense combining a minimum cluster-size publication threshold (k-min = 25) and regex-based redaction eliminates measured canary leakage and PII indicator hits in the reported run while maintaining a similar cluster-coherence proxy. We position this work as a societal impacts contribution centered on privacy risk measurement for published analytics artifacts rather than raw user data.

Related papers

VocSim: A Training-free Benchmark for Zero-shot Content Identity in Single-source Audio [1.0791267046450075]
VocSim is a training-free benchmark probing the intrinsic geometric alignment of frozen embeddings.<n>VocSim aggregates 125k single-source clips from 19 corpora spanning human speech, animal vocalizations, and environmental sounds.
arXiv Detail & Related papers (2025-12-10T22:13:12Z)
Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis [8.4361320391543]
Tabular Generative Models are often argued to preserve privacy by creating synthetic datasets that resemble training data.<n>Membership Inference Attacks (MIAs) have recently emerged as a method for evaluating privacy leakage in synthetic data.<n>We propose a unified, model-agnostic threat framework that deploys a collection of attacks to estimate the maximum empirical privacy leakage in synthetic datasets.
arXiv Detail & Related papers (2025-09-22T16:53:38Z)
Urania: Differentially Private Insights into AI Use [102.27238986985698]
$Urania$ provides end-to-end privacy protection by leveraging DP tools such as clustering, partition selection, and histogram-based summarization.<n>Results show the framework's ability to extract meaningful conversational insights while maintaining stringent user privacy.
arXiv Detail & Related papers (2025-06-05T07:00:31Z)
STAMP Your Content: Proving Dataset Membership via Watermarked Rephrasings [17.175065729425825]
STAMP is a framework for detecting dataset membership.<n>We show that our framework can successfully detect contamination across four benchmarks which appear only once in the training data.
arXiv Detail & Related papers (2025-04-18T02:25:08Z)
CBW: Towards Dataset Ownership Verification for Speaker Verification via Clustering-based Backdoor Watermarking [85.68235482145091]
Large-scale speech datasets have become valuable intellectual property.<n>We propose a novel dataset ownership verification method.<n>Our approach introduces a clustering-based backdoor watermark (CBW)<n>We conduct extensive experiments on benchmark datasets, verifying the effectiveness and robustness of our method against potential adaptive attacks.
arXiv Detail & Related papers (2025-03-02T02:02:57Z)
Intelligent Multi-Document Summarisation for Extracting Insights on Racial Inequalities from Maternity Incident Investigation Reports [0.8609957371651683]
In healthcare, thousands of safety incidents occur every year, but learning from these incidents is not effectively aggregated. This paper presents I-SIRch:CS, a framework designed to facilitate the aggregation and analysis of safety incident reports. The framework integrates concept annotation using the Safety Intelligence Research (SIRch) taxonomy with clustering, summarisation, and analysis capabilities.
arXiv Detail & Related papers (2024-07-11T09:11:20Z)
Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis [69.07674653828565]
Machine learning models have a tendency to leverage spurious correlations that exist in the training set but may not hold true in general circumstances. In this paper, we examine the implications of spurious correlations through a novel perspective called neighborhood analysis. We propose a family of regularization methods, NFL (doN't Forget your Language) to mitigate spurious correlations in text classification.
arXiv Detail & Related papers (2023-05-23T03:55:50Z)
SWING: Balancing Coverage and Faithfulness for Dialogue Summarization [67.76393867114923]
We propose to utilize natural language inference (NLI) models to improve coverage while avoiding factual inconsistencies. We use NLI to compute fine-grained training signals to encourage the model to generate content in the reference summaries that have not been covered. Experiments on the DialogSum and SAMSum datasets confirm the effectiveness of the proposed approach.
arXiv Detail & Related papers (2023-01-25T09:33:11Z)
Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document. We also simultaneously cluster users, removing the need for post-hoc cluster estimation. Our method performs as well as -- or better -- than traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
Relation Clustering in Narrative Knowledge Graphs [71.98234178455398]
relational sentences in the original text are embedded (with SBERT) and clustered in order to merge together semantically similar relations. Preliminary tests show that such clustering might successfully detect similar relations, and provide a valuable preprocessing for semi-supervised approaches.
arXiv Detail & Related papers (2020-11-27T10:43:04Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.