Related papers: PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty

PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty

URL: http://arxiv.org/abs/2506.19563v1
Date: Tue, 24 Jun 2025 12:22:59 GMT
Title: PrivacyXray: Detecting Privacy Breaches in LLMs through Semantic Consistency and Probability Certainty
Authors: Jinwen He, Yiyang Lu, Zijin Lin, Kai Chen, Yue Zhao,
Abstract summary: Large Language Models (LLMs) are widely used in sensitive domains, including healthcare, finance, and legal services.<n>PrivacyXray is a novel framework detecting privacy breaches by analyzing LLM inner states.<n>It achieves consistent performance, with an average accuracy of 92.69% across five LLMs.
Score: 11.921857301582524
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) are widely used in sensitive domains, including healthcare, finance, and legal services, raising concerns about potential private information leaks during inference. Privacy extraction attacks, such as jailbreaking, expose vulnerabilities in LLMs by crafting inputs that force the models to output sensitive information. However, these attacks cannot verify whether the extracted private information is accurate, as no public datasets exist for cross-validation, leaving a critical gap in private information detection during inference. To address this, we propose PrivacyXray, a novel framework detecting privacy breaches by analyzing LLM inner states. Our analysis reveals that LLMs exhibit higher semantic coherence and probabilistic certainty when generating correct private outputs. Based on this, PrivacyXray detects privacy breaches using four metrics: intra-layer and inter-layer semantic similarity, token-level and sentence-level probability distributions. PrivacyXray addresses critical challenges in private information detection by overcoming the lack of open-source private datasets and eliminating reliance on external data for validation. It achieves this through the synthesis of realistic private data and a detection mechanism based on the inner states of LLMs. Experiments show that PrivacyXray achieves consistent performance, with an average accuracy of 92.69% across five LLMs. Compared to state-of-the-art methods, PrivacyXray achieves significant improvements, with an average accuracy increase of 20.06%, highlighting its stability and practical utility in real-world applications.

Related papers

On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis.<n>We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z)
A Tight Context-aware Privacy Bound for Histogram Publication [15.762008564991207]
We analyze the privacy guarantees of the Laplace mechanism releasing the histogram of a dataset through the lens of pointwise maximal leakage (PML)<n>We show that when the probability of each histogram bin is bounded away from zero, stronger privacy protection can be achieved for a fixed level of noise.
arXiv Detail & Related papers (2025-08-26T09:12:23Z)
MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM-agents primarily assess single-turn, low-complexity tasks.<n>We first present a benchmark - MAGPIE comprising 158 real-life high-stakes scenarios across 15 domains.<n>We then evaluate the current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z)
Information-theoretic Estimation of the Risk of Privacy Leaks [0.0]
dependencies between items in a dataset can lead to privacy leaks.<n>We measure the correlation between the original data and their noisy responses from a randomizer as an indicator of potential privacy breaches.
arXiv Detail & Related papers (2025-06-14T03:39:11Z)
A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.<n>Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance [44.287734754038254]
We present PrivaCI-Bench, a contextual privacy evaluation benchmark for generative large language models (LLMs)<n>We evaluate the latest LLMs, including the recent reasoner models QwQ-32B and Deepseek R1.<n>Our experimental results suggest that though LLMs can effectively capture key CI parameters inside a given context, they still require further advancements for privacy compliance.
arXiv Detail & Related papers (2025-02-24T10:49:34Z)
Enforcing Demographic Coherence: A Harms Aware Framework for Reasoning about Private Data Release [14.939460540040459]
We introduce demographic coherence, a condition inspired by privacy attacks that we argue is necessary for data privacy.<n>Our framework focuses on confidence rated predictors, which can in turn be distilled from almost any data-informed process.<n>We prove that every differentially private data release is also demographically coherent, and that there are demographically coherent algorithms which are not differentially private.
arXiv Detail & Related papers (2025-02-04T20:42:30Z)
Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy [25.896416088293908]
retrieval-augmented generation (RAG) is particularly effective in assisting large language models (LLMs)<n>RAG outputs risk leaking sensitive information from the external data source.<n>We propose an algorithm that smartly spends privacy budget only for the tokens that require the sensitive information.
arXiv Detail & Related papers (2024-12-06T01:20:16Z)
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.<n>We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.<n>State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
Simulation-based Bayesian Inference from Privacy Protected Data [0.0]
We propose simulation-based inference methods from privacy-protected datasets.<n>We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models.
arXiv Detail & Related papers (2023-10-19T14:34:17Z)
Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework [6.828884629694705]
This article proposes the conceptual model called PrivChatGPT, a privacy-generative model for LLMs. PrivChatGPT consists of two main components i.e., preserving user privacy during the data curation/pre-processing together with preserving private context and the private training process for large-scale data.
arXiv Detail & Related papers (2023-10-19T06:55:13Z)
PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind) Our work offers a theoretical analysis for model design and benchmarks various techniques. In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
Breaking the Communication-Privacy-Accuracy Tradeoff with $f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability. We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP) More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
arXiv Detail & Related papers (2023-02-19T16:58:53Z)
Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of data based on private data. perturbations chosen independently at every agent, resulting in a significant performance loss. We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.