How do we measure privacy in text? A survey of text anonymization metrics
- URL: http://arxiv.org/abs/2512.01109v1
- Date: Sun, 30 Nov 2025 22:12:30 GMT
- Title: How do we measure privacy in text? A survey of text anonymization metrics
- Authors: Yaxuan Ren, Krithika Ramesh, Yaxing Yao, Anjalie Field
- Abstract summary: We aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. We identify and compare six distinct privacy notions, and analyze how the associated metrics capture different aspects of privacy risk.
- Score: 14.08328402597163
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In this work, we aim to clarify and reconcile metrics for evaluating privacy protection in text through a systematic survey. Although text anonymization is essential for enabling NLP research and model development in domains with sensitive data, evaluating whether anonymization methods sufficiently protect privacy remains an open challenge. In manually reviewing 47 papers that report privacy metrics, we identify and compare six distinct privacy notions and analyze how the associated metrics capture different aspects of privacy risk. We then assess how well these notions align with legal privacy standards (HIPAA and GDPR), as well as with user-centered expectations grounded in HCI studies. Our analysis offers practical guidance on navigating the landscape of privacy evaluation approaches and further highlights gaps in current practices. Ultimately, we aim to facilitate more robust, comparable, and legally aware privacy evaluations in text anonymization.
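As a concrete illustration of the kind of metric the survey compares, below is a minimal, hypothetical sketch of one common privacy notion: span-level recall of removed PII. The data format and function name are assumptions for illustration, not a metric defined in the paper.

```python
# Minimal sketch of a span-level privacy recall metric for anonymized text.
# Gold PII spans and the spans an anonymizer removed are (start, end)
# character offsets; this format is an assumption for illustration.

def span_recall(gold_spans, removed_spans):
    """Fraction of gold PII spans fully covered by some removed span.

    Gold spans that survive anonymization count as residual privacy risk.
    """
    if not gold_spans:
        return 1.0  # nothing sensitive to protect
    covered = sum(
        1
        for gs, ge in gold_spans
        if any(rs <= gs and ge <= re for rs, re in removed_spans)
    )
    return covered / len(gold_spans)

# Example: two gold PII spans; the anonymizer removed only the first.
gold = [(0, 5), (24, 36)]      # e.g., a name and a phone number
removed = [(0, 5)]
print(f"privacy recall: {span_recall(gold, removed):.2f}")  # 0.50
```

Higher recall means fewer sensitive spans leaked, but as the survey emphasizes, detection-style metrics capture only one of several privacy notions.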
Related papers
- Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search [60.197239728279534]
Large language models (LLMs) in cloud-based services have raised significant privacy concerns. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. We propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness (a toy sketch of this kind of search loop appears after the list below).
arXiv Detail & Related papers (2025-09-25T07:23:52Z)
- Evaluation of Human Visual Privacy Protection: A Three-Dimensional Framework and Benchmark Dataset [2.184775414778289]
This paper presents a framework for evaluating visual privacy-protection methods across three dimensions: privacy, utility, and practicality. It introduces HR-VISPR, a publicly available human-centric dataset with biometric, soft-biometric, and non-biometric labels to train an interpretable privacy metric.
arXiv Detail & Related papers (2025-07-18T14:43:24Z)
- PrivaCI-Bench: Evaluating Privacy with Contextual Integrity and Legal Compliance [44.287734754038254]
We present PrivaCI-Bench, a contextual privacy evaluation benchmark for generative large language models (LLMs). We evaluate the latest LLMs, including the recent reasoner models QwQ-32B and Deepseek R1. Our experimental results suggest that though LLMs can effectively capture key CI parameters inside a given context, they still require further advancements for privacy compliance.
arXiv Detail & Related papers (2025-02-24T10:49:34Z)
- Enforcing Demographic Coherence: A Harms Aware Framework for Reasoning about Private Data Release [14.939460540040459]
We introduce demographic coherence, a condition inspired by privacy attacks that we argue is necessary for data privacy. Our framework focuses on confidence-rated predictors, which can in turn be distilled from almost any data-informed process. We prove that every differentially private data release is also demographically coherent, and that there are demographically coherent algorithms which are not differentially private.
arXiv Detail & Related papers (2025-02-04T20:42:30Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- A Summary of Privacy-Preserving Data Publishing in the Local Setting [0.6749750044497732]
Statistical Disclosure Control aims to minimize the risk of exposing confidential information by de-identifying it.
We outline the current privacy-preserving techniques employed in microdata de-identification, delve into privacy measures tailored for various disclosure scenarios, and assess metrics for information loss and predictive performance.
arXiv Detail & Related papers (2023-12-19T04:23:23Z)
- SeePrivacy: Automated Contextual Privacy Policy Generation for Mobile Applications [21.186902172367173]
SeePrivacy is designed to automatically generate contextual privacy policies for mobile apps.
Our method synergistically combines mobile GUI understanding and privacy policy document analysis.
96% of the retrieved policy segments can be correctly matched with their contexts.
arXiv Detail & Related papers (2023-07-04T12:52:45Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- Algorithms with More Granular Differential Privacy Guarantees [65.3684804101664]
We consider partial differential privacy (DP), which allows quantifying the privacy guarantee on a per-attribute basis.
In this work, we study several basic data analysis and learning tasks, and design algorithms whose per-attribute privacy parameter is smaller than the best possible privacy parameter for the entire record of a person (a minimal per-attribute noise sketch appears after this list).
arXiv Detail & Related papers (2022-09-08T22:43:50Z)
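As flagged in the zero-shot rewriting entry above, here is a toy sketch of an iterative, tree-search-style rewriting loop. The rewrite generator and the privacy/utility scorers are stand-in stubs (the paper's actual components are LLM-based); only the search skeleton under these assumptions is shown.

```python
import heapq

# Toy skeleton of tree-search rewriting: iteratively expand candidate
# rewrites and keep those that lower a privacy-risk score while meeting a
# utility floor. All three components below are illustrative stubs.

def privacy_risk(text):
    # Stub: treat digits as a proxy for identifying information.
    return sum(c.isdigit() for c in text) / max(len(text), 1)

def utility(text):
    # Stub: use token count as a crude proxy for retained content.
    return len(text.split())

def propose_rewrites(text):
    # Stub: candidate rewrites delete one token at a time.
    toks = text.split()
    return [" ".join(toks[:i] + toks[i + 1:]) for i in range(len(toks))]

def tree_search_rewrite(text, beam=3, depth=3, min_utility=3):
    frontier = [(privacy_risk(text), text)]
    best = frontier[0]
    for _ in range(depth):
        candidates = [
            (privacy_risk(cand), cand)
            for _, cur in frontier
            for cand in propose_rewrites(cur)
            if utility(cand) >= min_utility
        ]
        if not candidates:
            break
        frontier = heapq.nsmallest(beam, candidates)  # lowest risk first
        best = min(best, frontier[0])
    return best[1]

print(tree_search_rewrite("Alice born 1990 lives at 42 Main Street"))
```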
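And for the per-attribute DP entry directly above, a minimal sketch of the classical Laplace mechanism applied with a separate privacy budget per attribute. This illustrates the general idea of attribute-granular guarantees only; it is not the construction from that paper, and all values below are assumptions.

```python
import random

# Minimal sketch of attribute-granular noising: each attribute of a record
# gets Laplace noise calibrated to its own epsilon and sensitivity, so the
# privacy parameter can differ per attribute. Illustrative values only.

def laplace_noise(scale):
    # The difference of two Exp(1) draws is Laplace(0, 1); rescale.
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))

def perturb_record(record, epsilons, sensitivities):
    """Per-attribute Laplace mechanism: scale_i = sensitivity_i / epsilon_i."""
    return {
        attr: value + laplace_noise(sensitivities[attr] / epsilons[attr])
        for attr, value in record.items()
    }

record = {"age": 34.0, "income": 52000.0}
epsilons = {"age": 0.5, "income": 2.0}          # tighter budget for age
sensitivities = {"age": 1.0, "income": 1000.0}  # assumed query sensitivities
print(perturb_record(record, epsilons, sensitivities))
```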
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.