Exposing Privacy Risks in Anonymizing Clinical Data: Combinatorial Refinement Attacks on k-Anonymity Without Auxiliary Information
- URL: http://arxiv.org/abs/2509.03350v1
- Date: Wed, 03 Sep 2025 14:36:06 GMT
- Title: Exposing Privacy Risks in Anonymizing Clinical Data: Combinatorial Refinement Attacks on k-Anonymity Without Auxiliary Information
- Authors: Somiya Chhillar, Mary K. Righi, Rebecca E. Sutter, Evgenios M. Kornaropoulos
- Abstract summary: We introduce a new class of privacy attacks targeting k-anonymized datasets produced using local recoding. Our results on real-world clinical microdata reveal that even in the absence of external information, established anonymization frameworks do not deliver the promised level of privacy.
- Score: 3.3423762257383216
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite longstanding criticism from the privacy community, k-anonymity remains a widely used standard for data anonymization, mainly due to its simplicity, regulatory alignment, and preservation of data utility. However, non-experts often defend k-anonymity on the grounds that, in the absence of auxiliary information, no known attacks can compromise its protections. In this work, we refute this claim by introducing Combinatorial Refinement Attacks (CRA), a new class of privacy attacks targeting k-anonymized datasets produced using local recoding. This is the first method that does not rely on external auxiliary information or assumptions about the underlying data distribution. CRA leverages the utility-optimizing behavior of local recoding anonymization of ARX, which is a widely used open-source software for anonymizing data in clinical settings, to formulate a linear program that significantly reduces the space of plausible sensitive values. To validate our findings, we partnered with a network of free community health clinics, an environment where (1) auxiliary information is indeed hard to find due to the population they serve and (2) open-source k-anonymity solutions are attractive due to regulatory obligations and limited resources. Our results on real-world clinical microdata reveal that even in the absence of external information, established anonymization frameworks do not deliver the promised level of privacy, raising critical privacy concerns.
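The core CRA idea can be illustrated with a toy sketch (not the paper's actual linear-programming formulation; the interval, class size, and optimality model below are invented for illustration): if an attacker knows the anonymizer publishes the narrowest generalization covering each equivalence class, then the published endpoints must be attained by real values, which prunes the space of plausible originals.

```python
from itertools import combinations_with_replacement

# Toy 2-anonymous release: one equivalence class of 3 records whose age
# was generalized to a single published interval.
released_interval = (30, 39)
class_size = 3
domain = range(released_interval[0], released_interval[1] + 1)

# Naive plausible set: any multiset of 3 ages inside the interval.
naive = list(combinations_with_replacement(domain, class_size))

# Refinement: a utility-optimizing local recoder would have published a
# *narrower* interval unless both endpoints actually occur in the class.
refined = [c for c in naive
           if min(c) == released_interval[0]
           and max(c) == released_interval[1]]

print(len(naive), len(refined))  # the optimality assumption shrinks 220 -> 10
```

Even this crude optimality constraint cuts the plausible set by more than an order of magnitude, without any auxiliary information about the individuals.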
Related papers
- Extension of Spatial k-Anonymity: New Metrics for Assessing the Anonymity of Geomasked Data Considering Realistic Attack Scenarios [0.0]
The degree of anonymity of anonymized georeferenced datasets is often measured by the so-called metric of spatial k-anonymity. This article classifies the potential data attack scenarios in the context of anonymized georeferenced microdata and introduces appropriate metrics that enable a comprehensive assessment of anonymity adapted to those attack scenarios.
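The baseline metric discussed here can be sketched under a simplified planar model (the article's refined, scenario-adapted metrics are not reproduced): the spatial k-anonymity of a geomasked point is the number of candidate addresses an attacker cannot distinguish within the masking radius.

```python
import math

def spatial_k(masked_point, candidate_addresses, radius):
    """Toy spatial k-anonymity: count candidate home addresses that lie
    within the geomasking radius of the published (masked) point."""
    mx, my = masked_point
    return sum(1 for (x, y) in candidate_addresses
               if math.hypot(x - mx, y - my) <= radius)

addresses = [(0, 0), (1, 0), (0, 2), (5, 5)]
print(spatial_k((0, 0), addresses, radius=2.0))  # 3 addresses fall inside
```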
arXiv Detail & Related papers (2025-09-09T08:38:52Z) - Privacy, Informed Consent and the Demand for Anonymisation of Smart Meter Data [2.111461702802409]
We use a mixed-methods approach to estimate non-monetary (willingness-to-share and smart metering demand) and monetary (willingness-to-pay/accept) preferences for anonymisation. On average, consumers are willing to pay for anonymisation, are more willing to share data when anonymised, and are less willing to share non-anonymised data once anonymisation is presented as an option.
arXiv Detail & Related papers (2025-08-27T20:05:09Z) - Self-Refining Language Model Anonymizers via Adversarial Distillation [49.17383264812234]
Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks. We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
arXiv Detail & Related papers (2025-06-02T08:21:27Z) - A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z) - Investigating Vulnerabilities of GPS Trip Data to Trajectory-User Linking Attacks [49.1574468325115]
We propose a novel attack to reconstruct user identifiers in GPS trip datasets consisting of single trips. We show that the risk of re-identification is significant even when personal identifiers have been removed. Further investigations indicate that users who frequently visit locations that are only visited by a small number of others tend to be more vulnerable to re-identification.
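The vulnerability pattern described in that finding (users who visit locations visited by few others) can be sketched in a few lines; the trip log and the "single visitor" threshold below are made up for illustration, not taken from the paper.

```python
# Toy trip log: (pseudonymous user, visited location).
trips = [("u1", "home_a"), ("u1", "cafe"), ("u2", "cafe"),
         ("u3", "home_c"), ("u3", "cafe")]

# Group the distinct visitors of each location.
visitors = {}
for user, loc in trips:
    visitors.setdefault(loc, set()).add(user)

# A location visited by exactly one user pins that user down; its
# visitors are the most exposed to trajectory-user linking.
rare_locations = {loc for loc, users in visitors.items() if len(users) == 1}
vulnerable = sorted({u for u, loc in trips if loc in rare_locations})
print(vulnerable)  # -> ['u1', 'u3']
```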
arXiv Detail & Related papers (2025-02-12T08:54:49Z) - Unveiling Privacy Vulnerabilities: Investigating the Role of Structure in Graph Data [17.11821761700748]
This study advances the understanding and protection against privacy risks emanating from network structure.
We develop a novel graph private attribute inference attack, which acts as a pivotal tool for evaluating the potential for privacy leakage through network structures.
Our attack model poses a significant threat to user privacy, and our graph data publishing method successfully achieves the optimal privacy-utility trade-off.
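A minimal homophily-based sketch conveys how structure alone can leak a private attribute (the paper's actual attack model is more sophisticated; the graph and labels below are invented): a hidden label is guessed from the majority label among a node's neighbors.

```python
from collections import Counter

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
known = {"a": "smoker", "b": "smoker", "d": "nonsmoker"}  # "c" is private

# Build an undirected adjacency map from the edge list.
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def infer(node):
    """Guess a hidden attribute as the majority label among neighbors,
    exploiting homophily in the network structure."""
    labels = [known[n] for n in neighbors.get(node, ()) if n in known]
    return Counter(labels).most_common(1)[0][0] if labels else None

print(infer("c"))  # -> smoker (two of three labeled neighbors are smokers)
```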
arXiv Detail & Related papers (2024-07-26T07:40:54Z) - Secure Aggregation is Not Private Against Membership Inference Attacks [66.59892736942953]
We investigate the privacy implications of SecAgg in federated learning.
We show that SecAgg offers weak privacy against membership inference attacks even in a single training round.
Our findings underscore the imperative for additional privacy-enhancing mechanisms, such as noise injection.
arXiv Detail & Related papers (2024-03-26T15:07:58Z) - Privacy-Preserving Hierarchical Anonymization Framework over Encrypted Data [0.061446808540639365]
This study proposes a hierarchical k-anonymization framework using homomorphic encryption and secret sharing composed of two types of domains.
The experimental results show that connecting two domains can accelerate the anonymization process, indicating that the proposed secure hierarchical architecture is practical and efficient.
arXiv Detail & Related papers (2023-10-19T01:08:37Z) - Breaking the Communication-Privacy-Accuracy Tradeoff with
$f$-Differential Privacy [51.11280118806893]
We consider a federated data analytics problem in which a server coordinates the collaborative data analysis of multiple users with privacy concerns and limited communication capability.
We study the local differential privacy guarantees of discrete-valued mechanisms with finite output space through the lens of $f$-differential privacy (DP)
More specifically, we advance the existing literature by deriving tight $f$-DP guarantees for a variety of discrete-valued mechanisms.
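As a concrete example of a discrete-valued local mechanism (a standard textbook one, not reproduced from the paper's f-DP analysis), binary randomized response with truth probability p satisfies epsilon-local-DP with epsilon = ln(p / (1 - p)), since p/(1 - p) is the worst-case likelihood ratio between any two inputs.

```python
import math
import random

def randomized_response(bit, p):
    """Binary randomized response: report the true bit with probability p,
    otherwise report its flip."""
    return bit if random.random() < p else 1 - bit

p = 0.75
# Worst-case likelihood ratio between the two inputs is p / (1 - p),
# which gives the tight epsilon for local differential privacy.
epsilon = math.log(p / (1 - p))
print(round(epsilon, 4))  # ln(3) = 1.0986
```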
arXiv Detail & Related papers (2023-02-19T16:58:53Z) - Releasing survey microdata with exact cluster locations and additional
privacy safeguards [77.34726150561087]
We propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards.
Our strategy reduces the respondents' re-identification risk for any number of disclosed attributes by 60-80% even under re-identification attempts.
arXiv Detail & Related papers (2022-05-24T19:37:11Z) - Statistical anonymity: Quantifying reidentification risks without
reidentifying users [4.103598036312231]
Data anonymization is an approach to privacy-preserving data release aimed at preventing the reidentification of participants.
Existing algorithms for enforcing $k$-anonymity in the released data assume that the curator performing the anonymization has complete access to the original data.
This paper explores ideas for reducing the trust that must be placed in the curator, while still maintaining a statistical notion of $k$-anonymity.
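For reference, the property being enforced can be checked in a few lines (a sketch, not any curator's actual algorithm; the table and quasi-identifiers are invented): a release is k-anonymous when every combination of quasi-identifier values is shared by at least k records.

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """Every combination of quasi-identifier values must occur >= k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

release = [
    {"zip": "220**", "age": "30-39", "dx": "flu"},
    {"zip": "220**", "age": "30-39", "dx": "asthma"},
    {"zip": "221**", "age": "40-49", "dx": "flu"},
]
print(is_k_anonymous(release, ["zip", "age"], 2))  # False: one singleton class
```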
arXiv Detail & Related papers (2022-01-28T18:12:44Z) - A Review of Anonymization for Healthcare Data [0.30586855806896046]
Health data is highly sensitive and subject to regulations such as the General Data Protection Regulation (GDPR).
arXiv Detail & Related papers (2021-04-13T21:44:29Z) - Graph-Homomorphic Perturbations for Private Decentralized Learning [64.26238893241322]
Local exchange of estimates allows inference of the private data on which they are based.
Existing schemes rely on perturbations chosen independently at every agent, resulting in a significant performance loss.
We propose an alternative scheme, which constructs perturbations according to a particular nullspace condition, allowing them to be invisible.
arXiv Detail & Related papers (2020-10-23T10:35:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.