Large Language Models are Advanced Anonymizers
- URL: http://arxiv.org/abs/2402.13846v1
- Date: Wed, 21 Feb 2024 14:44:00 GMT
- Title: Large Language Models are Advanced Anonymizers
- Authors: Robin Staab, Mark Vero, Mislav Balunovi\'c, Martin Vechev
- Abstract summary: We show how adversarial anonymization outperforms current industry-grade anonymizers in terms of the resulting utility and privacy.
We first present a new setting for evaluating anonymizations in the face of adversarial LLM inferences.
- Score: 13.900633576526863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in privacy research on large language models has shown that they
achieve near human-level performance at inferring personal data from real-world
online texts. With consistently increasing model capabilities, existing text
anonymization methods currently lag behind both regulatory requirements and
adversarial threats. This raises the question of how individuals can
effectively protect their personal data when sharing texts online. In this work,
we take two steps to answer this question: We first present a new setting for
evaluating anonymizations in the face of adversarial LLM inferences, allowing
for a natural measurement of anonymization performance while remedying some of
the shortcomings of previous metrics. We then present our LLM-based adversarial
anonymization framework, which leverages the strong inferential capabilities of LLMs
to inform the anonymization procedure. In our experimental evaluation on
real-world and synthetic online texts, we show that adversarial anonymization
outperforms current industry-grade anonymizers in terms of both the resulting
utility and privacy.
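The abstract describes the framework only at a high level: an adversarial LLM attempts to infer personal attributes from a text, and an anonymizer LLM rewrites the text to remove the cues the adversary exploited. The following is a minimal sketch of such a loop, assuming an OpenAI-style chat API; the prompts, model name, and fixed-round stopping rule are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch of an adversarial anonymization loop: an adversarial LLM
# infers personal attributes from the text, then an anonymizer LLM rewrites
# the text to remove the cues the adversary relied on. Prompts, the model
# name, and the fixed number of rounds are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ADVERSARY_PROMPT = (
    "You are an expert profiler. From the text below, infer the author's "
    "location, age, and occupation. Cite the exact phrases you used."
)
ANONYMIZER_PROMPT = (
    "Rewrite the text so the cited phrases no longer reveal the inferred "
    "attributes, while preserving meaning and readability as much as possible."
)


def chat(system: str, user: str, model: str = "gpt-4o-mini") -> str:
    """One chat completion call; the model name is a placeholder."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return resp.choices[0].message.content


def adversarial_anonymize(text: str, rounds: int = 3) -> str:
    """Alternate adversarial inference and anonymization for a fixed number
    of rounds (the paper's stopping criterion may differ)."""
    for _ in range(rounds):
        inferences = chat(ADVERSARY_PROMPT, text)
        text = chat(
            ANONYMIZER_PROMPT,
            f"Text:\n{text}\n\nAdversary's inferences:\n{inferences}",
        )
    return text
```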
Related papers
- Model Inversion Attacks: A Survey of Approaches and Countermeasures [59.986922963781]
Recently, a new type of privacy attack, the model inversion attack (MIA), has emerged, aiming to extract sensitive features of the private data used for training.
Despite their significance, there is a lack of systematic studies that provide a comprehensive overview of and deeper insights into MIAs.
This survey aims to summarize up-to-date MIA methods in both attacks and defenses.
arXiv Detail & Related papers (2024-11-15T08:09:28Z) - Low-Latency Video Anonymization for Crowd Anomaly Detection: Privacy vs. Performance [5.78828936452823]
This study revisits conventional anonymization solutions for privacy protection and real-time video anomaly detection applications.
We propose a novel lightweight adaptive anonymization for VAD (LA3D) that employs dynamic adjustment to enhance privacy protection.
Our experiment demonstrates that LA3D enables substantial improvement in privacy anonymization capability without substantially degrading VAD efficacy.
arXiv Detail & Related papers (2024-10-24T13:22:33Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component (see the sketch after this list).
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study [4.1692340552627405]
Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage.
Despite the many complex and theoretically successful anonymization solutions proposed in the literature, these techniques remain flawed.
Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field.
arXiv Detail & Related papers (2024-05-29T23:07:58Z) - Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation [56.46932751058042]
We train a learnable prompt prefix for text-to-image diffusion models, which forces the model to generate anonymized facial identities.
Experiments demonstrate the successful anonymization performance of APL, which anonymizes any specific individuals without compromising the quality of non-identity-specific image generation.
arXiv Detail & Related papers (2024-05-27T07:38:26Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, doing so 39% and 57% of the time for the two strongest models evaluated.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - The Limits of Word Level Differential Privacy [30.34805746574316]
We propose a new method for text anonymization based on transformer-based language models fine-tuned for paraphrasing.
We evaluate the performance of our method via thorough experimentation and demonstrate superior performance over the discussed mechanisms.
arXiv Detail & Related papers (2022-05-02T21:53:10Z) - No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization [0.48733623015338234]
We argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified.
We propose TILD, a set of evaluation criteria that comprises an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents.
arXiv Detail & Related papers (2021-03-16T18:18:29Z)
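The "Robust Utility-Preserving Text Anonymization" entry above names three LLM-based components: a privacy evaluator, a utility evaluator, and an optimization component. The sketch below shows one way such components could be composed; the scoring interfaces and the greedy accept/reject rule are assumptions for illustration, not that paper's actual algorithm.

```python
# Illustrative composition of a privacy evaluator, a utility evaluator, and an
# optimization component into an anonymization loop. The interfaces and the
# greedy accept/reject rule are assumed here, not taken from the cited paper.
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnonymizationLoop:
    privacy_score: Callable[[str], float]   # higher = harder to re-identify
    utility_score: Callable[[str], float]   # higher = more content preserved
    propose_rewrite: Callable[[str], str]   # optimization component (e.g. an LLM)

    def run(self, text: str, steps: int = 5, min_utility: float = 0.7) -> str:
        best = text
        for _ in range(steps):
            candidate = self.propose_rewrite(best)
            # Accept a rewrite only if it improves privacy while keeping
            # utility above the threshold (a simple greedy rule, assumed here).
            if (self.privacy_score(candidate) > self.privacy_score(best)
                    and self.utility_score(candidate) >= min_utility):
                best = candidate
        return best
```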
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.