Large Language Models are Advanced Anonymizers
- URL: http://arxiv.org/abs/2402.13846v1
- Date: Wed, 21 Feb 2024 14:44:00 GMT
- Title: Large Language Models are Advanced Anonymizers
- Authors: Robin Staab, Mark Vero, Mislav Balunović, Martin Vechev
- Abstract summary: We show how adversarial anonymization outperforms current industry-grade anonymizers in terms of both the resulting utility and privacy.
We first present a new setting for evaluating anonymization in the face of adversarial LLM inference.
- Score: 13.900633576526863
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work in privacy research on large language models has shown that they
achieve near human-level performance at inferring personal data from real-world
online texts. With consistently increasing model capabilities, existing text
anonymization methods are currently lagging behind regulatory requirements and
adversarial threats. This raises the question of how individuals can
effectively protect their personal data when sharing texts online. In this work,
we take two steps to answer this question: We first present a new setting for
evaluating anonymization in the face of adversarial LLM inference, allowing
for a natural measurement of anonymization performance while remedying some of
the shortcomings of previous metrics. We then present our LLM-based adversarial
anonymization framework, which leverages the strong inferential capabilities of
LLMs to inform the anonymization procedure. In our experimental evaluation, we
show on real-world and synthetic online texts how adversarial anonymization
outperforms current industry-grade anonymizers in terms of both the resulting
utility and privacy.
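A minimal sketch of the adversarial (feedback-guided) anonymization loop described in the abstract, assuming a generic instruction-following LLM behind a hypothetical `query_llm` helper; the prompts and the fixed round count are illustrative and not the authors' exact procedure. The idea is that an adversarial inference step lists the personal attributes still recoverable from the text, and the anonymization step rewrites the text to remove exactly those cues.

```python
# Sketch of adversarial anonymization: an adversary LLM infers personal
# attributes from the text, and an anonymizer LLM rewrites the text to remove
# the cues that made those inferences possible, for a few rounds.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion style LLM API (assumption)."""
    raise NotImplementedError


def adversarial_anonymize(text: str, rounds: int = 3) -> str:
    for _ in range(rounds):
        # Adversarial inference step: which personal attributes are still recoverable?
        inferences = query_llm(
            "List the personal attributes (e.g. age, location, occupation) that can be "
            f"inferred from the following text, together with the supporting cues:\n\n{text}"
        )
        # Anonymization step: rewrite so those cues disappear, changing as little
        # as possible to preserve meaning and readability.
        text = query_llm(
            "Rewrite the text so that none of the following attributes can be inferred, "
            f"while preserving its meaning:\n\nAttributes and cues:\n{inferences}\n\nText:\n{text}"
        )
    return text
```

The appeal of such a feedback-guided loop is that the anonymizer only edits the cues an adversary can still exploit, rather than redacting broad entity classes up front.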
Related papers
- Low-Latency Video Anonymization for Crowd Anomaly Detection: Privacy vs. Performance [5.78828936452823]
This study revisits conventional anonymization solutions for privacy protection and real-time video anomaly detection applications.
We propose a novel lightweight adaptive anonymization for VAD (LA3D) that employs dynamic adjustment to enhance privacy protection.
Our experiments demonstrate that LA3D substantially improves privacy anonymization without markedly degrading VAD efficacy.
arXiv Detail & Related papers (2024-10-24T13:22:33Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component (a sketch of such a framework appears after this list).
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study [4.1692340552627405]
Automated clinical text anonymization has the potential to unlock the widespread sharing of textual health data for secondary usage.
Despite the many complex and theoretically successful anonymization solutions proposed in the literature, these techniques remain flawed.
Recent advances in developing Large Language Models (LLMs) pose a promising opportunity to further the field.
arXiv Detail & Related papers (2024-05-29T23:07:58Z) - Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation [56.46932751058042]
We train a learnable prompt prefix for text-to-image diffusion models, which forces the model to generate anonymized facial identities.
Experiments demonstrate the successful anonymization performance of APL, which anonymizes any specific individuals without compromising the quality of non-identity-specific image generation.
arXiv Detail & Related papers (2024-05-27T07:38:26Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models (GPT-4 and ChatGPT) reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - Privacy in Large Language Models: Attacks, Defenses and Future Directions [84.73301039987128]
We analyze the current privacy attacks targeting large language models (LLMs) and categorize them according to the adversary's assumed capabilities.
We present a detailed overview of prominent defense strategies that have been developed to counter these privacy attacks.
arXiv Detail & Related papers (2023-10-16T13:23:54Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - The Limits of Word Level Differential Privacy [30.34805746574316]
We propose a new method for text anonymization based on transformer-based language models fine-tuned for paraphrasing.
We evaluate the performance of our method via thorough experimentation and demonstrate superior performance over the discussed mechanisms.
arXiv Detail & Related papers (2022-05-02T21:53:10Z) - Membership Inference Attacks Against Self-supervised Speech Models [62.73937175625953]
Self-supervised learning (SSL) on continuous speech has started gaining attention.
We present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access.
arXiv Detail & Related papers (2021-11-09T13:00:24Z) - No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization [0.48733623015338234]
We argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified.
We propose TILD, a set of evaluation criteria that comprises an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents.
arXiv Detail & Related papers (2021-03-16T18:18:29Z)
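Picking up the forward reference from the "Robust Utility-Preserving Text Anonymization" entry above: a minimal sketch of a three-component, LLM-based anonymization framework (privacy evaluator, utility evaluator, optimization loop). The `query_llm` helper, the prompts, and the fixed iteration budget are assumptions for illustration, not that paper's exact design.

```python
# Sketch of a three-component anonymization framework: a privacy evaluator and a
# utility evaluator both give textual feedback, and an optimization step rewrites
# the candidate text using that feedback.

def query_llm(prompt: str) -> str:
    """Placeholder for any instruction-following LLM call (assumption)."""
    raise NotImplementedError


def refine_anonymization(original: str, iterations: int = 5) -> str:
    candidate = original
    for _ in range(iterations):
        # Privacy evaluator: what identifying information remains in the candidate?
        privacy_feedback = query_llm(
            f"Describe any clues in this text that could re-identify its author:\n{candidate}"
        )
        # Utility evaluator: how much of the original meaning has been lost?
        utility_feedback = query_llm(
            f"Describe what meaning the rewrite loses.\nOriginal:\n{original}\nRewrite:\n{candidate}"
        )
        # Optimization component: rewrite using both evaluators' feedback.
        candidate = query_llm(
            "Rewrite the text to remove the identifying clues while restoring the lost meaning.\n"
            f"Identifying clues:\n{privacy_feedback}\nLost meaning:\n{utility_feedback}\nText:\n{candidate}"
        )
    return candidate
```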