Self-Refining Language Model Anonymizers via Adversarial Distillation
- URL: http://arxiv.org/abs/2506.01420v1
- Date: Mon, 02 Jun 2025 08:21:27 GMT
- Title: Self-Refining Language Model Anonymizers via Adversarial Distillation
- Authors: Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin
- Abstract summary: Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data poses emerging privacy risks. We introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization.
- Score: 49.17383264812234
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text poses emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems. To address this, we introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external costly models at inference time. We leverage adversarial interactions between an LLM anonymizer and an inference model to collect trajectories of anonymized texts and inferred attributes, which are used to distill anonymization, adversarial inference, and utility evaluation capabilities into SLMs via supervised fine-tuning and preference learning. The resulting models learn to both anonymize text and critique their outputs, enabling iterative improvement of anonymization quality via self-refinement. Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy. These results show the effectiveness of our adversarial distillation framework in training SLMs as efficient anonymizers. To facilitate further research, we release the full dataset used in our experiments.
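The abstract outlines two intertwined procedures: adversarial trajectory collection (an LLM anonymizer rewrites text while an inference model tries to recover personal attributes, with a utility evaluator scoring the rewrite) and inference-time self-refinement (anonymize, critique, revise). Below is a minimal sketch of how such a loop might be wired, assuming hypothetical `anonymize`, `infer_attributes`, and `judge_utility` callables; none of these names, prompts, or scoring choices come from the paper itself.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces standing in for the model calls described in the abstract;
# the paper's actual prompts, models, and scoring functions are not specified here.
Anonymize = Callable[[str, List[str]], str]        # (text, critiques) -> rewritten text
InferAttributes = Callable[[str], Dict[str, str]]  # text -> guessed personal attributes
JudgeUtility = Callable[[str, str], float]         # (original, anonymized) -> utility score


@dataclass
class Trajectory:
    """One round of the adversarial interaction: candidate text, attack result, utility."""
    anonymized: str
    inferred: Dict[str, str]
    utility: float


def collect_trajectories(
    original: str,
    true_attrs: Dict[str, str],
    anonymize: Anonymize,
    infer_attributes: InferAttributes,
    judge_utility: JudgeUtility,
    rounds: int = 3,
) -> List[Trajectory]:
    """Run the anonymizer against the inference model for several rounds,
    recording each (anonymized text, inferred attributes, utility) triple."""
    critiques: List[str] = []
    current = original
    trajectories: List[Trajectory] = []
    for _ in range(rounds):
        candidate = anonymize(current, critiques)     # anonymizer rewrites the text
        guesses = infer_attributes(candidate)         # adversary attacks the rewrite
        utility = judge_utility(original, candidate)  # utility judged against the original
        trajectories.append(Trajectory(candidate, guesses, utility))

        # Attributes the adversary still recovers become critiques for the next round.
        leaked = {k: v for k, v in guesses.items() if true_attrs.get(k) == v}
        if not leaked:
            break
        critiques = [f"The text still reveals {k} = {v}" for k, v in leaked.items()]
        current = candidate                           # refine the latest candidate
    return trajectories
```

In this reading, rounds with less leakage at comparable utility could be paired against weaker ones to form preference-learning data, while individual rounds serve as supervised fine-tuning targets; a distilled SLM that plays all three roles could then run the same critique-and-rewrite loop on its own, which corresponds to the self-refinement the abstract describes.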
Related papers
- AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text [8.758843436588297]
AgentStealth is a self-reinforcing language model for text anonymization. We show that our method outperforms baselines in both anonymization effectiveness and utility. Our lightweight design supports direct deployment on edge devices, avoiding cloud reliance and communication-based privacy risks.
arXiv Detail & Related papers (2025-06-26T02:48:16Z) - Automated Profile Inference with Language Model Agents [67.32226960040514]
We study a new threat that LLMs pose to online pseudonymity, called automated profile inference. An adversary can instruct LLMs to automatically scrape and extract sensitive personal attributes from publicly visible user activities on pseudonymous platforms. We introduce an automated profiling framework called AutoProfiler to assess the feasibility of such threats in real-world scenarios.
arXiv Detail & Related papers (2025-05-18T13:05:17Z) - Augmenting Anonymized Data with AI: Exploring the Feasibility and Limitations of Large Language Models in Data Enrichment [3.459382629188014]
Large Language Models (LLMs) have demonstrated advanced capabilities in both text generation and comprehension. Their application to data archives might facilitate the privatization of sensitive information about the data subjects. This data, if not safeguarded, may bring privacy risks in terms of both disclosure and identification.
arXiv Detail & Related papers (2025-04-03T13:26:59Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Large Language Models are Advanced Anonymizers [2.9373912230684565]
Recent privacy research on large language models (LLMs) has shown that they achieve near-human-level performance at inferring personal data from online texts. Existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. We present a new setting for evaluating anonymization in the face of adversarial LLM inferences.
arXiv Detail & Related papers (2024-02-21T14:44:00Z) - Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Anonymizing Machine Learning Models [0.0]
Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach has a similar, and sometimes even better, ability to prevent membership attacks than approaches based on differential privacy.
arXiv Detail & Related papers (2020-07-26T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.