Self-Refining Language Model Anonymizers via Adversarial Distillation
- URL: http://arxiv.org/abs/2506.01420v2
- Date: Thu, 23 Oct 2025 19:22:08 GMT
- Title: Self-Refining Language Model Anonymizers via Adversarial Distillation
- Authors: Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin,
- Abstract summary: We introduce SElf-refining Anonymization with Language model (SEAL)<n>SEAL is a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time.<n>Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities.
- Score: 48.280759014096354
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly used in sensitive domains, where their ability to infer personal data from seemingly benign text introduces emerging privacy risks. While recent LLM-based anonymization methods help mitigate such risks, they often rely on proprietary models (e.g., GPT-4), raising concerns about cost and the potential exposure of sensitive data to untrusted external systems. To address this, we introduce SElf-refining Anonymization with Language model (SEAL), a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time. SEAL leverages adversarial interactions between an LLM anonymizer and an inference model to collect trajectories of anonymized texts and inferred attributes, which are then used to distill anonymization and critique capabilities into SLMs through supervised fine-tuning and preference learning. The resulting models learn both to anonymize text and to evaluate their outputs, enabling iterative improvement of anonymization quality via self-refinement. Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities. Notably, 8B models attain a privacy-utility trade-off comparable to that of the GPT-4 anonymizer and, with self-refinement, even surpass it in terms of privacy protection. These results highlight the effectiveness of our adversarial distillation framework for training SLMs as efficient anonymizers.
Related papers
- Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs [61.15237978606501]
Large language models can infer private user attributes from user-generated text.<n>Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements.<n>We propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS)
arXiv Detail & Related papers (2026-02-12T03:37:50Z) - RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting [17.294176570269]
We propose a reinforcement learning framework that fine-tunes a large language model (LLM) using a composite reward function.<n>The privacy reward combines semantic cues with structural patterns derived from a minimum spanning tree (MST) over latent representations.<n> Empirical results show that the proposed method significantly enhances author obfuscation and privacy metrics without degrading semantic quality.
arXiv Detail & Related papers (2025-08-25T04:38:19Z) - LLM4MEA: Data-free Model Extraction Attacks on Sequential Recommenders via Large Language Models [50.794651919028965]
Recent studies have demonstrated the vulnerability of sequential recommender systems to Model Extraction Attacks (MEAs)<n>Black-box attacks in prior MEAs are ineffective at exposing recommender system vulnerabilities due to random sampling in data selection.<n>We propose LLM4MEA, a novel model extraction method that leverages Large Language Models (LLMs) as human-like rankers to generate data.
arXiv Detail & Related papers (2025-07-22T19:20:23Z) - AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text [8.758843436588297]
AgentStealth is a self-reinforcing language model for text anonymization.<n>We show that our method outperforms baselines in both anonymization effectiveness and utility.<n>Our lightweight design supports direct deployment on edge devices, avoiding cloud reliance and communication-based privacy risks.
arXiv Detail & Related papers (2025-06-26T02:48:16Z) - Automated Profile Inference with Language Model Agents [67.32226960040514]
We study a new threat that LLMs pose to online pseudonymity, called automated profile inference.<n>An adversary can instruct LLMs to automatically scrape and extract sensitive personal attributes from publicly visible user activities on pseudonymous platforms.<n>We introduce an automated profiling framework called AutoProfiler to assess the feasibility of such threats in real-world scenarios.
arXiv Detail & Related papers (2025-05-18T13:05:17Z) - Augmenting Anonymized Data with AI: Exploring the Feasibility and Limitations of Large Language Models in Data Enrichment [3.459382629188014]
Large Language Models (LLMs) have demonstrated advanced capabilities in both text generation and comprehension.<n>Their application to data archives might facilitate the privatization of sensitive information about the data subjects.<n>This data, if not safeguarded, may bring privacy risks in terms of both disclosure and identification.
arXiv Detail & Related papers (2025-04-03T13:26:59Z) - Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenges of re-identification attack ability of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - Large Language Models are Advanced Anonymizers [2.9373912230684565]
Recent privacy research on large language models (LLMs) has shown that they achieve near-human-level performance at inferring personal data from online texts.<n>Existing text anonymization methods are currently lacking behind regulatory requirements and adversarial threats.<n>We present a new setting for evaluating anonymization in the face of adversarial LLM inferences.
arXiv Detail & Related papers (2024-02-21T14:44:00Z) - Assessing Privacy Risks in Language Models: A Case Study on
Summarization Tasks [65.21536453075275]
We focus on the summarization task and investigate the membership inference (MI) attack.
We exploit text similarity and the model's resistance to document modifications as potential MI signals.
We discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.
arXiv Detail & Related papers (2023-10-20T05:44:39Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Anonymizing Machine Learning Models [0.0]
Anonymized data is exempt from obligations set out in regulations such as the EU General Data Protection Regulation.
We propose a method that is able to achieve better model accuracy by using the knowledge encoded within the trained model.
We also demonstrate that our approach has a similar, and sometimes even better ability to prevent membership attacks as approaches based on differential privacy.
arXiv Detail & Related papers (2020-07-26T09:29:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.