Look Twice before You Leap: A Rational Agent Framework for Localized Adversarial Anonymization
- URL: http://arxiv.org/abs/2512.06713v1
- Date: Sun, 07 Dec 2025 08:03:43 GMT
- Title: Look Twice before You Leap: A Rational Agent Framework for Localized Adversarial Anonymization
- Authors: Donghang Duan, Xu Zheng
- Abstract summary: We propose Rational Localized Adversarial Anonymization (RLAA). RLAA is a fully localized and training-free framework featuring an Attacker-Arbitrator-Anonymizer architecture. Our code and datasets will be released upon acceptance.
- Score: 5.308328605042682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current LLM-based text anonymization frameworks usually rely on remote API services from powerful LLMs, which creates an inherent "privacy paradox": users must first disclose their data to untrusted third parties in order to obtain superior privacy preservation. Moreover, directly migrating these frameworks to local small-scale models (LSMs) is a suboptimal solution; our core findings show it causes a catastrophic collapse in utility. We argue that this failure stems not merely from the capability deficits of LSMs, but from the inherent irrationality of the greedy adversarial strategies employed by current state-of-the-art (SoTA) methods. We model the anonymization process as a trade-off between Marginal Privacy Gain (MPG) and Marginal Utility Cost (MUC), and demonstrate that greedy strategies inevitably drift into an irrational state. To address this, we propose Rational Localized Adversarial Anonymization (RLAA), a fully localized and training-free framework featuring an Attacker-Arbitrator-Anonymizer (A-A-A) architecture. RLAA introduces an arbitrator that acts as a rationality gatekeeper, validating the attacker's inferences and filtering out feedback that yields negligible privacy benefit. This mechanism enforces a rational early-stopping criterion and systematically prevents utility collapse. Extensive experiments on different datasets demonstrate that RLAA achieves the best privacy-utility trade-off, and in some cases even Pareto-dominates SoTA methods. Our code and datasets will be released upon acceptance.
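The abstract describes the A-A-A loop only at a high level. The following is a minimal, self-contained sketch of how an attacker-arbitrator-anonymizer iteration with a rational early stop might look; in RLAA each role would be played by a local small-scale model, and all names, heuristics, and thresholds below are illustrative assumptions, not the paper's actual design.

```python
# Toy stand-ins for the three A-A-A roles; in RLAA each would be an LSM prompt.
PRIVATE_TERMS = {"london": "LOCATION", "nurse": "OCCUPATION"}  # illustrative cues

def attacker(text):
    """Infer private attributes and point to the spans that leak them."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    return [(w, PRIVATE_TERMS[w]) for w in words if w in PRIVATE_TERMS]

def arbitrator(leaks, min_gain=1):
    """Rationality gatekeeper: act only if the marginal privacy gain (here,
    the count of confirmed leaks) justifies further utility-damaging edits."""
    return len(leaks) >= min_gain

def anonymizer(text, leaks):
    """Rewrite only the flagged spans, leaving the rest of the text intact."""
    for word, attr in leaks:
        text = text.replace(word, f"[{attr}]")
        text = text.replace(word.capitalize(), f"[{attr}]")
    return text

def rlaa_loop(text, max_rounds=5):
    for _ in range(max_rounds):
        leaks = attacker(text)
        if not arbitrator(leaks):   # rational early stop: negligible MPG
            break
        text = anonymizer(text, leaks)
    return text

print(rlaa_loop("I am a nurse living in London."))
# -> "I am a [OCCUPATION] living in [LOCATION]."
```

The design point mirrors the abstract: the arbitrator, not the attacker, decides whether another anonymization round is worth its marginal utility cost.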
Related papers
- SRFed: Mitigating Poisoning Attacks in Privacy-Preserving Federated Learning with Heterogeneous Data [5.7335377562335275]
Federated Learning (FL) enables collaborative model training without exposing clients' private data, and has been widely adopted in privacy-sensitive scenarios. It faces two critical security threats: curious servers that may launch inference attacks to reconstruct clients' private data, and compromised clients that can launch poisoning attacks to disrupt model aggregation. We propose SRFed, an efficient Byzantine-robust and privacy-preserving FL framework for Non-IID scenarios.
arXiv Detail & Related papers (2026-02-18T14:14:38Z)
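The summary does not specify SRFed's aggregation rule. As a hedged point of reference, the sketch below shows coordinate-wise median aggregation, a standard Byzantine-robust baseline (not SRFed's actual mechanism) that bounds the influence a minority of poisoned updates can have.

```python
# Coordinate-wise median: a classic Byzantine-robust aggregation baseline.
import numpy as np

def robust_aggregate(client_updates):
    """Aggregate per-client parameter updates with a coordinate-wise median,
    so a minority of arbitrarily corrupted updates cannot dominate."""
    stacked = np.stack(client_updates)          # shape: (n_clients, n_params)
    return np.median(stacked, axis=0)

honest = [np.array([0.9, 1.1]), np.array([1.0, 1.0]), np.array([1.1, 0.9])]
poisoned = [np.array([100.0, -100.0])]          # one Byzantine client
print(robust_aggregate(honest + poisoned))      # stays near [1.0, 1.0]
```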
- Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs [61.15237978606501]
Large language models can infer private user attributes from user-generated text. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. We propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS).
arXiv Detail & Related papers (2026-02-12T03:37:50Z)
- Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems via Knowledge Asymmetry Exploitation [15.985529058573912]
Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge bases. Existing privacy attacks on RAG systems can trigger data leakage but often fail to accurately isolate knowledge-base-derived sentences within mixed responses. This paper presents a novel black-box attack framework that exploits knowledge asymmetry between RAG and standard LLMs to achieve fine-grained privacy extraction.
arXiv Detail & Related papers (2025-07-31T03:50:16Z)
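A minimal sketch of the knowledge-asymmetry idea, under the assumption that sentences a RAG system emits but a knowledge-base-free LLM cannot reproduce are likely drawn from the private corpus; the sentence splitting, similarity measure, and threshold are all illustrative, not the paper's method.

```python
# Flag RAG sentences with no close counterpart in a plain-LLM answer to the
# same prompt: "unexplained" sentences likely came from the knowledge base.
from difflib import SequenceMatcher

def kb_derived_sentences(rag_answer, plain_llm_answer, threshold=0.6):
    plain_sents = [s.strip() for s in plain_llm_answer.split(".") if s.strip()]
    flagged = []
    for sent in (s.strip() for s in rag_answer.split(".") if s.strip()):
        best = max((SequenceMatcher(None, sent, p).ratio() for p in plain_sents),
                   default=0.0)
        if best < threshold:   # not explainable by the model's own knowledge
            flagged.append(sent)
    return flagged

rag = "Paris is the capital of France. Alice Smith's claim #4471 was denied."
plain = "Paris is the capital of France."
print(kb_derived_sentences(rag, plain))
# -> ["Alice Smith's claim #4471 was denied"]
```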
- MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models [56.09354775405601]
Model extraction attacks aim to replicate the functionality of a black-box model through query access. Most existing defenses presume that attacker queries contain out-of-distribution (OOD) samples, enabling them to detect and disrupt suspicious inputs. We propose MISLEADER, a novel defense strategy that does not rely on OOD assumptions.
arXiv Detail & Related papers (2025-06-03T01:37:09Z)
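The summary does not say how MISLEADER deploys its ensemble; the sketch below shows one generic pattern consistent with it, and only as an assumption: each query is answered by a randomly drawn distilled model, so the attacker never observes a single consistent victim function to clone.

```python
# Per-query routing over an ensemble of stand-in "distilled models".
import random

class EnsembleDefense:
    def __init__(self, distilled_models, seed=0):
        self.models = distilled_models          # stand-ins for distilled nets
        self.rng = random.Random(seed)

    def predict(self, x):
        return self.rng.choice(self.models)(x)  # inconsistent victim function

# Toy models with slightly different decision boundaries.
models = [lambda x, b=b: int(x > b) for b in (0.45, 0.50, 0.55)]
defense = EnsembleDefense(models)
print([defense.predict(0.48) for _ in range(5)])  # labels vary near the boundary
```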
- Safeguarding Privacy of Retrieval Data against Membership Inference Attacks: Is This Query Too Close to Home? [14.147748220718784]
We introduce a novel similarity-based MIA detection framework designed for RAG systems. We show that a simple detect-and-hide strategy can successfully obfuscate attackers, maintain data utility, and remain system-agnostic against MIA.
arXiv Detail & Related papers (2025-05-28T07:35:07Z)
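A hedged sketch of a detect-and-hide defense in the spirit of the summary: queries that nearly quote a stored document (a hallmark of membership probes) cause that document to be withheld from the retrieval context. The similarity measure, threshold, and hiding policy here are assumptions, not the paper's implementation.

```python
# Hide documents that an incoming query is suspiciously close to.
from difflib import SequenceMatcher

DOCS = ["Patient John Doe was admitted on 2021-03-04.",
        "Our refund policy allows returns within 30 days."]

def retrieve(query, threshold=0.8):
    """Return retrieval candidates, withholding any document the query
    nearly quotes verbatim (likely an MIA probe, not a genuine question)."""
    visible = []
    for doc in DOCS:
        if SequenceMatcher(None, query.lower(), doc.lower()).ratio() >= threshold:
            continue            # detect-and-hide: too close to home
        visible.append(doc)
    return visible

print(retrieve("Patient John Doe was admitted on 2021-03-04?"))  # probe -> hidden
print(retrieve("How long do I have to return an item?"))         # normal query
```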
- Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs [83.11815479874447]
We propose a novel jailbreak attack framework, inspired by cognitive decomposition and biases in human cognition. We employ cognitive decomposition to reduce the complexity of malicious prompts and relevance bias to reorganize prompts. We also introduce a ranking-based harmfulness evaluation metric that surpasses the traditional binary success-or-failure paradigm.
arXiv Detail & Related papers (2025-05-03T05:28:11Z)
- Revisiting Locally Differentially Private Protocols: Towards Better Trade-offs in Privacy, Utility, and Attack Resistance [4.5282933786221395]
Local Differential Privacy (LDP) offers strong privacy protection, especially in settings in which the server collecting the data is untrusted. We introduce a general multi-objective optimization framework for refining LDP protocols. Our framework enables modular and context-aware deployment of LDP mechanisms with tunable privacy-utility trade-offs.
arXiv Detail & Related papers (2025-03-03T12:41:01Z)
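The paper's optimization framework itself is not detailed in the summary. For background, the sketch below implements k-ary (generalized) randomized response, a canonical LDP mechanism of the kind such a framework would refine.

```python
# Generalized randomized response: reports the true value with probability
# p = e^eps / (e^eps + k - 1) and any other value with q = 1 / (e^eps + k - 1),
# which satisfies epsilon-LDP.
import math
import random

def k_rr(value, domain, eps, rng=random):
    """Report `value` under epsilon-LDP via generalized randomized response."""
    k = len(domain)
    p_true = math.exp(eps) / (math.exp(eps) + k - 1)
    if rng.random() < p_true:
        return value
    return rng.choice([v for v in domain if v != value])  # uniform over others

domain = ["A", "B", "C", "D"]
reports = [k_rr("A", domain, eps=1.0) for _ in range(10000)]
print(reports.count("A") / len(reports))  # ~ e / (e + 3) ≈ 0.475
```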
- Bayes-Nash Generative Privacy Against Membership Inference Attacks [24.330984323956173]
We propose a game-theoretic framework modeling privacy protection as a Bayesian game between defender and attacker. To address strategic complexity, we represent the defender's mixed strategy as a neural network generator mapping private datasets to public representations. Our approach significantly outperforms state-of-the-art methods by generating stronger attacks and achieving better privacy-utility tradeoffs.
arXiv Detail & Related papers (2024-10-09T20:29:04Z)
- Jailbreaking as a Reward Misspecification Problem [80.52431374743998]
We propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric, ReGap, to quantify the extent of reward misspecification and demonstrate its effectiveness. We present ReMiss, a system for automated red teaming that generates adversarial prompts in a reward-misspecified space.
arXiv Detail & Related papers (2024-06-20T15:12:27Z)
- From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks [54.25638567385662]
Differential Privacy (DP) in machine learning is often interpreted as a guarantee against membership inference. However, translating DP budgets into quantitative protection against the more damaging threat of data reconstruction remains a challenging open problem. This paper bridges this critical gap by deriving the first formal privacy bounds tailored to the mechanics of demonstrated "from-scratch" attacks.
arXiv Detail & Related papers (2024-02-20T09:52:30Z)
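The derived reconstruction bounds are not reproduced in the summary above; for reference, the guarantee whose budget the paper translates is the standard (ε, δ)-DP inequality:

```latex
% Background only: a mechanism M is (epsilon, delta)-differentially private
% if, for all neighboring datasets D, D' and all measurable output sets S,
\[
  \Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta .
\]
```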
- The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets [12.730435519914415]
We examine the privacy metrics used in real-world synthetic data deployments and demonstrate their unreliability in several ways. We introduce ReconSyn, a reconstruction attack that generates multiple synthetic datasets that are considered private by the metrics but actually leak unique information about individual records. We show that ReconSyn recovers 78-100% of the outliers in the train data with only black-box access to a single fitted generative model and the privacy metrics.
arXiv Detail & Related papers (2023-12-08T15:42:28Z)
- Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond [57.10914865054868]
We consider vertical logistic regression (VLR) trained with mini-batch gradient descent.
We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks.
arXiv Detail & Related papers (2022-07-19T05:47:30Z)
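For concreteness, here is a small sketch of the VLR setting analyzed in the paper above: two parties hold disjoint feature columns of the same mini-batch and combine partial logits to take a gradient step. The in-the-clear intermediates exchanged below are precisely what such a privacy analysis scrutinizes; party names, shapes, and the learning rate are illustrative.

```python
# One mini-batch gradient-descent step of vertically partitioned logistic
# regression: party A holds features XA, party B holds XB, same samples.
import numpy as np

rng = np.random.default_rng(0)
n, dA, dB = 8, 3, 2                      # batch size; per-party feature counts
XA, XB = rng.normal(size=(n, dA)), rng.normal(size=(n, dB))
y = rng.integers(0, 2, size=n)
wA, wB = np.zeros(dA), np.zeros(dB)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = XA @ wA + XB @ wB               # partial logits are combined
residual = sigmoid(logits) - y           # shared error signal (a leakage surface)
wA -= 0.1 * XA.T @ residual / n          # each party updates its own weights
wB -= 0.1 * XB.T @ residual / n
print(wA, wB)
```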