Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
- URL: http://arxiv.org/abs/2602.11528v1
- Date: Thu, 12 Feb 2026 03:37:50 GMT
- Title: Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs
- Authors: Dong Yan, Jian Liang, Ran He, Tieniu Tan
- Abstract summary: Large language models can infer private user attributes from user-generated text. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. We propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS).
- Score: 61.15237978606501
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have shown that large language models (LLMs) can infer private user attributes (e.g., age, location, gender) from user-generated text shared online, enabling rapid and large-scale privacy breaches. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. Moreover, they are inherently limited as altering user text to hide sensitive cues still allows attribute inference to occur through models' reasoning capabilities. To address these limitations, we propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS). TRACE leverages attention mechanisms and inference chain generation to identify and anonymize privacy-leaking textual elements, while RPS employs a lightweight two-stage optimization strategy to induce model rejection behaviors, thereby preventing attribute inference. Evaluations across diverse LLMs show that TRACE-RPS reduces attribute inference accuracy from around 50% to below 5% on open-source models. In addition, our approach offers strong cross-model generalization, prompt-variation robustness, and utility-privacy tradeoffs. Our code is available at https://github.com/Jasper-Yan/TRACE-RPS.
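The abstract does not spell out how TRACE's attention-based identification works; as a loose sketch of that idea only, the snippet below scores tokens by aggregated self-attention mass and masks the highest-scoring ones. The model choice, the averaging rule, and the `[REDACTED]` placeholder are illustrative assumptions, not the authors' implementation (their repository has the real code).

```python
# Hypothetical sketch of attention-based privacy-token scoring (not the authors' code).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def flag_privacy_tokens(text: str, top_k: int = 3) -> str:
    """Mask the tokens that receive the most attention mass (assumed salience proxy)."""
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    # Average attention over layers, heads, and query positions -> one score per token.
    att = torch.stack(out.attentions).mean(dim=(0, 2, 3)).squeeze(0)  # [seq_len]
    ids = enc["input_ids"][0].tolist()
    special = set(tok.all_special_ids)  # never mask [CLS]/[SEP]/etc.
    ranked = sorted(
        (i for i in range(len(ids)) if ids[i] not in special),
        key=lambda i: att[i].item(), reverse=True,
    )
    tokens = tok.convert_ids_to_tokens(ids)
    for i in ranked[:top_k]:
        tokens[i] = "[REDACTED]"
    return tok.convert_tokens_to_string(tokens)

print(flag_privacy_tokens("I bike to my office near the Golden Gate every morning."))
```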
Related papers
- Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services [0.0]
We present a semantically-aware privacy agent designed to safeguard sensitive PII data when using remote Large Language Models (LLMs). Unlike prior work, which often degrades response quality, our approach dynamically replaces sensitive PII entities in user prompts with semantically consistent pseudonyms. Our results show that LOPSIDED reduces semantic utility errors by a factor of 5 compared to baseline techniques.
arXiv Detail & Related papers (2025-10-30T21:34:23Z)
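The pseudonymization idea in the entry above can be illustrated with a minimal substitution pass. Here regex patterns stand in for a real NER model, and the pseudonym lists are invented; the point is the consistent entity-to-pseudonym mapping that a local agent could later invert.

```python
# Minimal sketch of consistent PII pseudonymization (regex stands in for real NER).
import re

PSEUDONYMS = {"PERSON": ["Alex Morgan", "Sam Lee"], "EMAIL": ["user1@example.com"]}
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PERSON": re.compile(r"\b(Alice Smith|Bob Jones)\b"),  # toy lexicon for the demo
}

def pseudonymize(prompt: str):
    mapping, counters = {}, {k: 0 for k in PSEUDONYMS}
    for label, pat in PATTERNS.items():
        def repl(m):
            original = m.group(0)
            if original not in mapping:
                alias = PSEUDONYMS[label][counters[label] % len(PSEUDONYMS[label])]
                counters[label] += 1
                mapping[original] = alias
            return mapping[original]
        prompt = pat.sub(repl, prompt)
    return prompt, mapping  # the mapping lets a local agent restore names in the reply

safe, mapping = pseudonymize("Email Alice Smith at alice@corp.com about the audit.")
print(safe)
print(mapping)
```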
- VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks [51.68795949691009]
We introduce VoxGuard, a framework grounded in differential privacy and membership inference. For attributes, we show that simple transparent attacks recover gender and accent with near-perfect accuracy even after anonymization. Our results demonstrate that EER substantially underestimates leakage, highlighting the need for low-FPR evaluation.
arXiv Detail & Related papers (2025-09-22T20:57:48Z)
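To see why the VoxGuard entry argues that EER understates leakage, here is a small synthetic demonstration: a 5% subpopulation of members with very confident scores barely moves the EER but dominates the true-positive rate at a strict false-positive rate. The score distributions are invented for illustration.

```python
# EER vs. TPR at a strict FPR on synthetic membership-inference scores.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
nonmembers = rng.normal(0.0, 1.0, 10_000)
# Most members look like nonmembers, but a small subpopulation leaks confidently.
members = np.concatenate([rng.normal(0.2, 1.0, 9_500), rng.normal(4.0, 0.3, 500)])
y = np.concatenate([np.zeros_like(nonmembers), np.ones_like(members)])
s = np.concatenate([nonmembers, members])

fpr, tpr, _ = roc_curve(y, s)
eer = fpr[np.nanargmin(np.abs(fpr - (1 - tpr)))]      # point where FPR == FNR
tpr_at_low_fpr = np.interp(1e-3, fpr, tpr)            # TPR @ FPR = 0.1%
print(f"EER ~ {eer:.3f}")                             # near chance: looks safe
print(f"TPR @ FPR=0.1% ~ {tpr_at_low_fpr:.3f}")       # ~5% of members confidently exposed
```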
- RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting [17.294176570269]
We propose a reinforcement learning framework that fine-tunes a large language model (LLM) using a composite reward function. The privacy reward combines semantic cues with structural patterns derived from a minimum spanning tree (MST) over latent representations. Empirical results show that the proposed method significantly enhances author obfuscation and privacy metrics without degrading semantic quality.
arXiv Detail & Related papers (2025-08-25T04:38:19Z)
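One plausible reading of the MST-based signal above is the total MST weight over latent embeddings of a rewrite, folded into a composite reward. The exact reward shape, weighting, and embedding source are not given in this entry, so everything below is an assumption.

```python
# Sketch of an MST-derived structural signal over latent representations
# (assumed form; the paper's actual reward is not reproduced here).
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_weight(latents: np.ndarray) -> float:
    """Total edge weight of the MST over per-sentence embeddings."""
    dists = squareform(pdist(latents, metric="euclidean"))
    return float(minimum_spanning_tree(dists).sum())

def composite_reward(latents: np.ndarray, semantic_sim: float, alpha: float = 0.5) -> float:
    # Hypothetical combination: preserve meaning (semantic_sim) while rewarding
    # latent structure that diverges from identifiable stylistic patterns.
    return alpha * semantic_sim + (1 - alpha) * mst_weight(latents)

emb = np.random.default_rng(1).normal(size=(6, 8))  # 6 sentence embeddings, dim 8
print(composite_reward(emb, semantic_sim=0.9))
```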
- Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation [26.573578326262307]
Privacy-Aware Decoding (PAD) is a lightweight, inference-time defense that adaptively injects calibrated Gaussian noise into token logits during generation. PAD integrates confidence-based screening to selectively protect high-risk tokens, efficient sensitivity estimation to minimize unnecessary noise, and context-aware noise calibration to balance privacy with generation quality. Our work takes an important step toward mitigating privacy risks in RAG via decoding strategies, paving the way for universal and scalable privacy solutions in sensitive domains.
arXiv Detail & Related papers (2025-08-05T05:22:13Z)
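A minimal sketch of the decoding-time mechanism the PAD entry describes: Gaussian noise on logits, gated by a confidence screen. The threshold, noise scale, and greedy selection below are placeholder choices; PAD calibrates these adaptively per token.

```python
# Conceptual sketch of confidence-screened logit noising at one decoding step.
import numpy as np

def pad_step(logits: np.ndarray, sigma: float = 1.0, conf_threshold: float = 0.6,
             rng: np.random.Generator = np.random.default_rng(0)) -> int:
    """Noise the logits only when the step looks high-risk, then pick a token."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if probs.max() >= conf_threshold:  # sharp peaks may reflect memorized/private text
        logits = logits + rng.normal(0.0, sigma, size=logits.shape)
    return int(np.argmax(logits))      # greedy pick; sampling works the same way

vocab_logits = np.array([8.0, 1.0, 0.5, 0.2])  # a sharply peaked (high-confidence) step
print(pad_step(vocab_logits))
```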
- AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text [8.758843436588297]
AgentStealth is a self-reinforcing language model for text anonymization. We show that our method outperforms baselines in both anonymization effectiveness and utility. Our lightweight design supports direct deployment on edge devices, avoiding cloud reliance and communication-based privacy risks.
arXiv Detail & Related papers (2025-06-26T02:48:16Z)
- Machine Learning with Privacy for Protected Attributes [56.44253915927481]
We refine the definition of differential privacy (DP) to create a more general and flexible framework that we call feature differential privacy (FDP). Our definition is simulation-based and allows for both addition/removal and replacement variants of privacy, and can handle arbitrary separation of protected and non-protected features. We apply our framework to various machine learning tasks and show that it can significantly improve the utility of DP-trained models when public features are available.
arXiv Detail & Related papers (2025-06-24T17:53:28Z)
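FDP is a definition rather than a single mechanism, so no snippet can capture it faithfully; purely as intuition for "privacy on protected features only", one could perturb the protected columns while leaving public columns exact, as below. This is not the paper's construction.

```python
# Loose intuition only: noise protected feature columns, keep public columns exact.
import numpy as np

def noisy_protected(X: np.ndarray, protected_cols: list[int], sigma: float = 1.0,
                    rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    X = X.copy()
    X[:, protected_cols] += rng.normal(0.0, sigma, size=(X.shape[0], len(protected_cols)))
    return X

# Columns: age (protected), gender (protected), salary (public).
X = np.array([[35.0, 1.0, 52_000.0],
              [29.0, 0.0, 61_000.0]])
print(noisy_protected(X, protected_cols=[0, 1]))  # the public salary column is unchanged
```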
- Self-Refining Language Model Anonymizers via Adversarial Distillation [48.280759014096354]
We introduce SElf-refining Anonymization with Language model (SEAL). SEAL is a novel distillation framework for training small language models (SLMs) to perform effective anonymization without relying on external models at inference time. Experiments on SynthPAI, a dataset of synthetic personal profiles and text comments, demonstrate that SLMs trained with SEAL achieve substantial improvements in anonymization capabilities.
arXiv Detail & Related papers (2025-06-02T08:21:27Z)
- Defending against Indirect Prompt Injection by Instruction Detection [109.30156975159561]
InstructDetector is a novel detection-based approach that leverages the behavioral states of LLMs to identify potential IPI attacks. InstructDetector achieves a detection accuracy of 99.60% in the in-domain setting and 96.90% in the out-of-domain setting, and reduces the attack success rate to just 0.03% on the BIPIA benchmark.
arXiv Detail & Related papers (2025-05-08T13:04:45Z)
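The entry above reports accuracy numbers but not the detector's form. One plausible schematic of a detector over LLM behavioral states is a linear probe on activation features; below, random vectors stand in for real pooled activations, so the separable shift is an assumption for demonstration only.

```python
# Schematic linear probe over placeholder "behavioral state" features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 32
benign = rng.normal(0.0, 1.0, (n, d))      # imagined pooled hidden states, benign inputs
injected = rng.normal(0.8, 1.0, (n, d))    # assumed shift for instruction-bearing inputs
X = np.vstack([benign, injected])
y = np.array([0] * n + [1] * n)

probe = LogisticRegression(max_iter=1000).fit(X, y)
print("train accuracy:", probe.score(X, y))  # a real probe is judged on held-out data
```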
- PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models [51.458089902581456]
We introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images. Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
arXiv Detail & Related papers (2025-02-22T09:47:55Z)
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Anonymizing text that contains sensitive information is crucial for a wide range of applications. Existing techniques face the emerging challenge posed by the re-identification ability of large language models. We propose a framework composed of three key components: a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
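The three-component framework above suggests an evaluator-driven rewrite loop. A toy skeleton with stub evaluators and a stub rewriter (the paper's components are LLM-based, and these cue lists are invented) might look like:

```python
# Skeleton of an evaluator-driven anonymization loop with stub components.
def privacy_evaluator(text: str) -> float:
    """Stub: fraction of known quasi-identifiers still present."""
    cues = ["Munich", "nurse", "1987"]
    return sum(c in text for c in cues) / len(cues)

def utility_evaluator(original: str, redacted: str) -> float:
    """Stub: crude token-overlap proxy for preserved meaning."""
    a, b = set(original.split()), set(redacted.split())
    return len(a & b) / max(len(a), 1)

def anonymize_once(text: str) -> str:
    """Stub rewriter: generalize one concrete cue per step."""
    rules = [("Munich", "a large city"), ("nurse", "a healthcare worker"),
             ("1987", "the late 80s")]
    for src, dst in rules:
        if src in text:
            return text.replace(src, dst)
    return text

text = original = "Born in 1987, I work as a nurse in Munich."
while privacy_evaluator(text) > 0:  # optimize until the privacy evaluator is satisfied
    text = anonymize_once(text)
print(text, "| utility:", round(utility_evaluator(original, text), 2))
```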
- IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization [8.483679748399037]
We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90% across 8 different private attributes.
arXiv Detail & Related papers (2024-07-03T09:49:03Z)
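IncogniText's misleading-anonymization idea can be illustrated with a rewrite instruction that targets a decoy attribute value. The template wording below is an assumption, not the paper's prompt.

```python
# Sketch of attribute-misdirection prompting toward a decoy attribute value.
def misdirection_prompt(text: str, attribute: str, true_value: str, decoy_value: str) -> str:
    return (
        "Rewrite the following text, preserving its meaning and tone.\n"
        f"Remove any cue suggesting the author's {attribute} is {true_value}; instead, "
        f"introduce subtle, natural cues consistent with {decoy_value}.\n\n"
        f"Text: {text}"
    )

prompt = misdirection_prompt(
    "Off to grab a cheeky Greggs before my shift at the surgery.",
    attribute="location", true_value="the UK", decoy_value="Australia",
)
print(prompt)  # would be sent to an instruction-following LLM for the actual rewrite
```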
- CrowdGuard: Federated Backdoor Detection in Federated Learning [39.58317527488534]
This paper presents a novel defense mechanism, CrowdGuard, that effectively mitigates backdoor attacks in Federated Learning.
CrowdGuard employs a server-located stacked clustering scheme to enhance its resilience to rogue client feedback.
The evaluation results demonstrate that CrowdGuard achieves a 100% True-Positive-Rate and True-Negative-Rate across various scenarios.
arXiv Detail & Related papers (2022-10-14T11:27:49Z)
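As a loose schematic of the rogue-feedback resilience mentioned above, a server could cluster per-client feedback vectors and keep only the majority cluster; CrowdGuard's actual server-side stacked clustering is more involved, and the data here is synthetic.

```python
# Majority-cluster filtering over client feedback vectors (simplified schematic).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
honest = rng.normal(0.9, 0.05, (8, 5))  # 8 honest clients flag the same 5 suspect models
rogue = rng.normal(0.1, 0.05, (2, 5))   # 2 rogue clients report the opposite
feedback = np.vstack([honest, rogue])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(feedback)
majority = np.bincount(labels).argmax()
trusted = feedback[labels == majority]
print("kept", len(trusted), "of", len(feedback), "client reports")
consensus = trusted.mean(axis=0) > 0.5  # final per-model poisoning verdict
print("flagged models:", np.where(consensus)[0].tolist())
```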
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve selective differential privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
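The "fine-tune twice" recipe implies a redaction pass to build the first-phase corpus before the private second phase. A toy version of that pass with regex detectors is sketched below; the patterns are illustrative stand-ins (SDP defines sensitivity via a policy function, and the second phase applies DP training to the original data).

```python
# Sketch of phase-one input preparation: redact sensitive spans, fine-tune on the
# redacted corpus without DP, then fine-tune again on the original corpus with DP.
import re

SENSITIVE = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),   # toy US-SSN pattern
    (re.compile(r"\b\d{10,16}\b"), "<NUM>"),           # toy card-number pattern
]

def redact(line: str) -> str:
    for pat, mask in SENSITIVE:
        line = pat.sub(mask, line)
    return line

corpus = ["Contact jane@corp.com, SSN 123-45-6789, card 4111111111111111."]
phase_one_corpus = [redact(x) for x in corpus]
print(phase_one_corpus[0])  # Contact <EMAIL>, SSN <SSN>, card <NUM>.
```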
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.