TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
- URL: http://arxiv.org/abs/2509.17302v4
- Date: Sat, 01 Nov 2025 02:21:49 GMT
- Title: TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
- Authors: Duoxun Tang, Xinhang Jiang, Jiajun Niu,
- Abstract summary: Text embedding inversion attacks reconstruct original sentences from latent representations, posing severe privacy threats in collaborative inference and edge computing. We propose TextCrafter, an optimization-based adversarial perturbation mechanism that combines RL-learned, geometry-aware noise injection orthogonal to user embeddings with cluster priors and PII-signal guidance to suppress inversion while preserving task utility.
- Score: 0.13764085113103217
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Text embedding inversion attacks reconstruct original sentences from latent representations, posing severe privacy threats in collaborative inference and edge computing. We propose TextCrafter, an optimization-based adversarial perturbation mechanism that combines RL-learned, geometry-aware noise injection orthogonal to user embeddings with cluster priors and PII-signal guidance to suppress inversion while preserving task utility. Unlike prior defenses, which are either non-learnable or agnostic to perturbation direction, TextCrafter provides a directional protective policy that balances privacy and utility. Under a strong privacy setting, TextCrafter maintains 70% classification accuracy on four datasets and consistently outperforms Gaussian/LDP baselines across lower privacy budgets, demonstrating a superior privacy-utility trade-off.
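The abstract's central geometric idea, perturbing an embedding only in directions orthogonal to it, can be sketched numerically. The snippet below is a simplified illustration only: the actual TextCrafter perturbation direction is RL-learned and guided by cluster priors and PII signals, none of which is modeled here, and the function name is invented for this sketch.

```python
import numpy as np

def orthogonal_noise(embedding, scale, seed=None):
    """Sample Gaussian noise and project out the component parallel to the
    embedding, leaving a perturbation orthogonal to it."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, scale, size=embedding.shape)
    unit = embedding / np.linalg.norm(embedding)
    # Subtract the parallel component so the residual is orthogonal.
    return noise - np.dot(noise, unit) * unit

emb = np.array([1.0, 2.0, 2.0])
n = orthogonal_noise(emb, scale=0.1, seed=0)
assert abs(float(np.dot(n, emb))) < 1e-9  # orthogonal to the embedding
```

Because the perturbation carries no component along the embedding itself, the dominant semantic direction is left untouched while the exact coordinates an inversion attack would need are masked.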
Related papers
- Stop Tracking Me! Proactive Defense Against Attribute Inference Attack in LLMs [61.15237978606501]
Large language models can infer private user attributes from user-generated text. Existing anonymization-based defenses are coarse-grained, lacking word-level precision in anonymizing privacy-leaking elements. We propose a unified defense framework that combines fine-grained anonymization (TRACE) with inference-preventing optimization (RPS).
arXiv Detail & Related papers (2026-02-12T03:37:50Z)
- Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks [7.044221462918645]
Existing differential privacy defenses assume uniform sensitivity across embedding dimensions, leading to excessive noise and degraded utility. We propose SPARSE, a user-centric framework for concept-specific privacy protection in text embeddings.
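The contrast this abstract draws, uniform versus dimension-wise noise calibration, can be sketched as follows. This is a hypothetical illustration of per-dimension Laplace scaling, not SPARSE's actual concept-specific mechanism; the function name and parameters are invented.

```python
import numpy as np

def dimension_calibrated_noise(embedding, sensitivity, epsilon, seed=None):
    """Add Laplace noise whose scale varies per dimension with a sensitivity
    vector, instead of one uniform scale for every dimension."""
    rng = np.random.default_rng(seed)
    scale = np.asarray(sensitivity) / epsilon  # per-dimension Laplace scale
    return embedding + rng.laplace(0.0, scale)

emb = np.zeros(4)
sens = np.array([1.0, 1.0, 0.01, 0.01])  # only the first two dims are sensitive
noisy = dimension_calibrated_noise(emb, sens, epsilon=1.0, seed=0)
```

Dimensions flagged as low-sensitivity receive proportionally smaller noise, which is the source of the utility gain the abstract claims over uniform-noise defenses.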
arXiv Detail & Related papers (2026-02-06T09:28:55Z)
- PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration [8.776463501718737]
We propose a context-aware framework that dynamically balances privacy and inference quality. PRISM executes in four stages: (1) the edge device profiles entity-level sensitivity; (2) a soft gating module on the edge selects an execution mode - cloud, edge, or collaboration; (3) for collaborative paths, the edge applies adaptive two-layer local differential privacy based on entity risks; and (4) the cloud LLM generates a semantic sketch from the perturbed prompt.
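The gating step (stage 2) can be pictured as a routing rule over sensitivity. The toy function below stands in for PRISM's learned soft gating module; the thresholds, parameter names, and hard if/else structure are all invented for illustration.

```python
def route(entity_risk, edge_capacity):
    """Toy stand-in for a soft gating module: pick an execution mode from
    an entity-risk score and available edge capacity (both in [0, 1])."""
    if entity_risk > 0.8:
        return "edge"           # too sensitive to leave the device
    if entity_risk < 0.2 and edge_capacity < 0.5:
        return "cloud"          # low risk, offload entirely
    return "collaboration"      # perturb locally, sketch in the cloud

assert route(0.9, 1.0) == "edge"
assert route(0.1, 0.2) == "cloud"
assert route(0.5, 0.5) == "collaboration"
```

Only the "collaboration" branch would then proceed to stages 3 and 4 (local DP perturbation followed by cloud-side sketching).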
arXiv Detail & Related papers (2025-11-27T22:32:33Z)
- Secret-Protected Evolution for Differentially Private Synthetic Text Generation [25.60898496792365]
Secret-Protected Evolution (SecPE) is a novel framework that extends private evolution with secret-aware protection. We show that SecPE satisfies $(\mathrm{p}, \mathrm{r})$-secret protection, constituting a relaxation of Gaussian DP that enables tighter utility-privacy trade-offs. Our results highlight that SecPE can unlock more practical and effective privacy-preserving synthetic text generation.
arXiv Detail & Related papers (2025-10-13T04:05:42Z)
- Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search [60.197239728279534]
Large language models (LLMs) in cloud-based services have raised significant privacy concerns. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. We propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness.
arXiv Detail & Related papers (2025-09-25T07:23:52Z)
- Privacy-Aware In-Context Learning for Large Language Models [12.605629953620495]
Large language models (LLMs) raise privacy concerns due to potential exposure of sensitive information. We introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees.
arXiv Detail & Related papers (2025-09-17T01:50:32Z)
- The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that Large Language Models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts. Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility. We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z)
- Privacy-Aware Decoding: Mitigating Privacy Leakage of Large Language Models in Retrieval-Augmented Generation [26.573578326262307]
Privacy-Aware Decoding (PAD) is a lightweight, inference-time defense that adaptively injects calibrated Gaussian noise into token logits during generation. PAD integrates confidence-based screening to selectively protect high-risk tokens, efficient sensitivity estimation to minimize unnecessary noise, and context-aware noise calibration to balance privacy with generation quality. Our work takes an important step toward mitigating privacy risks in RAG via decoding strategies, paving the way for universal and scalable privacy solutions in sensitive domains.
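The screening-plus-noise idea in this abstract can be sketched as masking which logits receive Gaussian perturbation. This is a rough illustration only; the function name and the external risk scores are invented, and PAD's actual sensitivity estimation and calibration are not modeled.

```python
import numpy as np

def noisy_logits(logits, risk, threshold, sigma, seed=None):
    """Add Gaussian noise only to logits whose estimated privacy risk
    exceeds a threshold; low-risk logits pass through untouched."""
    rng = np.random.default_rng(seed)
    out = np.array(logits, dtype=float)
    mask = np.asarray(risk) > threshold
    out[mask] += rng.normal(0.0, sigma, size=int(mask.sum()))
    return out

logits = np.array([2.0, 1.0, 0.5])
risk = np.array([0.9, 0.1, 0.2])   # only the first token is high-risk
out = noisy_logits(logits, risk, threshold=0.5, sigma=1.0, seed=0)
```

Restricting the perturbation to high-risk tokens is what lets a defense of this shape trade off privacy against generation quality rather than degrading every token uniformly.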
arXiv Detail & Related papers (2025-08-05T05:22:13Z)
- Enhancing Privacy in Semantic Communication over Wiretap Channels leveraging Differential Privacy [51.028047763426265]
Semantic communication (SemCom) improves transmission efficiency by focusing on task-relevant information. However, transmitting semantic-rich data over insecure channels introduces privacy risks. This paper proposes a novel SemCom framework that integrates differential privacy mechanisms to protect sensitive semantic features.
arXiv Detail & Related papers (2025-04-23T08:42:44Z)
- PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models [51.458089902581456]
We introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images. Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
arXiv Detail & Related papers (2025-02-22T09:47:55Z)
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models. Compared to prior work on anonymization, the human-inspired approaches result in more natural rewrites.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- Over-the-Air Federated Learning with Privacy Protection via Correlated Additive Perturbations [57.20885629270732]
We consider privacy aspects of wireless federated learning with Over-the-Air (OtA) transmission of gradient updates from multiple users/agents to an edge server.
Traditional perturbation-based methods provide privacy protection while sacrificing the training accuracy.
In this work, we aim to minimize both the privacy leakage to the adversary and the degradation of model accuracy at the edge server.
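The title's correlated additive perturbations can be illustrated with a toy aggregate: each user's noise hides its individual update, but the noises are constructed to sum to zero, so the over-the-air sum is exact. This zero-sum construction is a simplified sketch of the general idea, not the paper's actual scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, dim = 4, 3
gradients = rng.normal(size=(num_users, dim))  # one update per user

# Correlated perturbations: force them to sum to zero across users, so
# each masked update looks noisy but the aggregate is unaffected.
perturbations = rng.normal(size=(num_users, dim))
perturbations -= perturbations.mean(axis=0)

received = (gradients + perturbations).sum(axis=0)  # over-the-air sum
assert np.allclose(received, gradients.sum(axis=0))
```

An eavesdropper observing any single masked update sees gradient plus noise, while the edge server, which only needs the sum, loses nothing - the trade-off between leakage and accuracy the abstract describes.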
arXiv Detail & Related papers (2022-10-05T13:13:35Z)
- ADePT: Auto-encoder based Differentially Private Text Transformation [22.068984615657463]
We provide a utility-preserving differentially private text transformation algorithm using auto-encoders.
Our algorithm transforms text to offer robustness against attacks and produces transformations with high semantic quality.
Our results show that the proposed model performs better against membership inference attacks (MIAs) while offering little to no degradation in the utility of the underlying transformation process.
arXiv Detail & Related papers (2021-01-29T23:15:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.