Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
- URL: http://arxiv.org/abs/2509.20838v1
- Date: Thu, 25 Sep 2025 07:23:52 GMT
- Title: Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
- Authors: Shuo Huang, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu
- Abstract summary: Large language models (LLMs) in cloud-based services have raised significant privacy concerns. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. We propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness.
- Score: 60.197239728279534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing adoption of large language models (LLMs) in cloud-based services has raised significant privacy concerns, as user inputs may inadvertently expose sensitive information. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. In this work, we propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness. Our method incrementally rewrites privacy-sensitive segments through a structured search guided by a reward model, enabling dynamic exploration of the rewriting space. Experiments on privacy-sensitive datasets show that our approach significantly outperforms existing baselines, achieving a superior balance between privacy protection and utility preservation.
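A minimal sketch of the kind of reward-guided tree search the abstract describes, assuming a candidate generator `rewrite_candidates` (e.g., rewrites sampled from a zero-shot LLM prompt) and a scalar `reward` that trades off privacy protection against coherence and utility; both names are illustrative stand-ins, not the paper's released code.

```python
import heapq

def tree_search_rewrite(sentence, rewrite_candidates, reward,
                        beam_width=4, max_depth=3):
    """Best-first search over iterative rewrites of `sentence`.

    rewrite_candidates(text) -> list[str]: proposed rewrites (assumed to
    come from a zero-shot LLM prompt); reward(text) -> float: a score
    balancing privacy protection against coherence and utility.
    """
    best_text, best_score = sentence, reward(sentence)
    # Max-heap via negated scores; ties broken by depth, then text.
    frontier = [(-best_score, 0, sentence)]
    while frontier:
        neg_score, depth, text = heapq.heappop(frontier)
        if depth >= max_depth:
            continue
        # Expand this node and keep only the top-scoring children.
        children = sorted(((reward(c), c) for c in rewrite_candidates(text)),
                          reverse=True)[:beam_width]
        for score, child in children:
            if score > best_score:
                best_text, best_score = child, score
            heapq.heappush(frontier, (-score, depth + 1, child))
    return best_text
```

Per the abstract, the paper's search incrementally rewrites privacy-sensitive segments within a sentence; this sketch treats whole-sentence rewrites for brevity.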
Related papers
- Privacy-Aware In-Context Learning for Large Language Models [12.605629953620495]
Large language models (LLMs) raise privacy concerns due to potential exposure of sensitive information. We introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees.
arXiv Detail & Related papers (2025-09-17T01:50:32Z)
- The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that large language models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts. Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility. We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z)
- RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting [17.294176570269]
We propose a reinforcement learning framework that fine-tunes a large language model (LLM) using a composite reward function. The privacy reward combines semantic cues with structural patterns derived from a minimum spanning tree (MST) over latent representations; a hedged sketch of that structural signal follows this entry. Empirical results show that the proposed method significantly enhances author obfuscation and privacy metrics without degrading semantic quality.
arXiv Detail & Related papers (2025-08-25T04:38:19Z)
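A hedged sketch of the MST-over-latents signal mentioned in the entry above, using SciPy; how the paper folds this quantity into its composite reward is not specified in the snippet, so the scoring function here is an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_weight(latents: np.ndarray) -> float:
    """Total edge weight of the minimum spanning tree over an (n, d)
    array of latent representations."""
    pairwise = squareform(pdist(latents))  # dense (n, n) distance matrix
    return float(minimum_spanning_tree(pairwise).sum())

def structural_privacy_signal(original: np.ndarray, rewrite: np.ndarray) -> float:
    """Illustrative structural term: a larger shift in MST weight between
    the original and rewritten latents is read as stronger obfuscation.
    The actual reward shape in the paper may differ."""
    return abs(mst_weight(original) - mst_weight(rewrite))
```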
- Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization [7.9049991577473735]
We present Tau-Eval, an open-source framework for benchmarking text anonymization methods. A Python library, code, documentation and tutorials are publicly available.
arXiv Detail & Related papers (2025-06-06T10:59:35Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- Token-Level Privacy in Large Language Models [7.4143291213663955]
We introduce dchi-stencil, a novel token-level privacy-preserving mechanism that integrates contextual and semantic information. By incorporating both semantic and contextual nuances, dchi-stencil achieves a robust balance between privacy and utility; a generic sketch of this family of mechanisms follows this entry. This work highlights the potential of dchi-stencil to set a new standard for privacy-preserving NLP in modern, high-risk applications.
arXiv Detail & Related papers (2025-03-05T16:27:25Z)
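dchi-stencil's exact construction is not given in the snippet, so the following sketches the generic token-level metric-DP mechanism this line of work builds on: perturb a token's embedding with calibrated noise, then emit the nearest real vocabulary token. All names are illustrative.

```python
import numpy as np

def dp_token_replace(token_vec, vocab_vecs, vocab_tokens, epsilon, rng=None):
    """Replace one token under a metric-DP style mechanism.

    token_vec: (d,) embedding of the input token; vocab_vecs: (V, d)
    embeddings of the output vocabulary; vocab_tokens: list of V strings.
    This is the standard construction (noise magnitude ~ Gamma(d, 1/eps),
    uniformly random direction), not dchi-stencil's contextual variant.
    """
    rng = rng or np.random.default_rng()
    d = token_vec.shape[0]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = token_vec + magnitude * direction
    # Post-process: snap the noisy vector back to the nearest real token.
    nearest = int(np.argmin(np.linalg.norm(vocab_vecs - noisy, axis=1)))
    return vocab_tokens[nearest]
```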
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models. Compared to prior work on anonymization, the human-inspired approaches result in more natural rewrites.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method; a sketch of such contrastive data follows this entry.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
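A minimal sketch of what instruction tuning with paired positive and negative examples could look like as data; the field names and the example itself are assumptions, not PrivacyMind's released schema.

```python
# Contrastive instruction-tuning pairs: same instruction and input, with
# a privacy-respecting output marked positive and a leaking one negative.
# All field names and content are hypothetical illustrations.
training_examples = [
    {
        "instruction": "Answer the question without revealing the "
                       "patient's name or address.",
        "input": "John Smith of 12 Oak St was diagnosed with flu. "
                 "What was the diagnosis?",
        "output": "The patient was diagnosed with flu.",  # positive: no leak
        "label": "positive",
    },
    {
        "instruction": "Answer the question without revealing the "
                       "patient's name or address.",
        "input": "John Smith of 12 Oak St was diagnosed with flu. "
                 "What was the diagnosis?",
        "output": "John Smith was diagnosed with flu.",   # negative: leaks name
        "label": "negative",
    },
]
```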
- Privacy-Preserving Image Features via Adversarial Affine Subspace Embeddings [72.68801373979943]
Many computer vision systems require users to upload image features to the cloud for processing and storage.
We propose a new privacy-preserving feature representation.
Compared to the original features, our approach makes it significantly more difficult for an adversary to recover private information; a geometric sketch follows this entry.
arXiv Detail & Related papers (2020-06-11T17:29:48Z)
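A geometric sketch of the affine-subspace idea from the entry above: store a subspace passing through the descriptor rather than the descriptor itself, and match with point-to-subspace distance. The paper selects adversarial spanning directions; the random directions here are a simplification.

```python
import numpy as np

def lift_to_subspace(descriptor, k=2, rng=None):
    """Replace a (d,) descriptor with a k-dimensional affine subspace
    through it, parametrized by a point on the subspace plus an
    orthonormal basis of directions (random here, adversarial in the
    paper)."""
    rng = rng or np.random.default_rng()
    d = descriptor.shape[0]
    basis, _ = np.linalg.qr(rng.normal(size=(d, k)))  # (d, k) orthonormal
    # Store an arbitrary point on the subspace, not the descriptor itself,
    # so the true feature is hidden among all points of the subspace.
    anchor = descriptor + basis @ rng.normal(size=k)
    return anchor, basis

def point_to_subspace_distance(query, anchor, basis):
    """Distance from a (d,) query descriptor to the stored subspace,
    used for matching in place of plain descriptor distance."""
    diff = query - anchor
    residual = diff - basis @ (basis.T @ diff)  # strip in-subspace component
    return float(np.linalg.norm(residual))
```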
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.