Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
- URL: http://arxiv.org/abs/2509.20838v1
- Date: Thu, 25 Sep 2025 07:23:52 GMT
- Title: Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
- Authors: Shuo Huang, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu
- Abstract summary: Large language models (LLMs) in cloud-based services have raised significant privacy concerns. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. We propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness.
- Score: 60.197239728279534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing adoption of large language models (LLMs) in cloud-based services has raised significant privacy concerns, as user inputs may inadvertently expose sensitive information. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. In this work, we propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness. Our method incrementally rewrites privacy-sensitive segments through a structured search guided by a reward model, enabling dynamic exploration of the rewriting space. Experiments on privacy-sensitive datasets show that our approach significantly outperforms existing baselines, achieving a superior balance between privacy protection and utility preservation.
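A minimal sketch of the kind of reward-guided tree search the abstract describes, assuming a candidate generator `rewrite_candidates` (e.g., rewrites sampled from a zero-shot LLM prompt) and a scalar `reward` that trades off privacy protection against coherence and utility; both names are illustrative stand-ins, not the paper's released code.

```python
import heapq

def tree_search_rewrite(sentence, rewrite_candidates, reward,
                        beam_width=4, max_depth=3):
    """Best-first search over iterative rewrites of `sentence`.

    rewrite_candidates(text) -> list[str]: proposed rewrites (assumed to
    come from a zero-shot LLM prompt); reward(text) -> float: a score
    balancing privacy protection against coherence and utility.
    """
    best_text, best_score = sentence, reward(sentence)
    # Max-heap via negated scores; ties broken by depth, then text.
    frontier = [(-best_score, 0, sentence)]
    while frontier:
        neg_score, depth, text = heapq.heappop(frontier)
        if depth >= max_depth:
            continue
        # Expand this node and keep only the top-scoring children.
        children = sorted(((reward(c), c) for c in rewrite_candidates(text)),
                          reverse=True)[:beam_width]
        for score, child in children:
            if score > best_score:
                best_text, best_score = child, score
            heapq.heappush(frontier, (-score, depth + 1, child))
    return best_text
```

Per the abstract, the paper's search incrementally rewrites privacy-sensitive segments within a sentence; this sketch treats whole-sentence rewrites for brevity.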
Related papers
- Privacy-Aware In-Context Learning for Large Language Models [12.605629953620495]
Large language models (LLMs) raise privacy concerns due to potential exposure of sensitive information. We introduce a novel private prediction framework for generating high-quality synthetic text with strong privacy guarantees.
arXiv Detail & Related papers (2025-09-17T01:50:32Z)
- The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that large language models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts. Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility. We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z)
- RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting [17.294176570269]
We propose a reinforcement learning framework that fine-tunes a large language model (LLM) using a composite reward function. The privacy reward combines semantic cues with structural patterns derived from a minimum spanning tree (MST) over latent representations; a hedged sketch of that structural signal follows this entry. Empirical results show that the proposed method significantly enhances author obfuscation and privacy metrics without degrading semantic quality.
arXiv Detail & Related papers (2025-08-25T04:38:19Z)
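A hedged sketch of the MST-over-latents signal mentioned in the entry above, using SciPy; how the paper folds this quantity into its composite reward is not specified in the snippet, so the scoring function here is an assumption.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_weight(latents: np.ndarray) -> float:
    """Total edge weight of the minimum spanning tree over an (n, d)
    array of latent representations."""
    pairwise = squareform(pdist(latents))  # dense (n, n) distance matrix
    return float(minimum_spanning_tree(pairwise).sum())

def structural_privacy_signal(original: np.ndarray, rewrite: np.ndarray) -> float:
    """Illustrative structural term: a larger shift in MST weight between
    the original and rewritten latents is read as stronger obfuscation.
    The actual reward shape in the paper may differ."""
    return abs(mst_weight(original) - mst_weight(rewrite))
```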
- Tau-Eval: A Unified Evaluation Framework for Useful and Private Text Anonymization [7.9049991577473735]
We present Tau-Eval, an open-source framework for benchmarking text anonymization methods. A Python library, code, documentation and tutorials are publicly available.
arXiv Detail & Related papers (2025-06-06T10:59:35Z)
- A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release. Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z)
- Token-Level Privacy in Large Language Models [7.4143291213663955]
We introduce dchi-stencil, a novel token-level privacy-preserving mechanism that integrates contextual and semantic information. By incorporating both semantic and contextual nuances, dchi-stencil achieves a robust balance between privacy and utility; a generic sketch of this family of mechanisms follows this entry. This work highlights the potential of dchi-stencil to set a new standard for privacy-preserving NLP in modern, high-risk applications.
arXiv Detail & Related papers (2025-03-05T16:27:25Z)
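dchi-stencil's exact construction is not given in the snippet, so the following sketches the generic token-level metric-DP mechanism this line of work builds on: perturb a token's embedding with calibrated noise, then emit the nearest real vocabulary token. All names are illustrative.

```python
import numpy as np

def dp_token_replace(token_vec, vocab_vecs, vocab_tokens, epsilon, rng=None):
    """Replace one token under a metric-DP style mechanism.

    token_vec: (d,) embedding of the input token; vocab_vecs: (V, d)
    embeddings of the output vocabulary; vocab_tokens: list of V strings.
    This is the standard construction (noise magnitude ~ Gamma(d, 1/eps),
    uniformly random direction), not dchi-stencil's contextual variant.
    """
    rng = rng or np.random.default_rng()
    d = token_vec.shape[0]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=d, scale=1.0 / epsilon)
    noisy = token_vec + magnitude * direction
    # Post-process: snap the noisy vector back to the nearest real token.
    nearest = int(np.argmin(np.linalg.norm(vocab_vecs - noisy, axis=1)))
    return vocab_tokens[nearest]
```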
- NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [56.46355425175232]
We suggest sanitizing sensitive text using two common strategies used by humans. We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models. Compared to prior work on anonymization, the human-inspired approaches result in more natural rewrites.
arXiv Detail & Related papers (2024-06-06T05:07:44Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method; a sketch of such contrastive data follows this entry.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
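A minimal sketch of what instruction tuning with paired positive and negative examples could look like as data; the field names and the example itself are assumptions, not PrivacyMind's released schema.

```python
# Contrastive instruction-tuning pairs: same instruction and input, with
# a privacy-respecting output marked positive and a leaking one negative.
# All field names and content are hypothetical illustrations.
training_examples = [
    {
        "instruction": "Answer the question without revealing the "
                       "patient's name or address.",
        "input": "John Smith of 12 Oak St was diagnosed with flu. "
                 "What was the diagnosis?",
        "output": "The patient was diagnosed with flu.",  # positive: no leak
        "label": "positive",
    },
    {
        "instruction": "Answer the question without revealing the "
                       "patient's name or address.",
        "input": "John Smith of 12 Oak St was diagnosed with flu. "
                 "What was the diagnosis?",
        "output": "John Smith was diagnosed with flu.",   # negative: leaks name
        "label": "negative",
    },
]
```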
- Privacy-Preserving Image Features via Adversarial Affine Subspace Embeddings [72.68801373979943]
Many computer vision systems require users to upload image features to the cloud for processing and storage.
We propose a new privacy-preserving feature representation.
Compared to the original features, our approach makes it significantly more difficult for an adversary to recover private information; a geometric sketch follows this entry.
arXiv Detail & Related papers (2020-06-11T17:29:48Z)
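A geometric sketch of the affine-subspace idea from the entry above: store a subspace passing through the descriptor rather than the descriptor itself, and match with point-to-subspace distance. The paper selects adversarial spanning directions; the random directions here are a simplification.

```python
import numpy as np

def lift_to_subspace(descriptor, k=2, rng=None):
    """Replace a (d,) descriptor with a k-dimensional affine subspace
    through it, parametrized by a point on the subspace plus an
    orthonormal basis of directions (random here, adversarial in the
    paper)."""
    rng = rng or np.random.default_rng()
    d = descriptor.shape[0]
    basis, _ = np.linalg.qr(rng.normal(size=(d, k)))  # (d, k) orthonormal
    # Store an arbitrary point on the subspace, not the descriptor itself,
    # so the true feature is hidden among all points of the subspace.
    anchor = descriptor + basis @ rng.normal(size=k)
    return anchor, basis

def point_to_subspace_distance(query, anchor, basis):
    """Distance from a (d,) query descriptor to the stored subspace,
    used for matching in place of plain descriptor distance."""
    diff = query - anchor
    residual = diff - basis @ (basis.T @ diff)  # strip in-subspace component
    return float(np.linalg.norm(residual))
```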
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.