Truthful Text Sanitization Guided by Inference Attacks
- URL: http://arxiv.org/abs/2412.12928v2
- Date: Sun, 31 Aug 2025 09:17:53 GMT
- Title: Truthful Text Sanitization Guided by Inference Attacks
- Authors: Ildikó Pilán, Benet Manzanares-Salor, David Sánchez, Pierre Lison
- Abstract summary: We introduce a novel text sanitization method based on generalizations that subsume the semantic content of the original text spans. The approach relies on the use of instruction-tuned large language models (LLMs) and is divided into two stages. Results on the Text Anonymization Benchmark show that the proposed approach, implemented with Mistral 7B Instruct, leads to enhanced utility.
- Score: 3.3802914883339557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text sanitization aims to rewrite parts of a document to prevent disclosure of personal information. The central challenge of text sanitization is to strike a balance between privacy protection (avoiding the leakage of personal information) and utility preservation (retaining as much as possible of the document's original content). To this end, we introduce a novel text sanitization method based on generalizations, that is, broader but still informative terms that subsume the semantic content of the original text spans. The approach relies on the use of instruction-tuned large language models (LLMs) and is divided into two stages. Given a document including text spans expressing personally identifiable information (PII), the LLM is first applied to obtain truth-preserving replacement candidates for each text span and rank those according to their abstraction level. Those candidates are then evaluated for their ability to protect privacy by conducting inference attacks with the LLM. Finally, the system selects the most informative replacement candidate shown to be resistant to those attacks. This two-stage process produces replacements that effectively balance privacy and utility. We also present novel metrics to evaluate these two aspects without needing to manually annotate documents. Results on the Text Anonymization Benchmark show that the proposed approach, implemented with Mistral 7B Instruct, leads to enhanced utility, with only a marginal (< 1 p.p.) increase in re-identification risk compared to fully suppressing the original spans. Furthermore, our approach is shown to be more truth-preserving than existing methods such as Microsoft Presidio's synthetic replacements.
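The two-stage process lends itself to a short sketch. The following is a minimal illustration, not the authors' implementation: it assumes a hypothetical `llm(prompt) -> str` wrapper around an instruction-tuned model such as Mistral 7B Instruct, and the prompts, output parsing, and `[REDACTED]` fallback are placeholders.

```python
# Minimal sketch of the two-stage idea, assuming a hypothetical
# llm(prompt: str) -> str wrapper around an instruction-tuned model
# (e.g., Mistral 7B Instruct). Prompts and parsing are illustrative only.

def generate_candidates(llm, document: str, span: str, k: int = 5) -> list[str]:
    """Stage 1: ask the LLM for truth-preserving generalizations of a PII
    span, ordered from most specific (most informative) to most abstract."""
    prompt = (
        f"Document: {document}\n"
        f"Span to replace: {span}\n"
        f"List {k} replacements that generalize the span without stating "
        f"anything false, ordered from most specific to most generic, "
        f"one per line."
    )
    return [line.strip() for line in llm(prompt).splitlines() if line.strip()]

def resists_inference_attack(llm, document: str, span: str, candidate: str) -> bool:
    """Stage 2: use the LLM as an attacker; a candidate is rejected if the
    attacker can recover the original span from the sanitized document."""
    sanitized = document.replace(span, candidate)
    guess = llm(f"Guess the personal detail that was replaced in: {sanitized}")
    return span.lower() not in guess.lower()

def sanitize_span(llm, document: str, span: str) -> str:
    """Select the most informative candidate that withstands the attack;
    fall back to full suppression if none does."""
    for candidate in generate_candidates(llm, document, span):
        if resists_inference_attack(llm, document, span, candidate):
            return candidate
    return "[REDACTED]"
```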
Related papers
- Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search [60.197239728279534]
Large language models (LLMs) in cloud-based services have raised significant privacy concerns. Existing text anonymization and de-identification techniques, such as rule-based redaction and scrubbing, often struggle to balance privacy preservation with text naturalness and utility. We propose a zero-shot, tree-search-based iterative sentence rewriting algorithm that systematically obfuscates or deletes private information while preserving coherence, relevance, and naturalness.
arXiv Detail & Related papers (2025-09-25T07:23:52Z) - The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that large language models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts. Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility. We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z) - A False Sense of Privacy: Evaluating Textual Data Sanitization Beyond Surface-level Privacy Leakage [77.83757117924995]
We propose a new framework that evaluates re-identification attacks to quantify individual privacy risks upon data release.
Our approach shows that seemingly innocuous auxiliary information can be used to infer sensitive attributes like age or substance use history from sanitized data.
arXiv Detail & Related papers (2025-04-28T01:16:27Z) - ExpShield: Safeguarding Web Text from Unauthorized Crawling and Language Modeling Exploitation [17.71790411163849]
We propose ExpShield, a proactive self-defense mechanism that mitigates sample-specific memorization via imperceptible text perturbations. Our approach requires no external collaboration while maintaining original readability. Even with privacy backdoors, the Membership Inference Attack (MIA) AUC drops from 0.95 to 0.55, and instance exploitation approaches zero.
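For context on the reported numbers, below is a minimal sketch of how a loss-based MIA AUC is commonly computed; the loss values are made-up placeholders, not figures from the paper.

```python
# Illustrative loss-based membership inference AUC: lower per-sample loss
# under the target model is treated as evidence of training-set membership.
# The loss values below are fabricated placeholders for demonstration.
import numpy as np
from sklearn.metrics import roc_auc_score

member_losses = np.array([1.2, 0.8, 1.0, 0.9])     # samples seen in training
nonmember_losses = np.array([2.1, 1.9, 2.4, 1.7])  # held-out samples

labels = np.concatenate([np.ones_like(member_losses),
                         np.zeros_like(nonmember_losses)])
scores = -np.concatenate([member_losses, nonmember_losses])  # lower loss -> higher score

print(f"MIA AUC: {roc_auc_score(labels, scores):.2f}")  # ~1.0 = leaky, ~0.5 = protected
```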
arXiv Detail & Related papers (2024-12-30T17:52:02Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two strategies commonly used by humans.
We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts.
Our results show that this approach not only produces outputs that are semantically closer to the original inputs, but also texts that score better on average in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z) - Silencing the Risk, Not the Whistle: A Semi-automated Text Sanitization Tool for Mitigating the Risk of Whistleblower Re-Identification [4.082799056366928]
Whistleblowing is essential for ensuring transparency and accountability in both public and private sectors.
Legal measures, such as the EU Whistleblowing Directive (WBD), are limited in their scope and effectiveness.
Current text sanitization tools follow a one-size-fits-all approach and take an overly limited view of anonymity.
arXiv Detail & Related papers (2024-05-02T08:52:29Z) - Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs).
By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction.
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
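As a rough illustration of the truncation effect described above (not the paper's SG/TPE construction), one can measure how likely a model is to emit its end-of-sequence token as the very first generated token for a given text; the model choice here is arbitrary.

```python
# Sketch: probability that the model's first sampled token is EOS,
# i.e., that generation terminates immediately. Model is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def eos_first_prob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        next_token_logits = model(ids).logits[0, -1]
    return torch.softmax(next_token_logits, dim=-1)[tok.eos_token_id].item()

# A successfully protected text would drive this probability towards 1.
print(eos_first_prob("Some text to be protected from LLM exploitation."))
```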
arXiv Detail & Related papers (2023-12-15T10:30:36Z) - Text Sanitization Beyond Specific Domains: Zero-Shot Redaction & Substitution with Large Language Models [0.0]
We present a zero-shot text sanitization technique that detects and substitutes potentially sensitive information using Large Language Models.
Our evaluation shows that our method excels at protecting privacy while maintaining text coherence and contextual information.
arXiv Detail & Related papers (2023-11-16T18:42:37Z) - Neural Text Sanitization with Privacy Risk Indicators: An Empirical Analysis [2.9311414545087366]
We consider a two-step approach to text sanitization and provide a detailed analysis of its empirical performance.
The text sanitization process starts with a privacy-oriented entity recognizer that seeks to determine the text spans expressing identifiable personal information.
We present five distinct indicators of re-identification risk, based respectively on language model probabilities, text span classification, sequence labelling, perturbations, and web search.
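As an illustration of the first of these indicators, the sketch below scores a span by how predictable it remains from its context under a language model; the model and example are placeholders, not the paper's setup.

```python
# Sketch of a language-model-probability risk indicator: a PII span is
# riskier the better the LM can predict it from the surrounding context.
# Model choice and example are illustrative placeholders.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def span_log_prob(context: str, span: str) -> float:
    """Average log-probability the LM assigns to `span` after `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    span_ids = tokenizer(" " + span, return_tensors="pt").input_ids
    ids = torch.cat([ctx_ids, span_ids], dim=1)
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # next-token log-probs
    targets = ids[0, 1:]
    token_lps = log_probs[torch.arange(len(targets)), targets]
    return token_lps[-span_ids.shape[1]:].mean().item()  # span tokens only

# A high value means the span is easy to reconstruct, hence risky to keep.
print(span_log_prob("The patient is a nurse living in", "Oslo"))
```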
arXiv Detail & Related papers (2023-10-22T14:17:27Z) - Verifying the Robustness of Automatic Credibility Assessment [50.55687778699995]
We show that meaning-preserving changes in input text can mislead the models.
We also introduce BODEGA: a benchmark for testing both victim models and attack methods on misinformation detection tasks.
Our experimental results show that modern large language models are often more vulnerable to attacks than previous, smaller solutions.
arXiv Detail & Related papers (2023-03-14T16:11:47Z) - DP-BART for Privatized Text Rewriting under Local Differential Privacy [2.45626162429986]
We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations, which drastically reduces the amount of noise required for DP guarantees.
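For intuition, here is a generic sketch of the clip-then-noise pattern that such representation-level DP systems build on; this is the textbook Gaussian mechanism, not DP-BART's exact method, and all constants are illustrative.

```python
# Generic clip-then-noise sketch: bound the L2 norm of an encoder
# representation, then add Gaussian noise calibrated to that bound
# (standard Gaussian mechanism; L2 sensitivity of clipped vectors is 2*clip).
import numpy as np

def privatize(z: np.ndarray, clip: float = 1.0,
              eps: float = 1.0, delta: float = 1e-5) -> np.ndarray:
    z = z * min(1.0, clip / (np.linalg.norm(z) + 1e-12))  # norm clipping
    sigma = 2 * clip * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return z + np.random.normal(0.0, sigma, size=z.shape)

noisy = privatize(np.random.randn(768))  # e.g., one encoder hidden vector
```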
arXiv Detail & Related papers (2023-02-15T13:07:34Z) - Text Revision by On-the-Fly Representation Optimization [76.11035270753757]
Current state-of-the-art methods formulate these tasks as sequence-to-sequence learning problems.
We present an iterative in-place editing approach for text revision, which requires no parallel data.
It achieves competitive, and sometimes better, performance than state-of-the-art supervised methods on text simplification.
arXiv Detail & Related papers (2022-04-15T07:38:08Z) - Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z) - Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution.
Under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences.
arXiv Detail & Related papers (2021-12-15T04:27:33Z) - Semantic-Preserving Adversarial Text Attacks [85.32186121859321]
We propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.
Our method achieves the highest attack success and semantic preservation rates while changing the fewest words, compared with existing methods.
arXiv Detail & Related papers (2021-08-23T09:05:18Z) - Differential Privacy for Text Analytics via Natural Text Sanitization [44.95170585853761]
This paper takes a direct approach to text sanitization. Our insight is to consider both sensitivity and similarity via our new local DP notion.
The sanitized texts also contribute to our sanitization-aware pretraining and fine-tuning, enabling privacy-preserving natural language processing over the BERT language model with promising utility.
arXiv Detail & Related papers (2021-06-02T15:15:10Z) - Adversarial Watermarking Transformer: Towards Tracing Text Provenance with Data Hiding [80.3811072650087]
We study natural language watermarking as a defense to help better mark and trace the provenance of text.
We introduce the Adversarial Watermarking Transformer (AWT) with a jointly trained encoder-decoder and adversarial training.
AWT is the first end-to-end model to hide data in text by automatically learning, without ground truth, word substitutions along with their locations.
arXiv Detail & Related papers (2020-09-07T11:01:24Z) - Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation [50.01708049531156]
We focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer.
In detail, the input is a set of structured records and a reference text for describing another recordset.
The output is a summary that accurately describes the partial content in the source recordset in the same writing style as the reference.
arXiv Detail & Related papers (2020-02-24T12:52:10Z)