IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization
- URL: http://arxiv.org/abs/2407.02956v2
- Date: Sun, 02 Feb 2025 16:51:13 GMT
- Title: IncogniText: Privacy-enhancing Conditional Text Anonymization via LLM-based Private Attribute Randomization
- Authors: Ahmed Frikha, Nassim Walha, Krishna Kanth Nakka, Ricardo Mendes, Xue Jiang, Xuebing Zhou
- Abstract summary: We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value.
Our empirical evaluation shows a reduction of private attribute leakage by more than 90% across 8 different private attributes.
- Score: 8.483679748399037
- Abstract: In this work, we address the problem of text anonymization where the goal is to prevent adversaries from correctly inferring private attributes of the author, while keeping the text utility, i.e., meaning and semantics. We propose IncogniText, a technique that anonymizes the text to mislead a potential adversary into predicting a wrong private attribute value. Our empirical evaluation shows a reduction of private attribute leakage by more than 90% across 8 different private attributes. Finally, we demonstrate the maturity of IncogniText for real-world applications by distilling its anonymization capability into a set of LoRA parameters associated with an on-device model. Our results show the possibility of reducing privacy leakage by more than half with limited impact on utility.
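The core idea can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration of attribute-randomized rewriting in the spirit of IncogniText: the author's text is passed to an LLM together with a randomly drawn fake value of a private attribute, and the model is instructed to rewrite the text so that an adversary would infer the fake value instead of the true one. The function names, prompt wording, and the `query_llm` helper are assumptions for illustration, not the authors' implementation.

```python
import random

# Hypothetical helper: wrap whatever LLM backend is available
# (e.g., a local instruction-tuned model or an API client).
def query_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM call here")

# Candidate values for one private attribute (here: age bracket).
ATTRIBUTE_VALUES = ["18-24", "25-34", "35-44", "45-54", "55+"]

def incognitext_rewrite(text: str, true_value: str) -> str:
    """Rewrite `text` to mislead an adversary toward a random fake attribute value.

    Sketch of the conditional-anonymization idea: sample a target value
    different from the true one, then ask an LLM to rewrite the text so
    that the target value appears most plausible, while preserving meaning.
    """
    fake_value = random.choice([v for v in ATTRIBUTE_VALUES if v != true_value])
    prompt = (
        "Rewrite the following text, preserving its meaning and fluency, "
        "but adjust any cues about the author's age so that a reader would "
        f"guess the author's age bracket is {fake_value}.\n\n"
        f"Text: {text}\n\nRewritten text:"
    )
    return query_llm(prompt)
```

Per the abstract, the paper goes further by distilling this anonymization capability into LoRA parameters attached to an on-device model; the sketch above only covers the prompting step.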
Related papers
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face the emerging challenge of re-identification attacks enabled by Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We propose sanitizing sensitive text using two strategies commonly employed by humans.
We curate the first corpus, coined NAP2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - Keep It Private: Unsupervised Privatization of Online Text [13.381890596224867]
We introduce an automatic text privatization framework that fine-tunes a large language model via reinforcement learning to produce rewrites that balance soundness, sense, and privacy.
We evaluate it extensively on a large-scale test set of short- to medium-length English Reddit posts written by 68k authors.
arXiv Detail & Related papers (2024-05-16T17:12:18Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Adversary for Social Good: Leveraging Adversarial Attacks to Protect Personal Attribute Privacy [14.395031313422214]
We leverage the inherent vulnerability of machine learning to adversarial attacks, and design a novel text-space Adversarial attack for Social Good, called Adv4SG.
Our method effectively degrades inference accuracy at lower computational cost across different attribute settings, substantially mitigating the impact of inference attacks and providing strong protection of user attribute privacy.
arXiv Detail & Related papers (2023-06-04T21:40:23Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Smooth Anonymity for Sparse Graphs [69.1048938123063]
Differential privacy has emerged as the gold standard of privacy; however, it faces limitations when it comes to sharing sparse datasets.
In this work, we consider a variation of $k$-anonymity, which we call smooth-$k$-anonymity, and design simple large-scale algorithms that efficiently provide smooth-$k$-anonymity.
arXiv Detail & Related papers (2022-07-13T17:09:25Z) - Protecting Anonymous Speech: A Generative Adversarial Network Methodology for Removing Stylistic Indicators in Text [2.9005223064604078]
We develop a new approach to authorship anonymization by constructing a generative adversarial network.
Our fully automatic method achieves comparable results to other methods in terms of content preservation and fluency.
Our approach is able to generalize well to an open-set context and anonymize sentences from authors it has not encountered before.
arXiv Detail & Related papers (2021-10-18T17:45:56Z) - When differential privacy meets NLP: The devil is in the detail [3.5503507997334958]
We present a formal analysis of ADePT, a differentially private auto-encoder for text rewriting.
Our proof reveals that ADePT is not differentially private, thus rendering the experimental results unsubstantiated.
arXiv Detail & Related papers (2021-09-07T16:12:25Z) - No Intruder, no Validity: Evaluation Criteria for Privacy-Preserving Text Anonymization [0.48733623015338234]
We argue that researchers and practitioners developing automated text anonymization systems should carefully assess whether their evaluation methods truly reflect the system's ability to protect individuals from being re-identified.
We propose TILD, a set of evaluation criteria that comprises an anonymization method's technical performance, the information loss resulting from its anonymization, and the human ability to de-anonymize redacted documents.
arXiv Detail & Related papers (2021-03-16T18:18:29Z) - InfoScrub: Towards Attribute Privacy by Targeted Obfuscation [77.49428268918703]
We study techniques that allow individuals to limit the private information leaked in visual data.
We tackle this problem in a novel image obfuscation framework.
We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2$\times$ (or up to 0.85 bits) over the non-obfuscated counterparts.
arXiv Detail & Related papers (2020-05-20T19:48:04Z)