Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees
- URL: http://arxiv.org/abs/2508.20736v1
- Date: Thu, 28 Aug 2025 12:59:01 GMT
- Title: Leveraging Semantic Triples for Private Document Generation with Local Differential Privacy Guarantees
- Authors: Stephen Meisenbacher, Maulik Chevli, Florian Matthes,
- Abstract summary: We introduce DP-ST, which leverages semantic triples for neighborhood-aware private document generation under local DP guarantees.<n>Our method allows for coherent text generation even at lower $varepsilon$ values, while still balancing privacy and utility.
- Score: 18.487751624471777
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many works at the intersection of Differential Privacy (DP) in Natural Language Processing aim to protect privacy by transforming texts under DP guarantees. This can be performed in a variety of ways, from word perturbations to full document rewriting, and most often under local DP. Here, an input text must be made indistinguishable from any other potential text, within some bound governed by the privacy parameter $\varepsilon$. Such a guarantee is quite demanding, and recent works show that privatizing texts under local DP can only be done reasonably under very high $\varepsilon$ values. Addressing this challenge, we introduce DP-ST, which leverages semantic triples for neighborhood-aware private document generation under local DP guarantees. Through the evaluation of our method, we demonstrate the effectiveness of the divide-and-conquer paradigm, particularly when limiting the DP notion (and privacy guarantees) to that of a privatization neighborhood. When combined with LLM post-processing, our method allows for coherent text generation even at lower $\varepsilon$ values, while still balancing privacy and utility. These findings highlight the importance of coherence in achieving balanced privatization outputs at reasonable $\varepsilon$ levels.
Related papers
- The Double-edged Sword of LLM-based Data Reconstruction: Understanding and Mitigating Contextual Vulnerability in Word-level Differential Privacy Text Sanitization [53.51921540246166]
We show that Language Large Models (LLMs) can exploit the contextual vulnerability of DP-sanitized texts.<n>Experiments uncover a double-edged sword effect of LLM reconstructions on privacy and utility.<n>We propose recommendations for using data reconstruction as a post-processing step.
arXiv Detail & Related papers (2025-08-26T12:22:45Z) - DP-Fusion: Token-Level Differentially Private Inference for Large Language Models [37.73455762168357]
Large language models (LLMs) can leak sensitive information from their context through generated outputs, either accidentally or when prompted adversarially.<n>We propose DP-Fusion, a token-level Differentially Private Inference (DPI) mechanism that provably bounds how much an LLM's outputs reveal about sensitive tokens in its context.
arXiv Detail & Related papers (2025-07-06T20:49:39Z) - Spend Your Budget Wisely: Towards an Intelligent Distribution of the Privacy Budget in Differentially Private Text Rewriting [3.0177210416625124]
We construct and evaluate a toolkit of linguistics- and NLP-based methods used to allocate a privacy budget to constituent tokens in a text document.<n>Our work highlights the intricacies of text privatization with DP, and furthermore, it calls for further work on finding more efficient ways to maximize the privatization benefits offered by DP in text rewriting.
arXiv Detail & Related papers (2025-03-28T12:33:46Z) - Investigating User Perspectives on Differentially Private Text Privatization [81.59631769859004]
This work investigates how factors of $textitscenario$, $textitdata sensitivity$, $textitmechanism type$, and $textitreason for data collection$ impact user preferences for text privatization.<n>We learn that while all these factors play a role in influencing privacy decisions, users are highly sensitive to the utility and coherence of the private output texts.
arXiv Detail & Related papers (2025-03-12T12:33:20Z) - On the Impact of Noise in Differentially Private Text Rewriting [3.0177210416625124]
We introduce a new sentence infilling privatization technique, and we use this method to explore the effect of noise in DP text rewriting.<n>We empirically demonstrate that non-DP privatization techniques excel in utility preservation and can find an acceptable empirical privacy-utility trade-off, yet cannot outperform DP methods in empirical privacy protections.
arXiv Detail & Related papers (2025-01-31T10:45:24Z) - Just Rewrite It Again: A Post-Processing Method for Enhanced Semantic Similarity and Privacy Preservation of Differentially Private Rewritten Text [3.3916160303055567]
We propose a simple post-processing method based on the goal of aligning rewritten texts with their original counterparts.
Our results show that such an approach not only produces outputs that are more semantically reminiscent of the original inputs, but also texts which score on average better in empirical privacy evaluations.
arXiv Detail & Related papers (2024-05-30T08:41:33Z) - Differentially Private Synthetic Data via Foundation Model APIs 2: Text [56.13240830670327]
A lot of high-quality text data generated in the real world is private and cannot be shared or used freely due to privacy concerns.
We propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text.
Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines.
arXiv Detail & Related papers (2024-03-04T05:57:50Z) - DP-BART for Privatized Text Rewriting under Local Differential Privacy [2.45626162429986]
We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations which drastically reduces the amount of noise required for DP guarantees.
arXiv Detail & Related papers (2023-02-15T13:07:34Z) - Analyzing Privacy Leakage in Machine Learning via Multiple Hypothesis
Testing: A Lesson From Fano [83.5933307263932]
We study data reconstruction attacks for discrete data and analyze it under the framework of hypothesis testing.
We show that if the underlying private data takes values from a set of size $M$, then the target privacy parameter $epsilon$ can be $O(log M)$ before the adversary gains significant inferential power.
arXiv Detail & Related papers (2022-10-24T23:50:12Z) - Privacy Amplification via Shuffling for Linear Contextual Bandits [51.94904361874446]
We study the contextual linear bandit problem with differential privacy (DP)
We show that it is possible to achieve a privacy/utility trade-off between JDP and LDP by leveraging the shuffle model of privacy.
Our result shows that it is possible to obtain a tradeoff between JDP and LDP by leveraging the shuffle model while preserving local privacy.
arXiv Detail & Related papers (2021-12-11T15:23:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.