Guiding Text-to-Text Privatization by Syntax
- URL: http://arxiv.org/abs/2306.01471v1
- Date: Fri, 2 Jun 2023 11:52:21 GMT
- Title: Guiding Text-to-Text Privatization by Syntax
- Authors: Stefan Arnold, Dilara Yesilbas, Sven Weinzierl
- Abstract summary: Metric Differential Privacy is a generalization of differential privacy tailored to address the unique challenges of text-to-text privatization.
We analyze the capability of text-to-text privatization to preserve the grammatical category of words after substitution.
We transform the privatization step into a candidate selection problem in which substitutions are directed to words with matching grammatical properties.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Metric Differential Privacy is a generalization of differential privacy
tailored to address the unique challenges of text-to-text privatization. By
adding noise to the representation of words in the geometric space of
embeddings, words are replaced with words located in the proximity of the noisy
representation. Since embeddings are trained based on word co-occurrences, this
mechanism ensures that substitutions stem from a common semantic context.
Without considering the grammatical category of words, however, this mechanism
cannot guarantee that substitutions play similar syntactic roles. We analyze
the capability of text-to-text privatization to preserve the grammatical
category of words after substitution and find that surrogate texts consist
almost exclusively of nouns. Since the mechanism lacks the capability to
produce surrogate texts that mirror the structure of the sensitive texts, we
complement our analysis by transforming the privatization step into a
candidate selection problem in which substitutions are directed to words with
matching grammatical properties. We demonstrate a substantial improvement in
the performance of downstream tasks of up to $4.66\%$ while retaining
comparable privacy guarantees.
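The mechanism described above (perturb a word's embedding, then substitute the nearest vocabulary word, optionally restricted to candidates with a matching part-of-speech tag) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy vocabulary, embeddings, POS tags, and the Gaussian noise (standing in for the planar Laplace noise typically used in metric differential privacy) are all assumptions for demonstration.

```python
import numpy as np

# Toy vocabulary: word -> (embedding, POS tag). Values are illustrative only.
VOCAB = {
    "doctor": (np.array([1.0, 0.0]), "NOUN"),
    "nurse":  (np.array([0.9, 0.1]), "NOUN"),
    "heal":   (np.array([0.8, 0.5]), "VERB"),
    "clinic": (np.array([1.1, -0.1]), "NOUN"),
}

def privatize(word, epsilon, match_pos=True, rng=None):
    """Metric-DP-style substitution: perturb the word's embedding with noise
    scaled by 1/epsilon, then return the nearest vocabulary word.

    With match_pos=True, candidates are restricted to words sharing the
    original word's grammatical category, mirroring the paper's candidate
    selection step; with match_pos=False, any word may be substituted."""
    rng = rng if rng is not None else np.random.default_rng()
    vec, pos = VOCAB[word]
    # Gaussian noise as a simplification of the calibrated metric-DP noise.
    noisy = vec + rng.normal(scale=1.0 / epsilon, size=vec.shape)
    candidates = {w: v for w, (v, p) in VOCAB.items()
                  if not match_pos or p == pos}
    # Nearest neighbor of the noisy representation among the candidates.
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - noisy))
```

Smaller values of `epsilon` inject more noise, so substitutions drift further from the original word; the POS constraint only narrows which words may be returned, leaving the noise addition (and hence the privacy mechanism) unchanged.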
Related papers
- A Collocation-based Method for Addressing Challenges in Word-level Metric Differential Privacy [3.0177210416625124]
Several word-level $\textit{Metric}$ Differential Privacy approaches have been proposed.
We devise a method where composed privatized outputs have higher semantic coherence and variable length.
We evaluate our method in utility and privacy tests, which make a clear case for tokenization strategies beyond the word level.
arXiv Detail & Related papers (2024-06-30T09:37:34Z)
- RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization [57.86083349873154]
Text-to-image customization aims to synthesize text-driven images for the given subjects.
Existing works follow the pseudo-word paradigm, i.e., represent the given subjects as pseudo-words and then compose them with the given text.
We present RealCustom that, for the first time, disentangles similarity from controllability by precisely limiting subject influence to relevant parts only.
arXiv Detail & Related papers (2024-03-01T12:12:09Z)
- Quantifying the redundancy between prosody and text [67.07817268372743]
We use large language models to estimate how much information is redundant between prosody and the words themselves.
We find a high degree of redundancy between the information carried by the words and prosodic information across several prosodic features.
Still, we observe that prosodic features can not be fully predicted from text, suggesting that prosody carries information above and beyond the words.
arXiv Detail & Related papers (2023-11-28T21:15:24Z)
- Disentangling the Linguistic Competence of Privacy-Preserving BERT [0.0]
Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization.
We employ a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text.
Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms.
arXiv Detail & Related papers (2023-10-17T16:00:26Z)
- Driving Context into Text-to-Text Privatization [0.0]
$\textit{Metric}$ Differential Privacy enables text-to-text privatization by adding noise to the vector of a word.
We demonstrate a substantial increase in classification accuracy by $6.05\%$.
arXiv Detail & Related papers (2023-06-02T11:33:06Z)
- Textual Entailment Recognition with Semantic Features from Empirical Text Representation [60.31047947815282]
A text entails a hypothesis if and only if the truth of the hypothesis follows from the text.
In this paper, we propose a novel approach to identifying the textual entailment relationship between text and hypothesis.
We employ an element-wise Manhattan distance vector-based feature that can identify the semantic entailment relationship between the text-hypothesis pair.
arXiv Detail & Related papers (2022-10-18T10:03:51Z)
- Using Paraphrases to Study Properties of Contextual Embeddings [46.84861591608146]
We use paraphrases as a unique source of data to analyze contextualized embeddings.
Because paraphrases naturally encode consistent word and phrase semantics, they provide a unique lens for investigating properties of embeddings.
We find that contextual embeddings effectively handle polysemous words, but give synonyms surprisingly different representations in many cases.
arXiv Detail & Related papers (2022-07-12T14:22:05Z)
- LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution [76.615287796753]
We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models.
This is achieved by combining contextual information with knowledge from structured lexical resources.
Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets.
arXiv Detail & Related papers (2021-07-11T21:25:56Z)
- MuSeM: Detecting Incongruent News Headlines using Mutual Attentive Semantic Matching [7.608480381965392]
Measuring the congruence between two texts has several useful applications, such as detecting deceptive and misleading news headlines on the web.
This paper proposes a method that uses inter-mutual attention-based semantic matching between the original and synthetically generated headlines.
We observe that the proposed method outperforms prior arts significantly for two publicly available datasets.
arXiv Detail & Related papers (2020-10-07T19:19:42Z)
- Privacy Guarantees for De-identifying Text Transformations [17.636430224292866]
We derive formal privacy guarantees for text transformation-based de-identification methods on the basis of Differential Privacy.
We compare a simple redact approach with more sophisticated word-by-word replacement using deep learning models on multiple natural language understanding tasks.
We find that only word-by-word replacement is robust against performance drops in various tasks.
arXiv Detail & Related papers (2020-08-07T12:06:42Z)
- Neural Syntactic Preordering for Controlled Paraphrase Generation [57.5316011554622]
Our work uses syntactic transformations to softly "reorder" the source sentence and guide our neural paraphrasing model.
First, given an input sentence, we derive a set of feasible syntactic rearrangements using an encoder-decoder model.
Next, we use each proposed rearrangement to produce a sequence of position embeddings, which encourages our final encoder-decoder paraphrase model to attend to the source words in a particular order.
arXiv Detail & Related papers (2020-05-05T09:02:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.