Disentangling the Linguistic Competence of Privacy-Preserving BERT
- URL: http://arxiv.org/abs/2310.11363v1
- Date: Tue, 17 Oct 2023 16:00:26 GMT
- Title: Disentangling the Linguistic Competence of Privacy-Preserving BERT
- Authors: Stefan Arnold, Nils Kemmerzell, and Annika Schreiner
- Abstract summary: Differential Privacy (DP) has been tailored to address the unique challenges of text-to-text privatization.
We employ a series of interpretation techniques on the internal representations extracted from BERT trained on perturbed pre-text.
Using probing tasks to unpack this dissimilarity, we find evidence that text-to-text privatization affects the linguistic competence across several formalisms.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Differential Privacy (DP) has been tailored to address the unique challenges
of text-to-text privatization. However, text-to-text privatization is known for
degrading the performance of language models when trained on perturbed text.
Employing a series of interpretation techniques on the internal representations
extracted from BERT trained on perturbed pre-text, we intend to disentangle at
the linguistic level the distortion induced by differential privacy.
Experimental results from a representational similarity analysis indicate that
the overall similarity of internal representations is substantially reduced.
Using probing tasks to unpack this dissimilarity, we find evidence that
text-to-text privatization affects the linguistic competence across several
formalisms, encoding localized properties of words while falling short at
encoding the contextual relationships between spans of words.
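A minimal sketch of the kind of analysis described in the abstract is given below: extract layer activations from two BERT checkpoints, build pairwise similarity matrices over a handful of sentences, and rank-correlate them in the style of representational similarity analysis. The model names, layer index, and sentences are placeholders; in particular, the checkpoint standing in for the model trained on privatized text is hypothetical.

```python
# Sketch: representational similarity analysis (RSA) between a BERT trained on
# clean text and a BERT trained on privatized text. Both checkpoints below are
# placeholders; the second should point at the privacy-trained model.
import torch
import numpy as np
from scipy.stats import spearmanr
from transformers import AutoTokenizer, AutoModel

CLEAN_MODEL = "bert-base-uncased"
PRIVATE_MODEL = "bert-base-uncased"  # placeholder for the privacy-trained checkpoint

sentences = [
    "The committee approved the budget.",
    "She plays the violin beautifully.",
    "Rain delayed the morning flight.",
]

def layer_representations(model_name, layer=8):
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
    model.eval()
    enc = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[layer]         # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()   # mean pooling over tokens

def similarity_matrix(reps):
    norm = reps / np.linalg.norm(reps, axis=1, keepdims=True)
    return norm @ norm.T                                     # cosine similarities

clean = similarity_matrix(layer_representations(CLEAN_MODEL))
private = similarity_matrix(layer_representations(PRIVATE_MODEL))

# RSA score: rank-correlate the upper triangles of the two similarity matrices.
iu = np.triu_indices_from(clean, k=1)
rho, _ = spearmanr(clean[iu], private[iu])
print(f"RSA (Spearman rho) at layer 8: {rho:.3f}")
```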
Related papers
- Pixel Sentence Representation Learning [67.4775296225521]
In this work, we conceptualize the learning of sentence-level textual semantics as a visual representation learning process.
We employ visually-grounded text perturbation methods such as typos and word-order shuffling, which resonate with human cognitive patterns and allow the perturbations to be perceived as continuous.
Our approach is further bolstered by large-scale unsupervised topical alignment training and natural language inference supervision.
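A rough sketch of the perturbation side of such a pipeline is shown below; the typo and word-order operations and their rates are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative text perturbations in the spirit described above: character-level
# typos and word-order shuffling. Rates and operations are assumptions.
import random

def add_typos(text, rate=0.1, seed=0):
    rng = random.Random(seed)
    chars = list(text)
    for i, c in enumerate(chars):
        if c.isalpha() and rng.random() < rate:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

def shuffle_words(text, seed=0):
    rng = random.Random(seed)
    words = text.split()
    rng.shuffle(words)
    return " ".join(words)

print(add_typos("the quick brown fox jumps over the lazy dog"))
print(shuffle_words("the quick brown fox jumps over the lazy dog"))
```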
arXiv Detail & Related papers (2024-02-13T02:46:45Z)
- Soft Alignment of Modality Space for End-to-end Speech Translation [49.29045524083467]
End-to-end Speech Translation aims to convert speech into target text within a unified model.
The inherent differences between speech and text modalities often impede effective cross-modal and cross-lingual transfer.
We introduce Soft Alignment (S-Align), using adversarial training to align the representation spaces of both modalities.
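The adversarial alignment idea can be sketched as follows, assuming a gradient-reversal formulation with a small modality discriminator; the dimensions and the exact objective are assumptions rather than the paper's configuration.

```python
# Sketch of adversarial alignment between two representation spaces (e.g. speech
# and text encoder outputs): a discriminator learns to tell the modalities apart
# while reversed gradients push the encoders to produce indistinguishable features.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip gradients so the encoders learn to fool the discriminator

discriminator = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 2))
criterion = nn.CrossEntropyLoss()

speech_repr = torch.randn(8, 256, requires_grad=True)  # stand-in for encoder outputs
text_repr = torch.randn(8, 256, requires_grad=True)

features = GradReverse.apply(torch.cat([speech_repr, text_repr], dim=0))
labels = torch.cat([torch.zeros(8), torch.ones(8)]).long()  # 0 = speech, 1 = text
adv_loss = criterion(discriminator(features), labels)
adv_loss.backward()  # discriminator minimizes this loss, encoders (via reversal) maximize it
```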
arXiv Detail & Related papers (2023-12-18T06:08:51Z)
- Guiding Text-to-Text Privatization by Syntax [0.0]
Metric Differential Privacy is a generalization of differential privacy tailored to address the unique challenges of text-to-text privatization.
We analyze the capability of text-to-text privatization to preserve the grammatical category of words after substitution.
We transform the privatization step into a candidate selection problem in which substitutions are directed to words with matching grammatical properties.
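A toy sketch of this candidate-selection idea follows, assuming a made-up vocabulary with part-of-speech tags and a simplified noise model in place of a full metric-DP mechanism.

```python
# Toy sketch of syntax-guided text-to-text privatization: perturb a word's
# embedding with calibrated noise and pick the nearest vocabulary word that
# shares its part-of-speech tag. Vocabulary, embeddings, and tags are invented.
import numpy as np

rng = np.random.default_rng(0)
vocab = {  # word -> (POS tag, embedding)
    "doctor": ("NOUN", rng.normal(size=50)),
    "nurse": ("NOUN", rng.normal(size=50)),
    "hospital": ("NOUN", rng.normal(size=50)),
    "treats": ("VERB", rng.normal(size=50)),
    "examines": ("VERB", rng.normal(size=50)),
}

def privatize(word, epsilon=1.0):
    pos, vec = vocab[word]
    noisy = vec + rng.laplace(scale=1.0 / epsilon, size=vec.shape)  # lower epsilon, more noise
    # Candidate selection restricted to words with a matching grammatical category.
    candidates = {w: v for w, (p, v) in vocab.items() if p == pos}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - noisy))

print(privatize("doctor"))   # a noun (possibly the original word, depending on the noise)
print(privatize("treats"))   # a verb
```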
arXiv Detail & Related papers (2023-06-02T11:52:21Z)
- Natural Language Decompositions of Implicit Content Enable Better Text Representations [56.85319224208865]
We introduce a method for the analysis of text that takes implicitly communicated content explicitly into account.
We use a large language model to produce sets of propositions that are inferentially related to the text that has been observed.
Our results suggest that modeling the meanings behind observed language, rather than the literal text alone, is a valuable direction for NLP.
arXiv Detail & Related papers (2023-05-23T23:45:20Z)
- DP-BART for Privatized Text Rewriting under Local Differential Privacy [2.45626162429986]
We propose a new system 'DP-BART' that largely outperforms existing LDP systems.
Our approach uses a novel clipping method, iterative pruning, and further training of internal representations which drastically reduces the amount of noise required for DP guarantees.
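The underlying clip-and-noise pattern for privatizing a latent representation can be sketched schematically as follows; the clipping bound, noise distribution, and scale are arbitrary illustrations rather than DP-BART's actual settings.

```python
# Schematic clip-and-noise step for privatizing an internal (latent) text
# representation in a latent-space LDP rewriting setup. Values are illustrative.
import torch

def clip_and_noise(latent, clip_norm=1.0, epsilon=5.0, sensitivity=2.0):
    # Clip each vector to L2 norm <= clip_norm so the sensitivity is bounded.
    norms = latent.norm(dim=-1, keepdim=True).clamp(min=1e-12)
    clipped = latent * torch.clamp(clip_norm / norms, max=1.0)
    # Laplace noise calibrated to sensitivity / epsilon (pure-DP style accounting).
    scale = sensitivity * clip_norm / epsilon
    noise = torch.distributions.Laplace(0.0, scale).sample(clipped.shape)
    return clipped + noise

encoder_output = torch.randn(1, 16, 768)   # stand-in for a seq2seq encoder state
private_latent = clip_and_noise(encoder_output)
print(private_latent.shape)
```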
arXiv Detail & Related papers (2023-02-15T13:07:34Z)
- The Limits of Word Level Differential Privacy [30.34805746574316]
We propose a new method for text anonymization based on transformer based language models fine-tuned for paraphrasing.
We evaluate the performance of our method via thorough experimentation and demonstrate superior performance over the discussed mechanisms.
arXiv Detail & Related papers (2022-05-02T21:53:10Z)
- Semantics-Preserved Distortion for Personal Privacy Protection in Information Management [65.08939490413037]
This paper suggests a linguistically-grounded approach to distort texts while maintaining semantic integrity.
We present two distinct frameworks for semantic-preserving distortion: a generative approach and a substitutive approach.
We also explore privacy protection in a specific medical information management scenario, showing our method effectively limits sensitive data memorization.
arXiv Detail & Related papers (2022-01-04T04:01:05Z)
- Interpretable Privacy Preservation of Text Representations Using Vector Steganography [0.0]
Contextual word representations generated by language models (LMs) learn spurious associations present in the training corpora.
Adversaries can exploit these associations to reverse-engineer the private attributes of entities mentioned within the corpora.
I aim to study and develop methods to incorporate steganographic modifications within the vector geometry to obfuscate underlying spurious associations.
arXiv Detail & Related papers (2021-12-05T12:42:40Z)
- Contextualized Semantic Distance between Highly Overlapped Texts [85.1541170468617]
Overlap frequently occurs between paired texts in natural language processing tasks like text editing and semantic similarity evaluation.
This paper aims to address the issue with a mask-and-predict strategy.
We take the words in the longest common sequence as neighboring words and use masked language modeling (MLM) to predict the distributions on their positions.
Experiments on Semantic Textual Similarity show the proposed measure, NDD, to be more sensitive to various semantic differences, especially on highly overlapped paired texts.
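A rough sketch of this mask-and-predict idea is given below, assuming a stock BERT masked language model and a single shared word; the aggregation over all shared positions and the exact divergence are simplified.

```python
# Sketch of mask-and-predict: mask a word that two overlapping sentences share,
# read off the MLM's predictive distribution at that position in each context,
# and compare the two distributions with a divergence.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def masked_distribution(sentence, target):
    masked = sentence.replace(target, tok.mask_token, 1)
    enc = tok(masked, return_tensors="pt")
    pos = (enc["input_ids"][0] == tok.mask_token_id).nonzero()[0, 0]
    with torch.no_grad():
        logits = mlm(**enc).logits[0, pos]
    return F.log_softmax(logits, dim=-1)

a = masked_distribution("The film was surprisingly good.", "film")
b = masked_distribution("The film was painfully dull.", "film")
# KL divergence between the two predictive distributions for the shared word.
divergence = F.kl_div(a, b, log_target=True, reduction="sum")
print(f"divergence at shared word: {divergence.item():.3f}")
```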
arXiv Detail & Related papers (2021-10-04T03:59:15Z)
- CAPE: Context-Aware Private Embeddings for Private Language Learning [0.5156484100374058]
Context-Aware Private Embeddings (CAPE) is a novel approach which preserves privacy during training of embeddings.
CAPE combines calibrated noise applied through differential privacy, preserving the encoded semantic links while obscuring sensitive information, with an adversarial training regime.
Experimental results demonstrate that the combined approach reduces private information leakage better than either intervention alone.
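The noise-injection half of such an approach can be sketched in isolation as below; the noise distribution and scale are placeholders and the adversarial component is omitted.

```python
# Isolated sketch of the noise-injection component: perturb token embeddings
# with calibrated noise before the rest of the network sees them. Scale and
# distribution are placeholders; the adversarial training regime is omitted.
import torch

def privatize_embeddings(embeddings, epsilon=10.0, sensitivity=1.0):
    scale = sensitivity / epsilon
    noise = torch.distributions.Laplace(0.0, scale).sample(embeddings.shape)
    return embeddings + noise

token_embeddings = torch.randn(4, 12, 300)   # (batch, tokens, dim) stand-in
noisy = privatize_embeddings(token_embeddings)
print((noisy - token_embeddings).abs().mean())
```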
arXiv Detail & Related papers (2021-08-27T14:50:12Z)
- Intrinsic Probing through Dimension Selection [69.52439198455438]
Most modern NLP systems make use of pre-trained contextual representations that attain astonishingly high performance on a variety of tasks.
Such high performance should not be possible unless some form of linguistic structure inheres in these representations, and a wealth of research has sprung up on probing for it.
In this paper, we draw a distinction between intrinsic probing, which examines how linguistic information is structured within a representation, and the extrinsic probing popular in prior work, which only argues for the presence of such information by showing that it can be successfully extracted.
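A much-simplified illustration of dimension selection for intrinsic probing follows, using synthetic data and a per-dimension linear probe in place of the probing method proposed in the paper.

```python
# Simplified illustration of dimension selection: rank individual representation
# dimensions by how well a linear probe predicts a linguistic label from that
# dimension alone. Data is synthetic; the real paper uses a more refined probe.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=500)                    # e.g. noun vs. verb
reps = rng.normal(size=(500, 64))                        # stand-in for BERT vectors
reps[:, 7] += 2.0 * labels                               # plant signal in dimension 7

scores = []
for dim in range(reps.shape[1]):
    probe = LogisticRegression(max_iter=200)
    acc = cross_val_score(probe, reps[:, [dim]], labels, cv=3).mean()
    scores.append((acc, dim))

top = sorted(scores, reverse=True)[:5]
print("most informative dimensions:", [(d, round(a, 3)) for a, d in top])
```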
arXiv Detail & Related papers (2020-10-06T15:21:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.