CAPE: Context-Aware Private Embeddings for Private Language Learning
- URL: http://arxiv.org/abs/2108.12318v1
- Date: Fri, 27 Aug 2021 14:50:12 GMT
- Title: CAPE: Context-Aware Private Embeddings for Private Language Learning
- Authors: Richard Plant, Dimitra Gkatzia, Valerio Giuffrida
- Abstract summary: Context-Aware Private Embeddings (CAPE) is a novel approach which preserves privacy during training of embeddings.
CAPE applies calibrated noise through differential privacy, preserving the encoded semantic links while obscuring sensitive information.
Experimental results demonstrate that the proposed approach reduces private information leakage better than either single intervention.
- Score: 0.5156484100374058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep learning-based language models have achieved state-of-the-art results in
a number of applications including sentiment analysis, topic labelling, intent
classification and others. Obtaining text representations or embeddings using
these models presents the possibility of encoding personally identifiable
information learned from language and context cues that may present a risk to
reputation or privacy. To ameliorate these issues, we propose Context-Aware
Private Embeddings (CAPE), a novel approach which preserves privacy during
training of embeddings. To maintain the privacy of text representations, CAPE
applies calibrated noise through differential privacy, preserving the encoded
semantic links while obscuring sensitive information. In addition, CAPE employs
an adversarial training regime that obscures identified private variables.
Experimental results demonstrate that the proposed approach reduces private
information leakage better than either single intervention.
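  The abstract describes two complementary interventions: calibrated noise applied to the text representation under differential privacy, and an adversarial training regime that penalizes recovery of identified private variables. Below is a minimal sketch of how these two pieces might be combined, assuming a PyTorch setup; the Laplace mechanism, the clipping norm, the privacy budget epsilon, and the lambda_adv weight are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' implementation): (1) clip and add calibrated
# Laplace noise to an embedding for differential privacy, and (2) train an
# adversarial head through a gradient-reversal layer so the representation
# stops encoding an identified private variable.
import torch
import torch.nn as nn

class GradientReversal(torch.autograd.Function):
    """Identity on the forward pass; negates and scales gradients on backward."""
    @staticmethod
    def forward(ctx, x, lambda_adv):
        ctx.lambda_adv = lambda_adv
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambda_adv * grad_output, None

def privatize_embedding(emb, epsilon=1.0, clip_norm=1.0):
    """Clip each embedding to an L1 ball of radius clip_norm, then add Laplace
    noise with scale = sensitivity / epsilon (assumed sensitivity = 2 * clip_norm)."""
    norms = emb.abs().sum(dim=-1, keepdim=True).clamp(min=1e-12)
    clipped = emb * (clip_norm / norms).clamp(max=1.0)
    scale = 2.0 * clip_norm / epsilon
    noise = torch.distributions.Laplace(0.0, scale).sample(clipped.shape)
    return clipped + noise

class PrivateEmbeddingModel(nn.Module):
    """Task head plus adversarial head over a privatized representation."""
    def __init__(self, dim=768, n_task_classes=2, n_private_classes=2, lambda_adv=1.0):
        super().__init__()
        self.task_head = nn.Linear(dim, n_task_classes)    # downstream task
        self.adv_head = nn.Linear(dim, n_private_classes)  # predicts the private variable
        self.lambda_adv = lambda_adv

    def forward(self, emb):
        priv = privatize_embedding(emb)                    # DP noise on the representation
        task_logits = self.task_head(priv)
        adv_logits = self.adv_head(GradientReversal.apply(priv, self.lambda_adv))
        return task_logits, adv_logits

# Usage with stand-in data: the joint loss trains the task head normally while
# the reversed gradient pushes the trainable representation away from the
# private label, on top of the noise already injected by the DP step.
emb = torch.randn(4, 768, requires_grad=True)  # stand-in for an encoder output
task_labels = torch.randint(0, 2, (4,))
private_labels = torch.randint(0, 2, (4,))
model = PrivateEmbeddingModel()
task_logits, adv_logits = model(emb)
loss = (nn.functional.cross_entropy(task_logits, task_labels)
        + nn.functional.cross_entropy(adv_logits, private_labels))
loss.backward()
```

  The gradient-reversal trick keeps training to a single backward pass: the adversarial head learns to predict the private variable, while the reversed gradient drives the representation to make that prediction harder.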
Related papers
- Masked Differential Privacy [64.32494202656801]
We propose an effective approach called masked differential privacy (DP), which allows for controlling sensitive regions where differential privacy is applied.
Our method operates selectively on data and allows for defining non-sensitive-temporal regions without DP application or combining differential privacy with other privacy techniques within data samples.
arXiv Detail & Related papers (2024-10-22T15:22:53Z) - NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human [55.20137833039499]
We suggest sanitizing sensitive text using two common strategies used by humans.
We curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models.
arXiv Detail & Related papers (2024-06-06T05:07:44Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Context-Aware Differential Privacy for Language Modeling [41.54238543400462]
This paper introduces the Context-Aware Differentially Private Language Model (CADP-LM).
CADP-LM relies on the notion of context to define and audit the potentially sensitive information.
A unique characteristic of CADP-LM is its ability to target the protection of sensitive sentences and contexts only.
arXiv Detail & Related papers (2023-01-28T20:06:16Z) - PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z) - How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed the Privacy Loss-Input Susceptibility (PLIS) which allows one to apportion the subject's privacy loss to their input attributes.
arXiv Detail & Related papers (2022-11-18T11:39:03Z) - Interpretable Privacy Preservation of Text Representations Using Vector Steganography [0.0]
Contextual word representations generated by language models (LMs) learn spurious associations present in the training corpora.
Adversaries can exploit these associations to reverse-engineer the private attributes of entities mentioned within the corpora.
I aim to study and develop methods to incorporate steganographic modifications within the vector geometry to obfuscate underlying spurious associations.
arXiv Detail & Related papers (2021-12-05T12:42:40Z) - Selective Differential Privacy for Language Modeling [36.64464956102432]
Previous work has attempted to tackle this challenge by training RNN-based language models with differential privacy guarantees.
We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data.
Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utility.
arXiv Detail & Related papers (2021-08-30T01:11:10Z) - Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a Local Privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
arXiv Detail & Related papers (2021-04-15T15:01:28Z)