PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English
- URL: http://arxiv.org/abs/2212.10011v2
- Date: Fri, 12 May 2023 07:38:29 GMT
- Title: PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English
- Authors: Jianfeng Chi, Wasi Uddin Ahmad, Yuan Tian, Kai-Wei Chang
- Abstract summary: We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
- Score: 77.79102359580702
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policies provide individuals with information about their rights and
how their personal information is handled. Natural language understanding (NLU)
technologies can help individuals and practitioners better understand the
privacy practices described in lengthy and complex documents. However, existing
efforts that use NLU technologies are limited by processing the language in a
way exclusive to a single task focusing on certain privacy practices. To this
end, we introduce the Privacy Policy Language Understanding Evaluation (PLUE)
benchmark, a multi-task benchmark for evaluating privacy policy language
understanding across various tasks. We also collect a large corpus of privacy
policies to enable privacy policy domain-specific language model pre-training.
We evaluate several generic pre-trained language models and continue
pre-training them on the collected corpus. We demonstrate that domain-specific
continual pre-training offers performance improvements across all tasks.
Related papers
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- Privacy-Preserving Language Model Inference with Instance Obfuscation [33.86459812694288]
Language Models as a Service (LMaaS) offers convenient access for developers and researchers to perform inference using pre-trained language models.
The input data and the inference results containing private information are exposed as plaintext during the service call, leading to privacy issues.
We propose the Instance-Obfuscated Inference (IOI) method, which focuses on addressing the decision privacy issue of natural language understanding tasks.
arXiv Detail & Related papers (2024-02-13T05:36:54Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable models, such as GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- Large Language Models Can Be Good Privacy Protection Learners [53.07930843882592]
We introduce Privacy Protection Language Models (PPLM), a novel paradigm for fine-tuning language models.
Our work offers a theoretical analysis for model design and delves into various techniques such as corpus curation, penalty-based unlikelihood in training loss, and instruction-based tuning.
In particular, instruction tuning with both positive and negative examples stands out as a promising method, effectively protecting private data while enhancing the model's knowledge.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Building cross-language corpora for human understanding of privacy policies [7.1707060082291925]
This work provides a methodology for building comparable cross-language corpora in a national language and a reference study language.
We provide an application example of our methodology comparing English and Italian, extending the corpus of one of the first studies about users' understanding of technical terms in privacy policies.
arXiv Detail & Related papers (2023-02-10T16:16:55Z)
- How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing [0.38073142980732994]
This article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy for classifying the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, computation overhead, dataset size, and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z)
- Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a Local Privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
arXiv Detail & Related papers (2021-04-15T15:01:28Z)
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
- Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies [13.09699710197036]
We create PrivaSeer, a corpus of over one million English language website privacy policies.
We present results from readability tests, document similarity, and keyphrase extraction, and explore the corpus through topic modeling.
arXiv Detail & Related papers (2020-04-23T13:21:00Z)