Compliance Checking with NLI: Privacy Policies vs. Regulations
- URL: http://arxiv.org/abs/2204.01845v1
- Date: Tue, 1 Mar 2022 17:27:16 GMT
- Title: Compliance Checking with NLI: Privacy Policies vs. Regulations
- Authors: Amin Rabinia and Zane Nygaard
- Abstract summary: We use Natural Language Inference techniques to compare privacy regulations against sections of privacy policies from a selection of large companies.
Our model uses pre-trained embeddings together with a BiLSTM-based attention mechanism.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A privacy policy is a document that states how a company intends to handle
and manage its customers' personal data. One problem with these privacy
policies is that their content might violate data privacy
regulations. Because of the enormous number of privacy policies that exist, the
only realistic way to check for legal inconsistencies in all of them is through
an automated method. In this work, we use Natural Language Inference (NLI)
techniques to compare privacy regulations against sections of privacy policies
from a selection of large companies. Our NLI model uses pre-trained embeddings
together with a BiLSTM-based attention mechanism. We trained two versions of
the model: one on the Stanford Natural Language Inference (SNLI) dataset and
the other on the Multi-Genre Natural Language Inference (MNLI) dataset. Test
accuracy was higher for the model trained on SNLI, but on NLI tasks over
real-world privacy policies, the model trained on MNLI generalized and
performed much better.
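The abstract describes the model only at a high level. As a rough illustration, the sketch below shows what an NLI classifier with frozen pre-trained embeddings, a BiLSTM encoder, and a simple attention layer could look like in PyTorch; the layer sizes, the exact form of attention, and all names here are assumptions, not the authors' implementation.

```python
# Rough PyTorch sketch of an NLI classifier in the spirit of the abstract:
# frozen pre-trained embeddings, a BiLSTM encoder, and a simple attention
# layer. All hyperparameters are assumed, not taken from the paper.
import torch
import torch.nn as nn

class BiLSTMAttentionNLI(nn.Module):
    def __init__(self, embeddings: torch.Tensor, hidden: int = 128):
        super().__init__()
        # Pre-trained embeddings (e.g., GloVe), frozen during training.
        self.embed = nn.Embedding.from_pretrained(embeddings, freeze=True)
        self.bilstm = nn.LSTM(embeddings.size(1), hidden,
                              batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)  # one score per time step
        # Premise and hypothesis vectors are concatenated, then mapped to
        # the 3 NLI labels: entailment / neutral / contradiction.
        self.classifier = nn.Linear(4 * hidden, 3)

    def encode(self, token_ids: torch.Tensor) -> torch.Tensor:
        h, _ = self.bilstm(self.embed(token_ids))     # (B, T, 2H)
        weights = torch.softmax(self.attn(h), dim=1)  # (B, T, 1)
        return (weights * h).sum(dim=1)               # (B, 2H)

    def forward(self, premise_ids, hypothesis_ids):
        p = self.encode(premise_ids)
        q = self.encode(hypothesis_ids)
        return self.classifier(torch.cat([p, q], dim=-1))  # (B, 3)

# Toy usage with random stand-in "pre-trained" embeddings (vocab 1000, dim 100):
emb = torch.randn(1000, 100)
model = BiLSTMAttentionNLI(emb)
logits = model(torch.randint(0, 1000, (2, 12)),
               torch.randint(0, 1000, (2, 12)))
print(logits.shape)  # torch.Size([2, 3])
```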
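The compliance-checking step itself can be sketched with an off-the-shelf MNLI-fine-tuned checkpoint (here, roberta-large-mnli from Hugging Face) standing in for the paper's BiLSTM model; the regulation clause, the policy section, and the premise/hypothesis pairing direction are all invented for illustration.

```python
# Minimal sketch of NLI-based compliance checking, assuming an
# off-the-shelf MNLI-trained model rather than the authors' own model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")
model.eval()

# Hypothetical inputs: a regulation clause as the premise and a policy
# section as the hypothesis; the paper's exact pairing scheme may differ.
regulation = ("Personal data shall be collected for specified, explicit "
              "and legitimate purposes only.")
policy_section = ("We may share your personal information with third "
                  "parties for purposes we deem appropriate.")

inputs = tokenizer(regulation, policy_section,
                   return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# roberta-large-mnli label order: 0=contradiction, 1=neutral, 2=entailment.
labels = ["contradiction", "neutral", "entailment"]
probs = torch.softmax(logits, dim=-1).squeeze()
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
# A high contradiction score would flag the policy section as a
# potential compliance issue for human review.
```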
Related papers
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP, motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- The Privacy Policy Permission Model: A Unified View of Privacy Policies [0.5371337604556311]
A privacy policy is a set of statements that specifies how an organization gathers, uses, discloses, and maintains a client's data.
Most privacy policies lack a clear, complete explanation of how data providers' information is used.
We propose a modeling methodology, called the Privacy Policy Permission Model (PPPM), that provides a uniform, easy-to-understand representation of privacy policies.
arXiv Detail & Related papers (2024-03-26T06:12:38Z)
- DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer [57.04801796205638]
Large Language Models (LLMs) have emerged as dominant tools for various tasks.
However, concerns surrounding data privacy present obstacles due to the tuned prompts' dependency on sensitive private information.
We present Differentially-Private Offsite Prompt Tuning (DP-OPT) to address this challenge.
arXiv Detail & Related papers (2023-11-27T02:01:10Z)
- PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models [42.20437015301152]
We present PrivLM-Bench, a benchmark for evaluating the privacy leakage of language models (LMs).
Instead of only reporting DP parameters, PrivLM-Bench sheds light on the neglected issue of inference data privacy during actual usage.
We conduct extensive experiments on three datasets of GLUE for mainstream LMs.
arXiv Detail & Related papers (2023-11-07T14:55:52Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models (GPT-4 and ChatGPT) reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models [41.969546784168905]
In practice, users tend to click the Agree button directly rather than reading privacy policies carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLM) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
arXiv Detail & Related papers (2023-09-19T01:22:42Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Just Fine-tune Twice: Selective Differential Privacy for Large Language Models [69.66654761324702]
We propose a simple yet effective just-fine-tune-twice privacy mechanism to achieve Selective Differential Privacy (SDP) for large Transformer-based language models.
Experiments show that our models achieve strong performance while staying robust to the canary insertion attack.
arXiv Detail & Related papers (2022-04-15T22:36:55Z)
- Privacy-Adaptive BERT for Natural Language Understanding [20.821155542969947]
We study how to improve the effectiveness of NLU models under a Local Privacy setting using BERT.
We propose privacy-adaptive LM pretraining methods and demonstrate that they can significantly improve model performance on privatized text input.
arXiv Detail & Related papers (2021-04-15T15:01:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.