PolicyGPT: Automated Analysis of Privacy Policies with Large Language
Models
- URL: http://arxiv.org/abs/2309.10238v1
- Date: Tue, 19 Sep 2023 01:22:42 GMT
- Title: PolicyGPT: Automated Analysis of Privacy Policies with Large Language
Models
- Authors: Chenhao Tang, Zhengliang Liu, Chong Ma, Zihao Wu, Yiwei Li, Wei Liu,
Dajiang Zhu, Quanzheng Li, Xiang Li, Tianming Liu, Lei Fan
- Abstract summary: In practical use, users tend to click the Agree button directly rather than reading privacy policies carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLMs) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
- Score: 41.969546784168905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policies serve as the primary conduit through which online service
providers inform users about their data collection and usage procedures.
However, in a bid to be comprehensive and mitigate legal risks, these policy
documents are often quite verbose. In practical use, users tend to click the
Agree button directly rather than reading them carefully. This practice exposes
users to risks of privacy leakage and legal issues. Recently, the advent of
Large Language Models (LLMs) such as ChatGPT and GPT-4 has opened new
possibilities for text analysis, especially for lengthy documents like privacy
policies. In this study, we investigate PolicyGPT, a privacy policy text
analysis framework based on LLMs. The framework was evaluated on two datasets.
The first comprises privacy policies from 115 websites, meticulously annotated
by legal experts with each segment categorized into one of 10 classes. The
second consists of privacy policies from 304 popular mobile applications, with
each sentence manually annotated and classified into one of a separate set of
10 categories. Under zero-shot learning
conditions, PolicyGPT demonstrated robust performance. For the first dataset,
it achieved an accuracy rate of 97%, while for the second dataset, it attained
an 87% accuracy rate, surpassing that of the baseline machine learning and
neural network models.
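As a rough illustration of the zero-shot setup described in the abstract, the sketch below prompts a chat model to assign a single privacy policy segment to one of 10 categories. The label names (modeled on the OPP-115 annotation scheme), the model name, and the prompt wording are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal zero-shot classification sketch (illustrative; not PolicyGPT's exact prompt).
from openai import OpenAI

# Illustrative label set modeled on the OPP-115 annotation scheme (assumed here).
CATEGORIES = [
    "First Party Collection/Use", "Third Party Sharing/Collection",
    "User Choice/Control", "User Access, Edit and Deletion", "Data Retention",
    "Data Security", "Policy Change", "Do Not Track",
    "International and Specific Audiences", "Other",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_segment(segment: str) -> str:
    """Ask the model to pick exactly one category for a policy segment (zero-shot)."""
    prompt = (
        "You are given a segment of a privacy policy. "
        f"Classify it into exactly one of these categories: {', '.join(CATEGORIES)}. "
        "Reply with the category name only.\n\n"
        f"Segment: {segment}"
    )
    response = client.chat.completions.create(
        model="gpt-4",  # any chat-capable model works for this sketch
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for classification
    )
    return response.choices[0].message.content.strip()

print(classify_segment("We may share your email address with advertising partners."))
```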
Related papers
- Entailment-Driven Privacy Policy Classification with LLMs [3.564208334473993]
We propose a framework to classify paragraphs of privacy policies into meaningful labels that are easily understood by users.
Our framework improves the F1 score on average by 11.2%.
arXiv Detail & Related papers (2024-09-25T05:07:05Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Are LLM-based methods good enough for detecting unfair terms of service? [67.49487557224415]
Large language models (LLMs) are good at parsing long text-based documents.
We build a dataset consisting of 12 questions applied individually to a set of privacy policies.
Some open-source models are able to provide a higher accuracy compared to some commercial models.
arXiv Detail & Related papers (2024-08-24T09:26:59Z)
- A New Hope: Contextual Privacy Policies for Mobile Applications and An Approach Toward Automated Generation [19.578130824867596]
The aim of contextual privacy policies (CPPs) is to fragment privacy policies into concise snippets, displaying them only within the corresponding contexts of the application's graphical user interfaces (GUIs).
In this paper, we first formulate CPPs in the mobile application scenario, and then present a novel multimodal framework, named SeePrivacy, specifically designed to automatically generate CPPs for mobile applications.
A human evaluation shows that 77% of the extracted privacy policy segments were perceived as well-aligned with the detected contexts.
arXiv Detail & Related papers (2024-02-22T13:32:33Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
arXiv Detail & Related papers (2022-04-19T15:45:23Z)
- Compliance Checking with NLI: Privacy Policies vs. Regulations [0.0]
We use Natural Language Inference techniques to compare privacy regulations against sections of privacy policies from a selection of large companies.
Our model uses pre-trained embeddings along with a BiLSTM in its attention mechanism. (A minimal sketch of this NLI-style comparison appears after this list.)
arXiv Detail & Related papers (2022-03-01T17:27:16Z)
- Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning [3.659023646021795]
Most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights.
In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 requirements.
We develop a Convolutional Neural Network (CNN) based model which can classify the privacy policies with an accuracy of 89.2%.
arXiv Detail & Related papers (2021-11-08T01:28:27Z)
- PolicyQA: A Reading Comprehension Dataset for Privacy Policies [77.79102359580702]
We present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies.
We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
arXiv Detail & Related papers (2020-10-06T09:04:58Z)
- A Comparative Study of Sequence Classification Models for Privacy Policy Coverage Analysis [0.0]
Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data.
Our solution is to provide users with a coverage analysis of a given website's privacy policy using a wide range of classical machine learning and deep learning techniques.
arXiv Detail & Related papers (2020-02-12T21:46:22Z)
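To make the final entry above (A Comparative Study of Sequence Classification Models for Privacy Policy Coverage Analysis) concrete, here is a minimal sketch of the kind of classical baseline such coverage studies evaluate: TF-IDF features feeding a linear classifier over policy segments. The training segments and label names below are placeholders for illustration, not the paper's actual models or data.

```python
# Minimal classical baseline sketch for privacy policy coverage classification.
# The segments and labels below are placeholders, not the paper's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

segments = [
    "We collect your email address when you register.",
    "Your data is encrypted in transit and at rest.",
    "We share aggregate statistics with advertising partners.",
    "You may request deletion of your account data at any time.",
]
labels = [
    "First Party Collection/Use",
    "Data Security",
    "Third Party Sharing/Collection",
    "User Access, Edit and Deletion",
]

# TF-IDF n-gram features + logistic regression: a common text-classification baseline.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(segments, labels)

# Predict the coverage category of an unseen policy segment.
print(model.predict(["Your information may be disclosed to third-party advertisers."]))
```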
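The "Compliance Checking with NLI" entry above frames compliance checking as natural language inference between policy text and regulation text. The sketch below, referenced from that entry, shows the general NLI pattern using an off-the-shelf MNLI model from Hugging Face rather than the paper's BiLSTM-with-attention model; the premise/hypothesis pair and the model choice are illustrative assumptions.

```python
# NLI-style compliance check sketch: does a policy statement entail a
# regulation-derived claim? Uses an off-the-shelf MNLI model, not the paper's model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

# Illustrative pair: policy section as premise, regulation-derived claim as hypothesis.
premise = "We retain personal data only as long as necessary to provide the service."
hypothesis = "The provider states how long personal data is retained."

inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

probs = logits.softmax(dim=-1)[0]
label = model.config.id2label[int(probs.argmax())]  # CONTRADICTION / NEUTRAL / ENTAILMENT
print(label, round(float(probs.max()), 3))
```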