PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models
- URL: http://arxiv.org/abs/2309.10238v1
- Date: Tue, 19 Sep 2023 01:22:42 GMT
- Title: PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models
- Authors: Chenhao Tang, Zhengliang Liu, Chong Ma, Zihao Wu, Yiwei Li, Wei Liu, Dajiang Zhu, Quanzheng Li, Xiang Li, Tianming Liu, Lei Fan
- Abstract summary: In practice, users tend to click the Agree button directly rather than reading privacy policies carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLMs) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
- Score: 41.969546784168905
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policies serve as the primary conduit through which online service
providers inform users about their data collection and usage procedures.
However, in a bid to be comprehensive and mitigate legal risks, these policy
documents are often quite verbose. In practice, users tend to click the
Agree button directly rather than reading the policies carefully. This practice
exposes users to risks of privacy leakage and legal issues. Recently, the advent
of Large Language Models (LLMs) such as ChatGPT and GPT-4 has opened new
possibilities for text analysis, especially for lengthy documents like privacy
policies. In this study, we investigate PolicyGPT, a privacy policy text analysis
framework based on LLMs. This framework was tested on two datasets. The first
dataset comprises privacy policies from 115 websites, which were meticulously
annotated by legal experts, with each segment categorized into one of 10
classes. The second dataset consists of privacy policies from 304 popular mobile
applications, with each sentence manually annotated and classified into one of a
different set of 10 categories. Under zero-shot learning conditions, PolicyGPT
demonstrated robust performance. On the first dataset it achieved an accuracy of
97%, and on the second dataset it attained an accuracy of 87%, surpassing the
baseline machine learning and neural network models.
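The abstract describes classifying policy segments (or sentences) zero-shot into a fixed set of 10 categories and scoring the result by accuracy. A minimal sketch of such an evaluation loop follows, under loudly stated assumptions: the prompt wording, the placeholder label names, and the `ask_llm` callable (a stand-in for whatever ChatGPT or GPT-4 API call is used) are all hypothetical and are not PolicyGPT's actual prompt, categories, or code.

```python
from typing import Callable, List, Optional, Sequence

# Placeholder label set (hypothetical): the abstract states that each dataset
# uses 10 classes but does not name them, so these names are stand-ins.
LABELS: List[str] = [f"category_{i}" for i in range(1, 11)]


def build_zero_shot_prompt(segment: str, labels: Sequence[str]) -> str:
    """Zero-shot prompt: the model sees only the label names, no labelled examples."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the following privacy policy text into exactly one of "
        "these categories:\n"
        f"{options}\n\n"
        f"Segment: {segment}\n"
        "Answer with the category name only."
    )


def parse_label(response: str, labels: Sequence[str]) -> Optional[str]:
    """Map the model's free-text reply to a known label, or None if it matches none."""
    cleaned = response.strip().rstrip(".").strip().lower()
    for label in labels:
        if cleaned == label.lower():
            return label
    return None


def accuracy(predicted: Sequence[Optional[str]], gold: Sequence[str]) -> float:
    """Fraction of items whose predicted label equals the gold label."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def evaluate(
    segments: Sequence[str],
    gold: Sequence[str],
    ask_llm: Callable[[str], str],
    labels: Sequence[str] = LABELS,
) -> float:
    """Classify every segment zero-shot via `ask_llm` and score against gold labels.

    `ask_llm` is a hypothetical callable standing in for a real LLM API call.
    Replies that match no label become None and therefore count as errors.
    """
    predictions = [
        parse_label(ask_llm(build_zero_shot_prompt(s, labels)), labels)
        for s in segments
    ]
    return accuracy(predictions, gold)
```

For example, with a stub model that always answers `category_1`, `evaluate(["We collect your email.", "We share data with advertisers."], ["category_1", "category_2"], lambda prompt: "category_1")` returns `0.5`. Matching replies exactly (after trimming case, whitespace, and a trailing period) is a deliberately strict choice: it avoids `category_1` accidentally matching `category_10`, at the cost of counting verbose answers as wrong.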
Related papers
- A New Hope: Contextual Privacy Policies for Mobile Applications and An Approach Toward Automated Generation [19.578130824867596]
The aim of contextual privacy policies (CPPs) is to fragment privacy policies into concise snippets and to display them only in the corresponding contexts within the application's graphical user interfaces (GUIs).
In this paper, we first formulate CPP in mobile application scenario, and then present a novel multimodal framework, named SeePrivacy, specifically designed to automatically generate CPPs for mobile applications.
A human evaluation shows that 77% of the extracted privacy policy segments were perceived as well-aligned with the detected contexts.
Date: 2024-02-22T13:32:33Z
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but they only give tight estimates under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
Date: 2023-02-15T21:40:33Z
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
Date: 2022-12-20T05:58:32Z
- Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
Date: 2022-04-19T15:45:23Z
- Compliance Checking with NLI: Privacy Policies vs. Regulations [0.0]
We use Natural Language Inference techniques to compare privacy regulations against sections of privacy policies from a selection of large companies.
Our model uses pre-trained embeddings along with a BiLSTM in its attention mechanism.
Date: 2022-03-01T17:27:16Z
- Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning [3.659023646021795]
Most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights.
In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 GDPR disclosure requirements.
We develop a Convolutional Neural Network (CNN) based model that can classify the privacy policies with an accuracy of 89.2%.
Date: 2021-11-08T01:28:27Z
- Intent Classification and Slot Filling for Privacy Policies [34.606121042708864]
PolicyIE is a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications.
We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as joint sequence tagging and (2) modeling them as a sequence-to-sequence learning task.
Date: 2021-01-01T00:44:41Z
- PolicyQA: A Reading Comprehension Dataset for Privacy Policies [77.79102359580702]
We present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies.
We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
Date: 2020-10-06T09:04:58Z
- PGLP: Customizable and Rigorous Location Privacy through Policy Graph [68.3736286350014]
We propose a new location privacy notion called PGLP, which provides a rich interface to release private locations with customizable and rigorous privacy guarantees.
Specifically, we formalize a user's location privacy requirements using a location policy graph, which is expressive and customizable.
We also design a private location trace release framework that pipelines the detection of location exposure, policy graph repair, and private trajectory release with customizable and rigorous location privacy.
Date: 2020-05-04T04:25:59Z
- A Comparative Study of Sequence Classification Models for Privacy Policy Coverage Analysis [0.0]
Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data.
Our solution is to provide users with a coverage analysis of a given website's privacy policy using a wide range of classical machine learning and deep learning techniques.
Date: 2020-02-12T21:46:22Z
This list is automatically generated from the titles and abstracts of the papers on this site.