A Comparative Study of Sequence Classification Models for Privacy Policy
Coverage Analysis
- URL: http://arxiv.org/abs/2003.04972v1
- Date: Wed, 12 Feb 2020 21:46:22 GMT
- Title: A Comparative Study of Sequence Classification Models for Privacy Policy
Coverage Analysis
- Authors: Zachary Lindner
- Abstract summary: Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data.
Our solution is to provide users with a coverage analysis of a given website's privacy policy using a wide range of classical machine learning and deep learning techniques.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policies are legal documents that describe how a website will
collect, use, and distribute a user's data. Unfortunately, such documents are
often overly complicated and filled with legal jargon, making it difficult for
users to fully grasp what exactly is being collected and why. Our solution to
this problem is to provide users with a coverage analysis of a given website's
privacy policy using a wide range of classical machine learning and deep
learning techniques. Given a website's privacy policy, the classifier
identifies the associated data practice for each logical segment. These data
practices/labels are taken directly from the OPP-115 corpus. For example, the
data practice "Data Retention" refers to how long a website stores a user's
information. The coverage analysis allows users to determine how many of the
ten possible data practices are covered, along with identifying the sections
that correspond to the data practices of particular interest.
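To make the coverage analysis concrete, here is a minimal illustrative sketch, not the paper's actual models: it classifies policy segments into a handful of OPP-115-style data-practice labels with a bag-of-words nearest-centroid rule, then reports which practices a policy covers. The training segments, label subset, and function names are invented stand-ins; the paper evaluates a much wider range of classical and deep learning classifiers.

```python
# Illustrative sketch only: segment-level data-practice classification
# followed by coverage aggregation. Training texts are invented stand-ins
# for OPP-115 annotations, not real corpus data.
from collections import Counter
import math

LABELED_SEGMENTS = {
    "Data Retention": [
        "we store your information for as long as your account is active",
        "records are retained for ninety days after deletion",
    ],
    "First Party Collection/Use": [
        "we collect your email address when you register",
        "we use your browsing history to personalize content",
    ],
    "Third Party Sharing/Collection": [
        "we share your data with advertising partners",
        "third parties may receive aggregated usage statistics",
    ],
}

def tokenize(text):
    return [w for w in text.lower().split() if w.isalpha()]

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# One bag-of-words "centroid" per label, built from its training segments.
CENTROIDS = {
    label: Counter(w for seg in segs for w in tokenize(seg))
    for label, segs in LABELED_SEGMENTS.items()
}

def classify_segment(segment):
    """Return the data-practice label whose centroid is most similar."""
    counts = Counter(tokenize(segment))
    return max(CENTROIDS, key=lambda lbl: cosine(counts, CENTROIDS[lbl]))

def coverage(policy_segments):
    """Set of data practices covered by a policy's logical segments."""
    return {classify_segment(s) for s in policy_segments}
```

The coverage report is then just the size of the returned set against the ten possible practices, plus the segment-to-label mapping for practices a user cares about.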
Related papers
- Entailment-Driven Privacy Policy Classification with LLMs [3.564208334473993]
We propose a framework to classify paragraphs of privacy policies into meaningful labels that are easily understood by users.
Our framework improves the F1 score by 11.2% on average.
arXiv Detail & Related papers (2024-09-25T05:07:05Z)
- Towards Split Learning-based Privacy-Preserving Record Linkage [49.1574468325115]
Split Learning has been introduced to facilitate applications where user data privacy is a requirement.
In this paper, we investigate the potential of Split Learning for Privacy-Preserving Record Matching.
arXiv Detail & Related papers (2024-09-02T09:17:05Z)
- Understanding Privacy Norms through Web Forms [5.972457400484541]
We build a specialized crawler to discover web forms and run it on 11,500 popular websites, creating a dataset of 293K web forms.
By analyzing the annotated dataset, we reveal common patterns of data collection practices.
arXiv Detail & Related papers (2024-08-29T07:11:09Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Automated Detection and Analysis of Data Practices Using A Real-World Corpus [20.4572759138767]
We propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail.
Our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.
arXiv Detail & Related papers (2024-02-16T18:51:40Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind)
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models [41.969546784168905]
In practical use, users tend to click the Agree button directly rather than reading them carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLM) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
arXiv Detail & Related papers (2023-09-19T01:22:42Z)
- Protecting User Privacy in Online Settings via Supervised Learning [69.38374877559423]
We design an intelligent approach to online privacy protection that leverages supervised learning.
By detecting and blocking data collection that might infringe on a user's privacy, we can restore a degree of digital privacy to the user.
arXiv Detail & Related papers (2023-04-06T05:20:16Z)
- Intent Classification and Slot Filling for Privacy Policies [34.606121042708864]
PolicyIE is a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications.
We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as a joint sequence tagging task and (2) modeling them as a sequence-to-sequence learning task.
arXiv Detail & Related papers (2021-01-01T00:44:41Z)
- PolicyQA: A Reading Comprehension Dataset for Privacy Policies [77.79102359580702]
We present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies.
We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
arXiv Detail & Related papers (2020-10-06T09:04:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.