AI-enabled Automation for Completeness Checking of Privacy Policies
- URL: http://arxiv.org/abs/2106.05688v1
- Date: Thu, 10 Jun 2021 12:10:51 GMT
- Title: AI-enabled Automation for Completeness Checking of Privacy Policies
- Authors: Orlando Amaral, Sallam Abualhaija, Damiano Torre, Mehrdad Sabetzadeh,
Lionel C. Briand
- Abstract summary: In Europe, privacy policies are subject to compliance with the General Data Protection Regulation.
In this paper, we propose AI-based automation for the completeness checking of privacy policies.
- Score: 7.707284039078785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Technological advances in information sharing have raised concerns about data
protection. Privacy policies contain privacy-related requirements about how the
personal data of individuals will be handled by an organization or a software
system (e.g., a web service or an app). In Europe, privacy policies are subject
to compliance with the General Data Protection Regulation (GDPR). A
prerequisite for GDPR compliance checking is to verify whether the content of a
privacy policy is complete according to the provisions of GDPR. Incomplete
privacy policies might result in large fines for the violating organization, as
well as in incomplete privacy-related software specifications. Manual completeness
checking is both time-consuming and error-prone. In this paper, we propose
AI-based automation for the completeness checking of privacy policies. Through
systematic qualitative methods, we first build two artifacts to characterize
the privacy-related provisions of GDPR, namely a conceptual model and a set of
completeness criteria. Then, we develop an automated solution on top of these
artifacts by leveraging a combination of natural language processing and
supervised machine learning. Specifically, we identify the GDPR-relevant
information content in privacy policies and subsequently check them against the
completeness criteria. To evaluate our approach, we collected 234 real privacy
policies from the fund industry. Over a set of 48 unseen privacy policies, our
approach correctly detected 300 of the 334 violations of the completeness
criteria, while producing 23 false positives. The approach thus has a
precision of 92.9% and recall of 89.8%. Compared to a baseline that applies
keyword search only, our approach results in an improvement of 24.5% in
precision and 38% in recall.
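The reported precision and recall follow directly from the detection counts given above. A minimal sanity check of that arithmetic (not part of the paper's tooling, just a recomputation from the stated numbers):

```python
# Recompute the evaluation metrics from the counts reported in the abstract:
# over 48 unseen policies there were 334 actual violations; the approach
# detected 300 of them correctly (true positives) and produced 23 false positives.
true_positives = 300
false_positives = 23
actual_violations = 334

precision = true_positives / (true_positives + false_positives)  # TP / (TP + FP)
recall = true_positives / actual_violations                      # TP / (TP + FN)

print(f"precision = {precision:.1%}")  # 92.9%
print(f"recall    = {recall:.1%}")     # 89.8%
```

Both values round to the figures reported in the abstract.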
Related papers
- C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits [7.1195014414194695]
C3PA is the first regulation-aware dataset of expert-annotated privacy policies.
It contains over 48K expert-labeled privacy policy text segments associated with responses to CCPA-specific disclosure mandates.
arXiv Detail & Related papers (2024-10-04T21:04:39Z)
- Interactive GDPR-Compliant Privacy Policy Generation for Software Applications [6.189770781546807]
To use software applications, users are sometimes requested to provide their personal information.
As privacy has become a significant concern, many protection regulations exist worldwide.
We propose an approach that generates a comprehensive and compliant privacy policy.
arXiv Detail & Related papers (2024-10-04T01:22:16Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Provable Privacy with Non-Private Pre-Processing [56.770023668379615]
We propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms.
Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions.
arXiv Detail & Related papers (2024-03-19T17:54:49Z)
- A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR).
The EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows that the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms are tight, but only under implausible worst-case assumptions.
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z)
- No Free Lunch in "Privacy for Free: How does Dataset Condensation Help Privacy" [75.98836424725437]
New methods designed to preserve data privacy require careful scrutiny.
Failure to preserve privacy is hard to detect, and yet can lead to catastrophic results when a system implementing a "privacy-preserving" method is attacked.
arXiv Detail & Related papers (2022-09-29T17:50:23Z)
- How to keep text private? A systematic review of deep learning methods for privacy-preserving natural language processing [0.38073142980732994]
The article systematically reviews over sixty methods for privacy-preserving NLP published between 2016 and 2020.
We introduce a novel taxonomy that classifies the existing methods into three categories: data safeguarding methods, trusted methods, and verification methods.
We discuss open challenges in privacy-preserving NLP regarding data traceability, computational overhead, dataset size, and the prevalence of human biases in embeddings.
arXiv Detail & Related papers (2022-05-20T11:29:44Z)
- Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning [3.659023646021795]
Most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights.
In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 GDPR disclosure requirements.
We develop a Convolutional Neural Network (CNN)-based model that can classify the privacy policies with an accuracy of 89.2%.
arXiv Detail & Related papers (2021-11-08T01:28:27Z)
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
- Privacy Policies over Time: Curation and Analysis of a Million-Document Dataset [6.060757543617328]
We develop a crawler that discovers, downloads, and extracts archived privacy policies from the Internet Archive's Wayback Machine.
We curated a dataset of 1,071,488 English language privacy policies, spanning over two decades and over 130,000 distinct websites.
Our data indicate that self-regulation for first-party websites has stagnated, while self-regulation for third parties has increased but is dominated by online advertising trade associations.
arXiv Detail & Related papers (2020-08-20T19:00:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.