C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits
- URL: http://arxiv.org/abs/2410.03925v1
- Date: Fri, 4 Oct 2024 21:04:39 GMT
- Title: C3PA: An Open Dataset of Expert-Annotated and Regulation-Aware Privacy Policies to Enable Scalable Regulatory Compliance Audits
- Authors: Maaz Bin Musa, Steven M. Winston, Garrison Allen, Jacob Schiller, Kevin Moore, Sean Quick, Johnathan Melvin, Padmini Srinivasan, Mihailis E. Diamantis, Rishab Nithyanand
- Abstract summary: C3PA is the first regulation-aware dataset of expert-annotated privacy policies.
It contains over 48K expert-labeled privacy policy text segments associated with responses to CCPA-specific disclosure mandates.
- Score: 7.1195014414194695
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The development of tools and techniques to analyze and extract organizations' data habits from privacy policies is critical for scalable regulatory compliance audits. Unfortunately, these tools are becoming increasingly limited in their ability to identify compliance issues and fixes. After all, most were developed using regulation-agnostic datasets of annotated privacy policies obtained from a time before the introduction of landmark privacy regulations such as the EU's GDPR and California's CCPA. In this paper, we describe the first open regulation-aware dataset of expert-annotated privacy policies, C3PA (CCPA Privacy Policy Provision Annotations), aimed at addressing this challenge. C3PA contains over 48K expert-labeled privacy policy text segments associated with responses to CCPA-specific disclosure mandates from 411 unique organizations. We demonstrate that the C3PA dataset is uniquely suited for aiding automated audits of compliance with CCPA-related disclosure mandates.
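A minimal sketch of how a dataset with this shape could be loaded and summarized; the JSON-lines layout and field names (`org`, `segment`, `mandate`) are assumptions for illustration, not the released C3PA schema:

```python
import json

# Hypothetical record layout (the released C3PA schema may differ):
# {"org": "ExampleCo", "segment": "policy text ...", "mandate": "Right to Delete"}

def load_segments(path):
    """Load expert-labeled policy segments from a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

def mandate_coverage(segments):
    """For each CCPA disclosure mandate, count how many organizations
    have at least one labeled segment responding to it."""
    orgs = {}
    for seg in segments:
        orgs.setdefault(seg["mandate"], set()).add(seg["org"])
    return {mandate: len(members) for mandate, members in orgs.items()}

segments = load_segments("c3pa_segments.jsonl")  # hypothetical filename
print(mandate_coverage(segments))
```

A coverage table like this is a natural starting point for the audits the paper targets: a mandate with no associated segment for an organization is a candidate disclosure gap.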
Related papers
- How Privacy-Savvy Are Large Language Models? A Case Study on Compliance and Privacy Technical Review [15.15468770348023]
We evaluate large language models' performance on privacy-related tasks such as privacy information extraction (PIE), legal and regulatory key point detection (KPD), and question answering (QA).
Through an empirical assessment, we investigate the capacity of several prominent LLMs, including BERT, GPT-3.5, GPT-4, and custom models, in executing privacy compliance checks and technical privacy reviews.
While LLMs show promise in automating privacy reviews and identifying regulatory discrepancies, significant gaps persist in their ability to fully comply with evolving legal standards.
arXiv Detail & Related papers (2024-09-04T01:51:37Z)
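As a rough illustration of scoring the PIE task evaluated above, here is a set-based exact-match F1 over extracted data types; the metric and the dummy model are assumptions, not the paper's protocol:

```python
# Score a model on privacy information extraction (PIE) by comparing
# the set of extracted data types against gold annotations.

def pie_f1(predicted, gold):
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Stand-in for an LLM call; in practice this would prompt GPT-4, etc.
def dummy_model(policy_text):
    return ["email address", "location"]

gold = ["email address", "location", "browsing history"]
print(pie_f1(dummy_model("We collect your email and location ..."), gold))  # 0.8
```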
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Properties of Effective Information Anonymity Regulations [6.8322083925948185]
We develop a set of technical requirements for anonymization rules and related regulations.
As an exemplar, we evaluate competing interpretations of regulatory requirements from the EU's General Data Protection Regulation.
arXiv Detail & Related papers (2024-08-27T02:34:41Z)
- PrivComp-KG: Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification [0.0]
We propose a Large Language Model (LLM) and Semantic Web-based approach for privacy compliance verification.
PrivComp-KG is designed to efficiently store and retrieve comprehensive information concerning privacy policies.
It can be queried to check for compliance with privacy policies by each vendor against relevant policy regulations.
arXiv Detail & Related papers (2024-04-30T17:44:44Z)
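The general pattern behind knowledge-graph compliance queries like PrivComp-KG's (though not its actual schema) can be sketched with rdflib: store policy assertions as triples, then issue a SPARQL ASK query for a required disclosure; the namespace and predicate names are invented:

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/privacy#")  # invented namespace

g = Graph()
# Assert that VendorA's policy discloses a data-sale opt-out.
g.add((EX["VendorA"], EX["discloses"], Literal("data_sale_opt_out")))

# Compliance check: does VendorA make the disclosure the regulation requires?
result = g.query("""
    PREFIX ex: <http://example.org/privacy#>
    ASK { ex:VendorA ex:discloses "data_sale_opt_out" }
""")
print(result.askAnswer)  # True
```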
- TeD-SPAD: Temporal Distinctiveness for Self-supervised Privacy-preservation for Video Anomaly Detection [59.04634695294402]
Video anomaly detection (VAD) without human monitoring is a complex computer vision task.
Privacy leakage in VAD allows models to pick up and amplify unnecessary biases related to people's personal information.
We propose TeD-SPAD, a privacy-aware video anomaly detection framework that destroys visual private information in a self-supervised manner.
arXiv Detail & Related papers (2023-08-21T22:42:55Z)
- A Randomized Approach for Tight Privacy Accounting [63.67296945525791]
We propose a new differential privacy paradigm called estimate-verify-release (EVR).
The EVR paradigm first estimates the privacy parameter of a mechanism, then verifies whether it meets this guarantee, and finally releases the query output.
Our empirical evaluation shows the newly proposed EVR paradigm improves the utility-privacy tradeoff for privacy-preserving machine learning.
arXiv Detail & Related papers (2023-04-17T00:38:01Z)
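A minimal sketch of the estimate-verify-release control flow described above; the estimator and verifier below are placeholders standing in for the paper's randomized privacy tester:

```python
import random

def evr_release(mechanism, data, estimate, verify):
    eps_hat = estimate(mechanism)        # 1. estimate the privacy parameter
    if not verify(mechanism, eps_hat):   # 2. verify the mechanism meets it
        raise RuntimeError("privacy estimate failed verification; not releasing")
    return mechanism(data)               # 3. release the query output

# Toy mechanism: a Laplace-noised count (Laplace(1/eps) noise generated
# as the difference of two exponentials).
def noisy_count(data, eps=1.0):
    return len(data) + random.expovariate(eps) - random.expovariate(eps)

out = evr_release(
    mechanism=lambda d: noisy_count(d, eps=1.0),
    data=[1, 2, 3],
    estimate=lambda m: 1.0,              # placeholder estimator
    verify=lambda m, eps: eps <= 1.0,    # placeholder verifier
)
print(out)
```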
- Tight Auditing of Differentially Private Machine Learning [77.38590306275877]
For private machine learning, existing auditing mechanisms give tight privacy estimates only under implausible worst-case assumptions (e.g., a fully adversarial dataset).
We design an improved auditing scheme that yields tight privacy estimates for natural (not adversarially crafted) datasets.
arXiv Detail & Related papers (2023-02-15T21:40:33Z)
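For context, auditing work in this line typically converts a membership-inference attack's error rates into an empirical lower bound on epsilon; the generic conversion below (not this paper's improved scheme) follows the standard (eps, delta)-DP bound:

```python
import math

def empirical_epsilon(fpr, fnr, delta=0.0):
    """Lower bound on eps implied by an attack's false-positive and
    false-negative rates against an (eps, delta)-DP mechanism."""
    candidates = []
    if fpr > 0:
        candidates.append(math.log((1 - delta - fnr) / fpr))
    if fnr > 0:
        candidates.append(math.log((1 - delta - fpr) / fnr))
    return max(candidates) if candidates else float("inf")

# An attack with 1% false positives and 20% false negatives certifies
# eps >= ln(0.8 / 0.01) ≈ 4.38 for pure DP (delta = 0).
print(empirical_epsilon(fpr=0.01, fnr=0.20))
```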
- A Fine-grained Chinese Software Privacy Policy Dataset for Sequence Labeling and Regulation Compliant Identification [23.14031861460124]
We construct the first Chinese privacy policy dataset, CA4P-483, to facilitate sequence labeling tasks and regulation compliance identification.
Our dataset includes 483 Chinese Android application privacy policies, over 11K sentences, and 52K fine-grained annotations.
arXiv Detail & Related papers (2022-12-04T05:59:59Z)
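Sequence labeling in datasets like CA4P-483 means token-level tags; a sketch with BIO tags over an English example (the label set is invented for illustration, not CA4P-483's annotation scheme):

```python
# Recover labeled spans (e.g., collected data types) from BIO tags.
tokens = ["We", "collect", "your", "device", "identifier", "and", "location"]
labels = ["O", "O", "O", "B-DATA", "I-DATA", "O", "B-DATA"]

def bio_spans(tokens, labels):
    spans, current = [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                spans.append(current)
            current = (lab[2:], [tok])
        elif lab.startswith("I-") and current:
            current[1].append(tok)
        else:
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(tag, " ".join(words)) for tag, words in spans]

print(bio_spans(tokens, labels))
# [('DATA', 'device identifier'), ('DATA', 'location')]
```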
- Having your Privacy Cake and Eating it Too: Platform-supported Auditing of Social Media Algorithms for Public Interest [70.02478301291264]
Social media platforms curate access to information and opportunities, and so play a critical role in shaping public discourse.
Prior studies have used black-box methods to show that these algorithms can lead to biased or discriminatory outcomes.
We propose a new method for platform-supported auditing that can meet the goals of proposed legislation.
arXiv Detail & Related papers (2022-07-18T17:32:35Z)
- AI-enabled Automation for Completeness Checking of Privacy Policies [7.707284039078785]
In Europe, privacy policies are subject to compliance with the General Data Protection Regulation.
In this paper, we propose AI-based automation for completeness checking of privacy policies.
arXiv Detail & Related papers (2021-06-10T12:10:51Z)
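The completeness-checking idea can be sketched as verifying that a policy covers every item GDPR requires; the keyword heuristic and abridged checklist below are stand-ins for the paper's AI-based approach:

```python
# Abridged, illustrative checklist of GDPR-required disclosures and
# surface cues that might signal each one.
REQUIRED_ITEMS = {
    "controller identity": ["data controller", "who we are"],
    "processing purposes": ["purpose", "use your data"],
    "retention period": ["retain", "retention", "stored for"],
    "right to complain": ["supervisory authority", "lodge a complaint"],
}

def missing_items(policy_text):
    """Return checklist items with no matching cue in the policy."""
    text = policy_text.lower()
    return [item for item, cues in REQUIRED_ITEMS.items()
            if not any(cue in text for cue in cues)]

policy = "We retain your data for 12 months and use your data to improve service."
print(missing_items(policy))  # ['controller identity', 'right to complain']
```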
- Private Reinforcement Learning with PAC and Regret Guarantees [69.4202374491817]
We design privacy-preserving exploration policies for episodic reinforcement learning (RL).
We first provide a meaningful privacy formulation using the notion of joint differential privacy (JDP).
We then develop a private optimism-based learning algorithm that simultaneously achieves strong PAC and regret bounds, and enjoys a JDP guarantee.
arXiv Detail & Related papers (2020-09-18T20:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.