Automated Detection and Analysis of Data Practices Using A Real-World
Corpus
- URL: http://arxiv.org/abs/2402.11006v1
- Date: Fri, 16 Feb 2024 18:51:40 GMT
- Title: Automated Detection and Analysis of Data Practices Using A Real-World
Corpus
- Authors: Mukund Srinath, Pranav Venkit, Maria Badillo, Florian Schaub, C. Lee
Giles, Shomir Wilson
- Abstract summary: We propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail.
Our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.
- Score: 20.4572759138767
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policies are crucial for informing users about data practices, yet
their length and complexity often deter users from reading them. In this paper,
we propose an automated approach to identify and visualize data practices
within privacy policies at different levels of detail. Leveraging crowd-sourced
annotations from the ToS;DR platform, we experiment with various methods to
match policy excerpts with predefined data practice descriptions. We further
conduct a case study to evaluate our approach on a real-world policy,
demonstrating its effectiveness in simplifying complex policies. Experiments
show that our approach accurately matches data practice descriptions with
policy excerpts, facilitating the presentation of simplified privacy
information to users.
Related papers
- Customize Multi-modal RAI Guardrails with Precedent-based predictions [55.63757336900865]
A multi-modal guardrail must effectively filter image content based on user-defined policies.<n>Existing fine-tuning methods typically condition predictions on pre-defined policies.<n>We propose to condition model's judgment on "precedents", which are the reasoning processes of prior data points similar to the given input.
arXiv Detail & Related papers (2025-07-28T03:45:34Z) - Let's Measure the Elephant in the Room: Facilitating Personalized Automated Analysis of Privacy Policies at Scale [14.986181740022106]
PoliAnalyzer is a neuro-symbolic system that assists users with personalized privacy policy analysis.<n>It uses Natural Language Processing to extract formal representations of data usage practices from policy texts.<n>It can support automated personalized privacy policy analysis at scale using off-the-shelf NLP tools.
arXiv Detail & Related papers (2025-07-15T20:19:33Z) - DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective [59.66984417026933]
We introduce a novel taxonomy, classifying existing methods based on their reliance on internal features (IF) (inherent to the data) versus external features (EF) (artificially introduced for auditing)<n>We formulate two primary attack types: evasion attacks, designed to conceal the use of a dataset, and forgery attacks, intending to falsely implicate an unused dataset.<n>Building on the understanding of existing methods and attack objectives, we further propose systematic attack strategies: decoupling, removal, and detection for evasion; adversarial example-based methods for forgery.<n>Our benchmark, DATABench, comprises 17 evasion attacks, 5 forgery attacks, and 9
arXiv Detail & Related papers (2025-07-08T03:07:15Z) - SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models [10.848775419008442]
We present SySLLM, a novel method that employs synthesis summarization, utilizing large language models' (LLMs) extensive world knowledge and ability to capture patterns.
SySLLM summaries are preferred over demonstration-based policy summaries and match or surpass their performance in objective agent identification tasks.
arXiv Detail & Related papers (2025-03-13T16:10:14Z) - Picachv: Formally Verified Data Use Policy Enforcement for Secure Data Analytics [10.630556229470681]
We introduce Picachv, a novel security monitor that automatically enforces data use policies.
It works on relational algebra as an abstraction for program semantics, enabling policy enforcement on query plans generated by programs during execution.
We integrate Picachv into Polars, a state-of-the-art data analytics framework, and evaluate its performance using the TPC-H benchmark.
arXiv Detail & Related papers (2025-01-17T21:30:55Z) - LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements [50.544186914115045]
This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning.
TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states.
arXiv Detail & Related papers (2024-12-09T18:43:56Z) - Entailment-Driven Privacy Policy Classification with LLMs [3.564208334473993]
We propose a framework to classify paragraphs of privacy policies into meaningful labels that are easily understood by users.
Our framework improves the F1 score in average by 11.2%.
arXiv Detail & Related papers (2024-09-25T05:07:05Z) - Privacy Policy Analysis through Prompt Engineering for LLMs [3.059256166047627]
PAPEL (Privacy Policy Analysis through Prompt Engineering for LLMs) is a framework harnessing the power of Large Language Models (LLMs) to automate the analysis of privacy policies.
It aims to streamline the extraction, annotation, and summarization of information from these policies, enhancing their accessibility and comprehensibility without requiring additional model training.
We demonstrate the effectiveness of PAPEL with two applications: (i) annotation and (ii) contradiction analysis.
arXiv Detail & Related papers (2024-09-23T10:23:31Z) - One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
textscNuggets uses one-shot learning to select high-quality instruction data from extensive datasets.
We show that instruction tuning with the top 1% of examples curated by textscNuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z) - Counterfactual Learning with General Data-generating Policies [3.441021278275805]
We develop an OPE method for a class of full support and deficient support logging policies in contextual-bandit settings.
We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases.
arXiv Detail & Related papers (2022-12-04T21:07:46Z) - Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy.
Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories.
We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems.
arXiv Detail & Related papers (2022-05-23T16:37:16Z) - A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment.
We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy.
Experiments and ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z) - Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions.
We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions.
We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z) - Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
arXiv Detail & Related papers (2020-02-26T23:00:27Z) - A Comparative Study of Sequence Classification Models for Privacy Policy
Coverage Analysis [0.0]
Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data.
Our solution is to provide users with a coverage analysis of a given website's privacy policy using a wide range of classical machine learning and deep learning techniques.
arXiv Detail & Related papers (2020-02-12T21:46:22Z) - Reward-Conditioned Policies [100.64167842905069]
imitation learning requires near-optimal expert data.
Can we learn effective policies via supervised learning without demonstrations?
We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.