Related papers: Automated Detection and Analysis of Data Practices Using A Real-World Corpus

URL: http://arxiv.org/abs/2402.11006v1
Date: Fri, 16 Feb 2024 18:51:40 GMT
Title: Automated Detection and Analysis of Data Practices Using A Real-World Corpus
Authors: Mukund Srinath, Pranav Venkit, Maria Badillo, Florian Schaub, C. Lee Giles, Shomir Wilson
Abstract summary: We propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.
Score: 20.4572759138767
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Privacy policies are crucial for informing users about data practices, yet their length and complexity often deter users from reading them. In this paper, we propose an automated approach to identify and visualize data practices within privacy policies at different levels of detail. Leveraging crowd-sourced annotations from the ToS;DR platform, we experiment with various methods to match policy excerpts with predefined data practice descriptions. We further conduct a case study to evaluate our approach on a real-world policy, demonstrating its effectiveness in simplifying complex policies. Experiments show that our approach accurately matches data practice descriptions with policy excerpts, facilitating the presentation of simplified privacy information to users.

Related papers

SySLLM: Generating Synthesized Policy Summaries for Reinforcement Learning Agents Using Large Language Models [10.848775419008442]
We present SySLLM, a novel method that employs synthesis summarization, utilizing large language models' (LLMs) extensive world knowledge and ability to capture patterns. SySLLM summaries are preferred over demonstration-based policy summaries and match or surpass their performance in objective agent identification tasks.
arXiv Detail & Related papers (2025-03-13T16:10:14Z)
Picachv: Formally Verified Data Use Policy Enforcement for Secure Data Analytics [10.630556229470681]
We introduce Picachv, a novel security monitor that automatically enforces data use policies. It works on relational algebra as an abstraction for program semantics, enabling policy enforcement on query plans generated by programs during execution. We integrate Picachv into Polars, a state-of-the-art data analytics framework, and evaluate its performance using the TPC-H benchmark.
arXiv Detail & Related papers (2025-01-17T21:30:55Z)
LLMs for Generalizable Language-Conditioned Policy Learning under Minimal Data Requirements [50.544186914115045]
This paper presents TEDUO, a novel training pipeline for offline language-conditioned policy learning. TEDUO operates on easy-to-obtain, unlabeled datasets and is suited for the so-called in-the-wild evaluation, wherein the agent encounters previously unseen goals and states.
arXiv Detail & Related papers (2024-12-09T18:43:56Z)
Entailment-Driven Privacy Policy Classification with LLMs [3.564208334473993]
We propose a framework to classify paragraphs of privacy policies into meaningful labels that are easily understood by users. Our framework improves the F1 score by 11.2% on average.
arXiv Detail & Related papers (2024-09-25T05:07:05Z)
Privacy Policy Analysis through Prompt Engineering for LLMs [3.059256166047627]
PAPEL (Privacy Policy Analysis through Prompt Engineering for LLMs) is a framework harnessing the power of Large Language Models (LLMs) to automate the analysis of privacy policies. It aims to streamline the extraction, annotation, and summarization of information from these policies, enhancing their accessibility and comprehensibility without requiring additional model training. We demonstrate the effectiveness of PAPEL with two applications: (i) annotation and (ii) contradiction analysis.
arXiv Detail & Related papers (2024-09-23T10:23:31Z)
One-Shot Learning as Instruction Data Prospector for Large Language Models [108.81681547472138]
Nuggets uses one-shot learning to select high-quality instruction data from extensive datasets. We show that instruction tuning with the top 1% of examples curated by Nuggets substantially outperforms conventional methods employing the entire dataset.
arXiv Detail & Related papers (2023-12-16T03:33:12Z)
Counterfactual Learning with General Data-generating Policies [3.441021278275805]
We develop an OPE method for a class of full support and deficient support logging policies in contextual-bandit settings. We prove that our method's prediction converges in probability to the true performance of a counterfactual policy as the sample size increases.
arXiv Detail & Related papers (2022-12-04T21:07:46Z)
Data augmentation for efficient learning from parametric experts [88.33380893179697]
We focus on what we call the policy cloning setting, in which we use online or offline queries of an expert to inform the behavior of a student policy. Our approach, augmented policy cloning (APC), uses synthetic states to induce feedback-sensitivity in a region around sampled trajectories. We achieve highly data-efficient transfer of behavior from an expert to a student policy for high-degrees-of-freedom control problems.
arXiv Detail & Related papers (2022-05-23T16:37:16Z)
A Regularized Implicit Policy for Offline Reinforcement Learning [54.7427227775581]
Offline reinforcement learning enables learning from a fixed dataset, without further interactions with the environment. We propose a framework that supports learning a flexible yet well-regularized fully-implicit policy. Experiments and an ablation study on the D4RL dataset validate our framework and the effectiveness of our algorithmic designs.
arXiv Detail & Related papers (2022-02-19T20:22:04Z)
Privacy-Constrained Policies via Mutual Information Regularized Policy Gradients [54.98496284653234]
We consider the task of training a policy that maximizes reward while minimizing disclosure of certain sensitive state variables through the actions. We solve this problem by introducing a regularizer based on the mutual information between the sensitive state and the actions. We develop a model-based estimator for optimization of privacy-constrained policies.
arXiv Detail & Related papers (2020-12-30T03:22:35Z)
Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding. Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
arXiv Detail & Related papers (2020-02-26T23:00:27Z)
A Comparative Study of Sequence Classification Models for Privacy Policy Coverage Analysis [0.0]
Privacy policies are legal documents that describe how a website will collect, use, and distribute a user's data. Our solution is to provide users with a coverage analysis of a given website's privacy policy using a wide range of classical machine learning and deep learning techniques.
arXiv Detail & Related papers (2020-02-12T21:46:22Z)
Reward-Conditioned Policies [100.64167842905069]
Imitation learning requires near-optimal expert data. Can we learn effective policies via supervised learning without demonstrations? We show how such an approach can be derived as a principled method for policy search.
arXiv Detail & Related papers (2019-12-31T18:07:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.