PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)
- URL: http://arxiv.org/abs/2210.06746v3
- Date: Thu, 06 Mar 2025 06:47:40 GMT
- Title: PoliGraph: Automated Privacy Policy Analysis using Knowledge Graphs (Journal Version)
- Authors: Hao Cui, Rahmadi Trimananda, Scott Jordan, Athina Markopoulou
- Abstract summary: We view and analyze, for the first time, the entire text of a privacy policy in an integrated way. We develop PoliGrapher, an NLP tool to automatically extract PoliGraph from the text using linguistic analysis. Using a public dataset for evaluation, we show that PoliGrapher identifies 40% more collection statements than prior state-of-the-art, with 97% precision.
- Score: 7.10483762466065
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policies disclose how an organization collects and handles personal information. Recent work has made progress in leveraging natural language processing (NLP) to automate privacy policy analysis and extract data collection statements from different sentences, considered in isolation from each other. In this paper, we view and analyze, for the first time, the entire text of a privacy policy in an integrated way. In terms of methodology: (1) we define PoliGraph, a type of knowledge graph that captures statements in a policy as relations between different parts of the text; and (2) we revisit the notion of ontologies, previously defined in heuristic ways, to capture subsumption relations between terms. We make a clear distinction between local and global ontologies to capture the context of individual policies, application domains, and privacy laws. We develop PoliGrapher, an NLP tool to automatically extract PoliGraph from the text using linguistic analysis. Using a public dataset for evaluation, we show that PoliGrapher identifies 40% more collection statements than prior state-of-the-art, with 97% precision. In terms of applications, PoliGraph enables automated analysis of a corpus of policies and allows us to: (1) reveal common patterns in the texts across different policies, and (2) assess the correctness of the terms as defined within a policy. We also apply PoliGraph to: (3) detect contradictions in a policy, where we show false alarms by prior work, and (4) analyze the consistency of policies and network traffic, where we identify significantly more clear disclosures than prior work. Finally, leveraging the capabilities of the emerging large language models (LLMs), we also present PoliGrapher-LM, a tool that uses LLM prompting instead of NLP linguistic analysis, to extract PoliGraph from the policy text, and we show that it further improves coverage.
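To make the PoliGraph representation described above more concrete, the sketch below builds a tiny knowledge graph in which collection statements and subsumption (ontology) relations are directed, labeled edges, and a query expands the terms an entity collects through the subsumption edges. This is a minimal sketch assuming the networkx library; the edge labels are meant to mirror the paper's collection and subsumption relations, but the node vocabulary and the helper function are illustrative, not PoliGrapher's actual data model.

```python
# Illustrative sketch of a PoliGraph-style knowledge graph (labels and
# schema are assumptions for illustration, not PoliGrapher's exact output).
import networkx as nx

G = nx.MultiDiGraph()

# Collection statements: an entity COLLECTs a data type.
G.add_edge("we", "personal information", relation="COLLECT")
G.add_edge("advertising partner", "coarse location", relation="COLLECT")

# Subsumption (ontology) edges: a generic term SUBSUMEs a more specific one,
# e.g. "personal information" subsumes "device identifier".
G.add_edge("personal information", "device identifier", relation="SUBSUME")
G.add_edge("personal information", "email address", relation="SUBSUME")
G.add_edge("geolocation", "coarse location", relation="SUBSUME")

def collected_terms(graph, entity):
    """Data types the entity is disclosed to collect, expanded through
    SUBSUME edges (i.e., including the more specific terms they cover)."""
    subsume = nx.DiGraph(
        (u, v) for u, v, d in graph.edges(data=True)
        if d["relation"] == "SUBSUME")
    direct = {v for _, v, d in graph.out_edges(entity, data=True)
              if d["relation"] == "COLLECT"}
    expanded = set(direct)
    for term in direct:
        if term in subsume:
            expanded |= nx.descendants(subsume, term)
    return expanded

print(collected_terms(G, "we"))
# {'personal information', 'device identifier', 'email address'}
```

Reasoning over the subsumption edges is what lets a graph like this answer questions that single-sentence extraction cannot, such as whether a specific data type falls under a broader term the policy defines elsewhere.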
Related papers
- Few-shot Policy (de)composition in Conversational Question Answering [54.259440408606515]
We propose a neuro-symbolic framework to detect policy compliance using large language models (LLMs) in a few-shot setting.
We show that our approach soundly reasons about policy compliance conversations by extracting sub-questions to be answered, assigning truth values from contextual information, and explicitly producing a set of logic statements from the given policies.
We apply this approach to the popular PCD and conversational machine reading benchmark, ShARC, and show competitive performance with no task-specific finetuning.
arXiv Detail & Related papers (2025-01-20T08:40:15Z)
- Privacy Policy Analysis through Prompt Engineering for LLMs [3.059256166047627]
PAPEL (Privacy Policy Analysis through Prompt Engineering for LLMs) is a framework harnessing the power of Large Language Models (LLMs) to automate the analysis of privacy policies.
It aims to streamline the extraction, annotation, and summarization of information from these policies, enhancing their accessibility and comprehensibility without requiring additional model training.
We demonstrate the effectiveness of PAPEL with two applications: (i) annotation and (ii) contradiction analysis.
arXiv Detail & Related papers (2024-09-23T10:23:31Z)
- PolicyLR: A Logic Representation For Privacy Policies [34.73520882451813]
We propose PolicyLR, a new paradigm that offers a comprehensive machine-readable representation of privacy policies.
PolicyLR converts privacy policies into a machine-readable format using valuations of atomic formulae.
We demonstrate PolicyLR in three privacy tasks: Policy Compliance, Inconsistency Detection and Privacy Comparison Shopping.
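As a rough illustration of what "valuations of atomic formulae" could look like in practice, the following sketch maps atomic propositions extracted from a policy to truth values and checks a compliance rule expressed as a Boolean formula over them. The proposition names and the rule are hypothetical assumptions for illustration, not PolicyLR's actual vocabulary or tasks.

```python
# Illustrative sketch only: a policy is reduced to truth-value assignments
# (valuations) over atomic formulae, and a downstream task such as
# inconsistency or compliance checking becomes a Boolean formula over them.
policy_valuation = {
    "collects(precise_location)": True,
    "purpose(precise_location, advertising)": True,
    "obtains_consent(precise_location)": False,
}

def violates_consent_rule(valuation):
    """Hypothetical rule: collecting precise location for advertising
    requires user consent."""
    return (valuation["collects(precise_location)"]
            and valuation["purpose(precise_location, advertising)"]
            and not valuation["obtains_consent(precise_location)"])

print(violates_consent_rule(policy_valuation))  # True -> potential violation
```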
arXiv Detail & Related papers (2024-08-27T07:27:16Z)
- A New Hope: Contextual Privacy Policies for Mobile Applications and An Approach Toward Automated Generation [19.578130824867596]
The aim of contextual privacy policies (CPPs) is to fragment privacy policies into concise snippets, displaying them only within the corresponding contexts in the application's graphical user interfaces (GUIs).
In this paper, we first formulate CPPs in the mobile application scenario, and then present a novel multimodal framework, named SeePrivacy, specifically designed to automatically generate CPPs for mobile applications.
A human evaluation shows that 77% of the extracted privacy policy segments were perceived as well-aligned with the detected contexts.
arXiv Detail & Related papers (2024-02-22T13:32:33Z)
- PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models [41.969546784168905]
In practical use, users tend to click the Agree button directly rather than reading privacy policies carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLM) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
arXiv Detail & Related papers (2023-09-19T01:22:42Z)
- Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey [67.7834898542701]
We focus on reviewing privacy-preserving techniques of graph machine learning.
We first review methods for generating privacy-preserving graph data.
Then we describe methods for transmitting privacy-preserved information.
arXiv Detail & Related papers (2023-07-10T04:30:23Z)
- Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control [58.06223121654735]
We show a method that taps into joint image- and goal- conditioned policies with language using only a small amount of language data.
Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but to the desired change between the start and goal images.
We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside of the labeled data.
arXiv Detail & Related papers (2023-06-30T20:09:39Z)
- Natural Language Processing for Policymaking [34.93331735602826]
Natural language processing (NLP) uses computational tools to parse text into key information needed for policymaking.
We introduce common methods of NLP, including text classification, topic modeling, event extraction, and text scaling.
We highlight some potential limitations and ethical concerns when using NLP for policymaking.
arXiv Detail & Related papers (2023-02-07T14:34:39Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Supervised Off-Policy Ranking [145.3039527243585]
Off-policy evaluation (OPE) leverages data generated by other policies to evaluate a target policy.
We propose supervised off-policy ranking that learns a policy scoring model by correctly ranking training policies with known performance.
Our method outperforms strong baseline OPE methods in terms of both rank correlation and performance gap between the truly best and the best of the ranked top three policies.
arXiv Detail & Related papers (2021-07-03T07:01:23Z)
- Natural Language Specification of Reinforcement Learning Policies through Differentiable Decision Trees [10.406631494442683]
Human-AI policy specification is a novel procedure we define in which humans can collaboratively warm-start a robot's reinforcement learning policy.
We develop a novel collaborative framework to allow humans to initialize and interpret an autonomous agent's behavior.
Our approach warm-starts an RL agent by utilizing non-expert natural language specifications without incurring the additional domain exploration costs.
arXiv Detail & Related papers (2021-01-18T16:07:00Z)
- Policy Evaluation Networks [50.53250641051648]
We introduce a scalable, differentiable fingerprinting mechanism that retains essential policy information in a concise embedding.
Our empirical results demonstrate that combining these three elements can produce policies that outperform those that generated the training data.
arXiv Detail & Related papers (2020-02-26T23:00:27Z)