Intent Classification and Slot Filling for Privacy Policies
- URL: http://arxiv.org/abs/2101.00123v1
- Date: Fri, 1 Jan 2021 00:44:41 GMT
- Title: Intent Classification and Slot Filling for Privacy Policies
- Authors: Wasi Uddin Ahmad, Jianfeng Chi, Tu Le, Thomas Norton, Yuan Tian,
Kai-Wei Chang
- Abstract summary: PolicyIE is a corpus consisting of 5,250 intent and 11,788 slot annotations spanning 31 privacy policies of websites and mobile applications.
We present two alternative neural approaches as baselines: (1) formulating intent classification and slot filling as a joint sequence tagging task and (2) modeling them as a sequence-to-sequence learning task.
- Score: 34.606121042708864
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding privacy policies is crucial for users as it empowers them to
learn about the information that matters to them. Sentences written in a
privacy policy document explain privacy practices, and the constituent text
spans convey further specific information about that practice. We refer to
predicting the privacy practice explained in a sentence as intent
classification and identifying the text spans sharing specific information as
slot filling. In this work, we propose PolicyIE, a corpus consisting of 5,250
intent and 11,788 slot annotations spanning 31 privacy policies of websites and
mobile applications. The PolicyIE corpus is a challenging benchmark with limited
labeled examples reflecting the cost of collecting large-scale annotations. We
present two alternative neural approaches as baselines: (1) formulating intent
classification and slot filling as a joint sequence tagging task and (2) modeling
them as a sequence-to-sequence (Seq2Seq) learning task. Experiment results show
that both approaches perform comparably in intent classification, while the
Seq2Seq method outperforms the sequence tagging approach in slot filling by a
large margin. Error analysis reveals the deficiency of the baseline approaches,
suggesting room for improvement in future work. We hope the PolicyIE corpus
will stimulate future research in this domain.
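As a rough illustration of the two formulations named in the abstract, the sketch below encodes one annotated sentence first as per-token BIO tags (the sequence tagging view) and then as a single bracketed target string (the Seq2Seq view). The example sentence, the intent and slot label names, the bracket format, and the helper functions `to_bio_tags` and `to_seq2seq_target` are all assumptions made for illustration; they are not taken from the PolicyIE annotation scheme or the authors' code.

```python
from typing import List, Tuple

# A slot is (start_token, end_token_exclusive, label).
# All label names below are hypothetical, chosen only for illustration.
Slot = Tuple[int, int, str]


def to_bio_tags(tokens: List[str], slots: List[Slot]) -> List[str]:
    """Sequence tagging view: assign one BIO tag to every token."""
    tags = ["O"] * len(tokens)
    for start, end, label in slots:
        tags[start] = f"B-{label}"          # first token of the span
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the span
    return tags


def to_seq2seq_target(tokens: List[str], intent: str, slots: List[Slot]) -> str:
    """Seq2Seq view: one bracketed string that a decoder would generate."""
    pieces = [f"[IN:{intent}"]
    for start, end, label in sorted(slots):
        span_text = " ".join(tokens[start:end])
        pieces.append(f"[SL:{label} {span_text} ]")
    pieces.append("]")
    return " ".join(pieces)


tokens = "We collect your email address".split()
intent = "data-collection"                      # hypothetical intent label
slots = [(0, 1, "collector"), (2, 5, "data-type")]  # hypothetical slot labels

bio = to_bio_tags(tokens, slots)
target = to_seq2seq_target(tokens, intent, slots)

print(list(zip(tokens, bio)))
print(target)
```

In the tagging view the intent would typically be predicted separately (for example from a sentence-level representation), whereas the bracketed string carries the intent and all slots in one output sequence.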
Related papers
- Differential Privacy Overview and Fundamental Techniques [63.0409690498569]
This chapter is meant to be part of the book "Differential Privacy in Artificial Intelligence: From Theory to Practice".
It starts by illustrating various attempts to protect data privacy, emphasizing where and why they failed.
It then defines the key actors, tasks, and scopes that make up the domain of privacy-preserving data analysis.
arXiv Detail & Related papers (2024-11-07T13:52:11Z)
- Preserving Node-level Privacy in Graph Neural Networks [8.823710998526705]
We propose a solution that addresses the issue of node-level privacy in Graph Neural Networks (GNNs).
Our protocol consists of two main components: 1) a sampling routine called HeterPoisson, which employs a specialized node sampling strategy and a series of tailored operations to generate a batch of sub-graphs with desired properties, and 2) a randomization routine that utilizes symmetric Laplace noise instead of the commonly used Gaussian noise.
Our protocol enables GNN learning with good performance, as demonstrated by experiments on five real-world datasets.
arXiv Detail & Related papers (2023-11-12T16:21:29Z)
- PolicyGPT: Automated Analysis of Privacy Policies with Large Language Models [41.969546784168905]
In practical use, users tend to click the Agree button directly rather than reading privacy policies carefully.
This practice exposes users to risks of privacy leakage and legal issues.
Recently, the advent of Large Language Models (LLM) such as ChatGPT and GPT-4 has opened new possibilities for text analysis.
arXiv Detail & Related papers (2023-09-19T01:22:42Z)
- SeePrivacy: Automated Contextual Privacy Policy Generation for Mobile Applications [21.186902172367173]
SeePrivacy is designed to automatically generate contextual privacy policies for mobile apps.
Our method synergistically combines mobile GUI understanding and privacy policy document analysis.
96% of the retrieved policy segments can be correctly matched with their contexts.
arXiv Detail & Related papers (2023-07-04T12:52:45Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
arXiv Detail & Related papers (2022-04-19T15:45:23Z)
- Weakly-Supervised Aspect-Based Sentiment Analysis via Joint Aspect-Sentiment Topic Embedding [71.2260967797055]
We propose a weakly-supervised approach for aspect-based sentiment analysis.
We learn <sentiment, aspect> joint topic embeddings in the word embedding space.
We then use neural models to generalize the word-level discriminative information.
arXiv Detail & Related papers (2020-10-13T21:33:24Z)
- PolicyQA: A Reading Comprehension Dataset for Privacy Policies [77.79102359580702]
We present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies.
We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA.
arXiv Detail & Related papers (2020-10-06T09:04:58Z)
- APPCorp: A Corpus for Android Privacy Policy Document Structure Analysis [16.618995752616296]
In this work we create a manually labelled corpus containing 167 privacy policies.
We report the annotation process and details of the annotated corpus.
We benchmark our data corpus with 4 document classification models, thoroughly analyze the results and discuss challenges and opportunities for the research community to use the corpus.
arXiv Detail & Related papers (2020-05-14T13:25:11Z)
- Privacy at Scale: Introducing the PrivaSeer Corpus of Web Privacy Policies [13.09699710197036]
We create PrivaSeer, a corpus of over one million English language website privacy policies.
We show results from readability tests, document similarity, and keyphrase extraction, and explore the corpus through topic modeling.
arXiv Detail & Related papers (2020-04-23T13:21:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.