EROS: Entity-Driven Controlled Policy Document Summarization
- URL: http://arxiv.org/abs/2403.00141v1
- Date: Thu, 29 Feb 2024 21:44:50 GMT
- Title: EROS: Entity-Driven Controlled Policy Document Summarization
- Authors: Joykirat Singh, Sehban Fazili, Rohan Jain, Md Shad Akhtar
- Abstract summary: We propose to enhance the interpretability and readability of policy documents by using controlled abstractive summarization.
We develop PD-Sum, a policy-document summarization dataset with marked privacy-related entity labels.
Our proposed model, EROS, identifies critical entities through a span-based entity extraction model and employs them to control the information content of the summaries.
- Score: 16.661448437719464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Privacy policy documents have a crucial role in educating individuals about
the collection, usage, and protection of users' personal data by organizations.
However, they are notorious for lengthy, complex, and convoluted language,
especially where privacy-related entities are involved. Hence, they pose a significant
challenge to users who attempt to comprehend an organization's data usage policy.
In this paper, we propose to enhance the interpretability and readability of
policy documents by using controlled abstractive summarization: we constrain
the generated summaries to include critical privacy-related entities (e.g.,
data and medium) and the organization's rationale (e.g., target and reason) for
collecting those entities. To achieve this, we develop PD-Sum, a
policy-document summarization dataset with marked privacy-related entity
labels. Our proposed model, EROS, identifies critical entities through a
span-based entity extraction model and employs them to control the information
content of the summaries using proximal policy optimization (PPO). Comparison
shows encouraging improvement over various baselines. Furthermore, we furnish
qualitative and human evaluations to establish the efficacy of EROS.
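The abstract says that extracted entities control summary content through PPO, but it does not spell out the reward. As a rough illustration only, the sketch below shows one plausible scalar signal: the fraction of extracted entity spans that reappear in a generated summary. The function names, the verbatim substring matching, and the sample strings are all assumptions made for this example and are not taken from the paper.

```python
import re
from typing import Iterable


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def entity_coverage_reward(summary: str, entities: Iterable[str]) -> float:
    """Hypothetical reward: fraction of distinct entity spans found in the summary.

    This is an illustrative stand-in, not the reward used by EROS.
    """
    norm_summary = _normalize(summary)
    # Deduplicate entity spans after normalization; skip empty strings.
    unique = {_normalize(e) for e in entities if e.strip()}
    if not unique:
        return 0.0
    hits = sum(1 for e in unique if e in norm_summary)
    return hits / len(unique)


if __name__ == "__main__":
    # Made-up entity spans and summary, for demonstration only.
    spans = ["email address", "cookies", "advertising partners"]
    text = "We collect your Email  address through cookies."
    print(entity_coverage_reward(text, spans))
```

In a PPO setup, a score like this would typically be computed per sampled summary and passed to the optimizer as the reward, possibly combined with a fluency or faithfulness term; how EROS actually combines such signals is not stated in this listing.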
Related papers
- Robust Utility-Preserving Text Anonymization Based on Large Language Models [80.5266278002083]
Text anonymization is crucial for sharing sensitive data while maintaining privacy.
Existing techniques face emerging challenges from the re-identification capabilities of Large Language Models.
This paper proposes a framework composed of three LLM-based components -- a privacy evaluator, a utility evaluator, and an optimization component.
arXiv Detail & Related papers (2024-07-16T14:28:56Z)
- Collection, usage and privacy of mobility data in the enterprise and public administrations [55.2480439325792]
Security measures such as anonymization are needed to protect individuals' privacy.
Within our study, we conducted expert interviews to gain insights into practices in the field.
We survey privacy-enhancing methods in use, which generally do not comply with state-of-the-art standards of differential privacy.
arXiv Detail & Related papers (2024-07-04T08:29:27Z)
- Cloaked Classifiers: Pseudonymization Strategies on Sensitive Classification Tasks [4.66054169739129]
In this paper, we explore the balance between preserving data usefulness and ensuring robust privacy safeguards.
We share our method for manually pseudonymizing a multilingual radicalization dataset, ensuring performance comparable to the original data.
arXiv Detail & Related papers (2024-06-25T18:30:25Z)
- Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z)
- The Privacy Policy Permission Model: A Unified View of Privacy Policies [0.5371337604556311]
A privacy policy is a set of statements that specifies how an organization gathers, uses, discloses, and maintains a client's data.
Most privacy policies lack a clear, complete explanation of how data providers' information is used.
We propose a modeling methodology, called the Privacy Policy Permission Model (PPPM), that provides a uniform, easy-to-understand representation of privacy policies.
arXiv Detail & Related papers (2024-03-26T06:12:38Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Retrieval Enhanced Data Augmentation for Question Answering on Privacy Policies [74.01792675564218]
We develop a data augmentation framework based on ensembling retriever models that captures relevant text segments from unlabeled policy documents.
To improve the diversity and quality of the augmented data, we leverage multiple pre-trained language models (LMs) and cascade them with noise reduction filter models.
Using our augmented data on the PrivacyQA benchmark, we elevate the existing baseline by a large margin (10% F1) and achieve a new state-of-the-art F1 score of 50%.
arXiv Detail & Related papers (2022-04-19T15:45:23Z)
- Decision Making with Differential Privacy under a Fairness Lens [44.4747903763245]
The U.S. Census Bureau releases data sets and statistics about groups of individuals that are used as input to a number of critical decision processes.
To conform to privacy and confidentiality requirements, these agencies are often required to release privacy-preserving versions of the data.
This paper studies the release of differentially private data sets and analyzes their impact on some critical resource allocation tasks under a fairness perspective.
arXiv Detail & Related papers (2021-05-16T21:04:19Z)
- Second layer data governance for permissioned blockchains: the privacy management challenge [58.720142291102135]
In pandemic situations, such as the COVID-19 and Ebola outbreaks, sharing health data is crucial to avoid mass infection and decrease the number of deaths.
In this context, permissioned blockchain technology emerges to empower users to exercise their rights, providing data ownership, transparency, and security through an immutable, unified, and distributed database governed by smart contracts.
arXiv Detail & Related papers (2020-10-22T13:19:38Z)
- Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling [12.74252812104216]
This paper formulates a new task of extracting privacy parameters from a privacy policy, through the lens of Contextual Integrity.
We show that traditional NLP tasks, including the recently proposed Question-Answering based solutions, are insufficient to address the privacy parameter extraction problem.
arXiv Detail & Related papers (2020-10-01T20:48:37Z)
- APPCorp: A Corpus for Android Privacy Policy Document Structure Analysis [16.618995752616296]
In this work we create a manually labelled corpus containing 167 privacy policies.
We report the annotation process and details of the annotated corpus.
We benchmark our data corpus with 4 document classification models, thoroughly analyze the results and discuss challenges and opportunities for the research community to use the corpus.
arXiv Detail & Related papers (2020-05-14T13:25:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.