Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service
- URL: http://arxiv.org/abs/2404.13087v1
- Date: Wed, 17 Apr 2024 19:53:59 GMT
- Title: Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service
- Authors: Shikha Soneji, Mitchell Hoesing, Sujay Koujalgi, Jonathan Dodge,
- Abstract summary: Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents.
We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score.
- Score: 0.6240153531166704
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The complexities of legalese in terms and policy documents can bind individuals to contracts they do not fully comprehend, potentially leading to uninformed data sharing. Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents, aiming to enhance user understanding and facilitate informed decisions. We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score. Leveraging our best-performing model, RoBERTa, we highlighted redundancies and potential guideline violations by identifying overlaps in GDPR-required documents, underscoring the necessity for stricter GDPR compliance.
Related papers
- Rethinking Legal Compliance Automation: Opportunities with Large Language Models [2.9088208525097365]
We argue that the examination of (textual) legal artifacts should, first employ broader context than sentences.
We present a compliance analysis approach designed to address these limitations.
arXiv Detail & Related papers (2024-04-22T17:10:27Z) - Modelling Technique for GDPR-compliance: Toward a Comprehensive Solution [0.0]
New data protection legislation in the EU/UK has come into force.
Existing threat modelling techniques are not designed to model compliance.
We propose a new data flow integrated with principles of knowledge base for non-compliance threats.
arXiv Detail & Related papers (2024-04-22T08:41:43Z) - Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's.
One emerging technique to realize PbD is enforcement (RE)
We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z) - A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs [3.1002416427168304]
General Data Protection Regulation (DPA) requires a data processing agreement (DPA) which regulates processing and ensures personal data remains protected.
Checking completeness of DPA according to prerequisite provisions is therefore an essential to ensure that requirements are complete.
We propose an automation strategy to address the completeness checking of DPAs against stipulated provisions.
arXiv Detail & Related papers (2023-11-23T10:05:52Z) - Information Association for Language Model Updating by Mitigating
LM-Logical Discrepancy [68.31760483418901]
Large Language Models(LLMs) struggle with providing current information due to the outdated pre-training data.
Existing methods for updating LLMs, such as knowledge editing and continual fine-tuning, have significant drawbacks in generalizability of new information.
We identify the core challenge behind these drawbacks: the LM-logical discrepancy featuring the difference between language modeling probabilities and logical probabilities.
arXiv Detail & Related papers (2023-05-29T19:48:37Z) - Auditing and Generating Synthetic Data with Controllable Trust Trade-offs [54.262044436203965]
We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models.
It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation.
We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases.
arXiv Detail & Related papers (2023-04-21T09:03:18Z) - Relational Action Bases: Formalization, Effective Safety Verification,
and Invariants (Extended Version) [67.99023219822564]
We introduce the general framework of relational action bases (RABs)
RABs generalize existing models by lifting both restrictions.
We demonstrate the effectiveness of this approach on a benchmark of data-aware business processes.
arXiv Detail & Related papers (2022-08-12T17:03:50Z) - SAIS: Supervising and Augmenting Intermediate Steps for Document-Level
Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z) - Investigating Crowdsourcing Protocols for Evaluating the Factual
Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
arXiv Detail & Related papers (2021-09-19T19:05:00Z) - Compliance Generation for Privacy Documents under GDPR: A Roadmap for
Implementing Automation and Machine Learning [2.1485350418225244]
Privatech project focuses on corporations and law firms as agents of compliance.
Data processors must implement accountability measures to assess and document compliance.
We provide a roadmap for compliance assessment and generation by identifying compliance issues.
arXiv Detail & Related papers (2020-12-23T14:46:51Z) - Towards a Semantic Model of the GDPR Register of Processing Activities [0.3441021278275805]
We present a consolidated data model based on common concepts and relationships across analysed templates.
We show that the DPV currently does not provide sufficient concepts to represent the ROPA data model.
This will enable creation of a pan-EU information management framework for interoperability between organisations and regulators for compliance.
arXiv Detail & Related papers (2020-08-03T13:54:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.