Related papers: NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR

URL: http://arxiv.org/abs/2209.09722v2
Date: Sun, 18 Jun 2023 12:59:12 GMT
Title: NLP-based Automated Compliance Checking of Data Processing Agreements against GDPR
Authors: Orlando Amaral, Muhammad Ilyas Azeem, Sallam Abualhaija and Lionel C Briand
Abstract summary: We propose an automated solution to check compliance of a given DPA against the "shall" requirements. Our approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements.
Score: 9.022562906627991
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Processing personal data is regulated in Europe by the General Data Protection Regulation (GDPR) through data processing agreements (DPAs). Checking the compliance of DPAs contributes to the compliance verification of software systems as DPAs are an important source of requirements for software development involving the processing of personal data. However, manually checking whether a given DPA complies with GDPR is challenging as it requires significant time and effort for understanding and identifying DPA-relevant compliance requirements in GDPR and then verifying these requirements in the DPA. In this paper, we propose an automated solution to check the compliance of a given DPA against GDPR. In close interaction with legal experts, we first built two artifacts: (i) the "shall" requirements extracted from the GDPR provisions relevant to DPA compliance and (ii) a glossary table defining the legal concepts in the requirements. Then, we developed an automated solution that leverages natural language processing (NLP) technologies to check the compliance of a given DPA against these "shall" requirements. Specifically, our approach automatically generates phrasal-level representations for the textual content of the DPA and compares them against predefined representations of the "shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements. The approach thus has an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%. Compared to a baseline that relies on off-the-shelf NLP tools, our approach provides an average accuracy gain of ~20 percentage points. The accuracy of our approach can be improved to ~94% with limited manual verification effort.
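The reported metrics can be roughly reproduced from the counts in the abstract. The sketch below is not the authors' evaluation code. It assumes that a "violation" is the positive class, that the 132 genuine violations not found are false negatives, and that the 524 correctly identified satisfied requirements are true negatives. The function name `pooled_metrics` is hypothetical. Pooling all counts gives a precision of about 89.0%, slightly below the reported 89.1%, presumably because the paper reports an average (for example per DPA) rather than a pooled figure.

```python
# Counts taken from the abstract; mapping them to a confusion matrix
# (violation = positive class) is an assumption of this sketch.
TRUE_POSITIVES = 618          # genuine violations correctly found
FALSE_NEGATIVES = 750 - 618   # genuine violations that were missed
FALSE_POSITIVES = 76          # false violations raised
TRUE_NEGATIVES = 524          # satisfied requirements correctly identified


def pooled_metrics(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float, float]:
    """Return (precision, recall, accuracy) computed over pooled counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


precision, recall, accuracy = pooled_metrics(
    TRUE_POSITIVES, FALSE_POSITIVES, FALSE_NEGATIVES, TRUE_NEGATIVES
)
print(f"precision={precision:.1%} recall={recall:.1%} accuracy={accuracy:.1%}")
```

Recall and accuracy computed this way match the reported 82.4% and 84.6%.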

Related papers

Lawful and Accountable Personal Data Processing with GDPR-based Access and Usage Control in Distributed Systems [0.0]
This paper proposes a case-generic method for automated normative reasoning that establishes legal arguments for the lawfulness of data processing activities. The arguments are established on the basis of case-specific legal qualifications made by privacy experts, bringing the human in the loop. The resulting system is designed and critically assessed in reference to requirements extracted from the GDPR.
arXiv Detail & Related papers (2025-03-10T10:49:34Z)
Do Not Trust Licenses You See: Dataset Compliance Requires Massive-Scale AI-Powered Lifecycle Tracing [45.6582862121583]
This paper argues that a dataset's legal risk cannot be accurately assessed by its license terms alone. It argues that tracking dataset redistribution and its full lifecycle is essential. We show that AI can perform these tasks with higher accuracy, efficiency, and cost-effectiveness than human experts.
arXiv Detail & Related papers (2025-03-04T16:57:53Z)
PIPA: Preference Alignment as Prior-Informed Statistical Estimation [57.24096291517857]
We introduce Prior-Informed Preference Alignment (PIPA), a unified, RL-free probabilistic framework. PIPA accommodates both paired and unpaired data, as well as answer and step-level annotations. By integrating different types of prior information, we developed two variations of PIPA: PIPA-M and PIPA-N.
arXiv Detail & Related papers (2025-02-09T04:31:30Z)
Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims. We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents. We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation [51.998738311700095]
Regulatory documents, characterized by their length, complexity and frequent updates, are challenging to interpret. RegNLP is a multidisciplinary subfield aimed at simplifying access to and interpretation of regulatory rules and obligations. The ObliQA dataset contains 27,869 questions derived from the Abu Dhabi Global Markets (ADGM) financial regulation document collection.
arXiv Detail & Related papers (2024-09-09T14:44:19Z)
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO. Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically. Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
arXiv Detail & Related papers (2024-06-26T17:43:06Z)
Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service [0.6240153531166704]
Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents. We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score.
arXiv Detail & Related papers (2024-04-17T19:53:59Z)
Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's GDPR. One emerging technique to realize PbD is runtime enforcement (RE). We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z)
A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs [3.1002416427168304]
The General Data Protection Regulation (GDPR) requires a data processing agreement (DPA), which regulates processing and ensures personal data remains protected. Checking the completeness of a DPA against prerequisite provisions is therefore essential to ensure that requirements are complete. We propose an automation strategy to address the completeness checking of DPAs against stipulated provisions.
arXiv Detail & Related papers (2023-11-23T10:05:52Z)
Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels. Unclear validation protocols for DA have led to bad practices in the literature. We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
AI-enabled Automation for Completeness Checking of Privacy Policies [7.707284039078785]
In Europe, privacy policies are subject to compliance with the General Data Protection Regulation. In this paper, we propose AI-based automation for completeness checking of privacy policies.
arXiv Detail & Related papers (2021-06-10T12:10:51Z)
GDPR: When the Right to Access Personal Data Becomes a Threat [63.732639864601914]
We examine more than 300 data controllers, performing for each of them a request to access personal data. We find that 50.4% of the data controllers that handled the request have flaws in the procedure of identifying the users. The undesired and surprising result is that the GDPR, in its present deployment, has actually decreased the privacy of the users of web services.
arXiv Detail & Related papers (2020-05-04T22:01:46Z)
Machine Understandable Policies and GDPR Compliance Checking [9.032680855473986]
The SPECIAL H2020 project aims to provide a set of tools that data controllers can use to automatically check whether personal data sharing complies with regulatory obligations.
arXiv Detail & Related papers (2020-01-24T09:41:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.