NLP-based Automated Compliance Checking of Data Processing Agreements
against GDPR
- URL: http://arxiv.org/abs/2209.09722v2
- Date: Sun, 18 Jun 2023 12:59:12 GMT
- Title: NLP-based Automated Compliance Checking of Data Processing Agreements
against GDPR
- Authors: Orlando Amaral, Muhammad Ilyas Azeem, Sallam Abualhaija, and Lionel C. Briand
- Abstract summary: We propose an automated solution to check compliance of a given DPA against the "shall" requirements.
Our approach correctly finds 618 out of 750 genuine violations while raising 76 false violations, and further correctly identifies 524 satisfied requirements.
- Score: 9.022562906627991
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Processing personal data is regulated in Europe by the General Data
Protection Regulation (GDPR) through data processing agreements (DPAs).
Checking the compliance of DPAs contributes to the compliance verification of
software systems as DPAs are an important source of requirements for software
development involving the processing of personal data. However, manually
checking whether a given DPA complies with GDPR is challenging as it requires
significant time and effort for understanding and identifying DPA-relevant
compliance requirements in GDPR and then verifying these requirements in the
DPA. In this paper, we propose an automated solution to check the compliance of
a given DPA against GDPR. In close interaction with legal experts, we first
built two artifacts: (i) the "shall" requirements extracted from the GDPR
provisions relevant to DPA compliance and (ii) a glossary table defining the
legal concepts in the requirements. Then, we developed an automated solution
that leverages natural language processing (NLP) technologies to check the
compliance of a given DPA against these "shall" requirements. Specifically, our
approach automatically generates phrasal-level representations for the textual
content of the DPA and compares it against predefined representations of the
"shall" requirements. Over a dataset of 30 actual DPAs, the approach correctly
finds 618 out of 750 genuine violations while raising 76 false violations, and
further correctly identifies 524 satisfied requirements. The approach thus has
an average precision of 89.1%, a recall of 82.4%, and an accuracy of 84.6%.
Compared to a baseline that relies on off-the-shelf NLP tools, our approach
provides an average accuracy gain of ~20 percentage points. The accuracy of our
approach can be improved to ~94% with limited manual verification effort.
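The reported counts can be cross-checked with a short calculation. The sketch below is not taken from the paper: it assumes that "violation" is the positive class, that the false-negative count equals 750 minus 618, and that pooling all counts over the 30 DPAs is a fair approximation. The abstract describes its figures as averages, so the pooled precision (about 89.0%) can differ slightly from the reported 89.1%, while pooled recall and accuracy come out at 82.4% and 84.6%. The function name confusion_metrics is hypothetical.

```python
# Counts as reported in the abstract (positive class = "violation").
true_positives = 618     # genuine violations correctly found
total_violations = 750   # all genuine violations in the dataset
false_positives = 76     # false violations raised
true_negatives = 524     # satisfied requirements correctly identified

# Assumption: every genuine violation that was not found is a false negative.
false_negatives = total_violations - true_positives


def confusion_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from pooled confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


precision, recall, accuracy = confusion_metrics(
    true_positives, false_positives, false_negatives, true_negatives
)
print(f"precision={precision:.1%} recall={recall:.1%} accuracy={accuracy:.1%}")
```

The four counts sum to 1350 requirement checks, which is the denominator used for accuracy here.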
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation [51.998738311700095]
Regulatory documents, characterized by their length, complexity and frequent updates, are challenging to interpret.
RegNLP is a multidisciplinary subfield aimed at simplifying access to and interpretation of regulatory rules and obligations.
The ObliQA dataset contains 27,869 questions derived from the Abu Dhabi Global Markets (ADGM) financial regulation document collection.
arXiv Detail & Related papers (2024-09-09T14:44:19Z)
- Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs [54.05511925104712]
We propose a simple, effective, and data-efficient method called Step-DPO.
Step-DPO treats individual reasoning steps as units for preference optimization rather than evaluating answers holistically.
Our findings demonstrate that as few as 10K preference data pairs and fewer than 500 Step-DPO training steps can yield a nearly 3% gain in accuracy on MATH for models with over 70B parameters.
arXiv Detail & Related papers (2024-06-26T17:43:06Z)
- Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service [0.6240153531166704]
Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents.
We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score.
arXiv Detail & Related papers (2024-04-17T19:53:59Z)
- Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's GDPR.
One emerging technique to realize PbD is runtime enforcement (RE).
We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z)
- A Multi-solution Study on GDPR AI-enabled Completeness Checking of DPAs [3.1002416427168304]
The General Data Protection Regulation (GDPR) requires a data processing agreement (DPA), which regulates processing and ensures that personal data remains protected.
Checking the completeness of a DPA according to prerequisite provisions is therefore essential to ensure that requirements are complete.
We propose an automation strategy to address the completeness checking of DPAs against stipulated provisions.
arXiv Detail & Related papers (2023-11-23T10:05:52Z)
- Better Practices for Domain Adaptation [62.70267990659201]
Domain adaptation (DA) aims to provide frameworks for adapting models to deployment data without using labels.
Unclear validation protocols for DA have led to bad practices in the literature.
We show challenges across all three branches of domain adaptation methodology.
arXiv Detail & Related papers (2023-09-07T17:44:18Z)
- AI-enabled Automation for Completeness Checking of Privacy Policies [7.707284039078785]
In Europe, privacy policies are subject to compliance with the General Data Protection Regulation.
In this paper, we propose AI-based automation for completeness checking of privacy policies.
arXiv Detail & Related papers (2021-06-10T12:10:51Z)
- GDPR: When the Right to Access Personal Data Becomes a Threat [63.732639864601914]
We examine more than 300 data controllers, performing for each of them a request to access personal data.
We find that 50.4% of the data controllers that handled the request have flaws in the procedure of identifying the users.
The undesired and surprising result is that GDPR, in its present deployment, has actually decreased the privacy of the users of web services.
arXiv Detail & Related papers (2020-05-04T22:01:46Z)
- Machine Understandable Policies and GDPR Compliance Checking [9.032680855473986]
The SPECIAL H2020 project aims to provide a set of tools that data controllers can use to automatically check whether personal data sharing complies with regulatory obligations.
arXiv Detail & Related papers (2020-01-24T09:41:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.