NL2GDPR: Automatically Develop GDPR Compliant Android Application
Features from Natural Language
- URL: http://arxiv.org/abs/2208.13361v1
- Date: Mon, 29 Aug 2022 04:16:50 GMT
- Title: NL2GDPR: Automatically Develop GDPR Compliant Android Application
Features from Natural Language
- Authors: Faysal Hossain Shezan, Yingjie Lao, Minlong Peng, Xin Wang, Mingming
Sun, Ping Li
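- Abstract summary: NL2GDPR is a tool that generates GDPR-compliant policies from a developer's natural language description of app features.
It builds on OIA (Open Information Annotation), an information extraction tool developed by Baidu Cognitive Computing Lab, and adds a GDPR policy finder and a policy generator.
It achieves 92.9%, 95.2%, and 98.4% accuracy in identifying GDPR policies related to personal data storage, process, and share types, respectively.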
- Score: 28.51179772165298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent privacy leakage incidents and stricter policy regulations
demand a much higher standard of compliance for companies and mobile apps.
However, such obligations also impose significant challenges on app developers
in complying with these regulations, which involve various perspectives,
activities, and roles, especially for small companies and developers who are
less experienced in this matter or have limited resources. To address these
hurdles, we develop an automatic tool, NL2GDPR, which can generate policies
from the developer's natural language descriptions while also ensuring that the
app's functionalities comply with the General Data Protection Regulation
(GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA
(Open Information Annotation), developed by Baidu Cognitive Computing Lab.
At the core, NL2GDPR is a privacy-centric information extraction model,
appended with a GDPR policy finder and a policy generator. We perform a
comprehensive study to grasp the challenges in extracting privacy-centric
information and generating privacy policies, while exploiting optimizations for
this specific task. With NL2GDPR, we can achieve 92.9%, 95.2%, and 98.4%
accuracy in correctly identifying GDPR policies related to personal data
storage, process, and share types, respectively. To the best of our knowledge,
NL2GDPR is the first tool that allows a developer to automatically generate
GDPR-compliant policies by entering only a natural language description of
the app features. Note that other, non-GDPR-related features might be
integrated with the generated features to build a complex app.
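The abstract describes a three-stage architecture: privacy-centric information extraction, a GDPR policy finder, and a policy generator. The minimal Python sketch below illustrates only that data flow under loud assumptions; it is not NL2GDPR's implementation. The verb and data lexicons, the templates, and the functions `extract`, `find_policy_type`, and `generate_policy` are all hypothetical, and the toy tokenizer stands in for the OIA-based extractor, whose API is not reproduced here. It covers the policy side only, not the generation of app features.

```python
# HYPOTHETICAL lexicons: a toy stand-in for NL2GDPR's privacy-centric
# extractor, which the paper builds on Baidu's OIA (not reproduced here).
VERB_TO_TYPE = {
    "store": "storage", "stores": "storage", "saves": "storage",
    "process": "process", "processes": "process", "analyzes": "process",
    "share": "share", "shares": "share", "sends": "share",
}
PERSONAL_DATA = {"location", "email", "contacts", "photos", "name"}

# HYPOTHETICAL policy templates, one per type named in the abstract.
TEMPLATES = {
    "storage": "We store your {data}.",
    "process": "We process your {data}.",
    "share": "We share your {data} with third parties.",
}


def extract(description: str) -> list[tuple[str, str]]:
    """Stage 1 (toy extractor): pair each action verb with the next personal-data term."""
    tokens = [t.strip(".,").lower() for t in description.split()]
    pairs = []
    for i, tok in enumerate(tokens):
        if tok in VERB_TO_TYPE:
            # First personal-data term after the verb, if any.
            data = next((t for t in tokens[i + 1:] if t in PERSONAL_DATA), None)
            if data is not None:
                pairs.append((tok, data))
    return pairs


def find_policy_type(verb: str) -> str:
    """Stage 2 (toy policy finder): map a verb to storage / process / share."""
    return VERB_TO_TYPE[verb]


def generate_policy(description: str) -> list[str]:
    """Stage 3 (toy generator): fill one template per extracted (verb, data) pair."""
    return [
        TEMPLATES[find_policy_type(verb)].format(data=data)
        for verb, data in extract(description)
    ]


if __name__ == "__main__":
    desc = "The app stores the user's location and shares the email with partners."
    for sentence in generate_policy(desc):
        print(sentence)
```

Running the example prints "We store your location." followed by "We share your email with third parties." Keeping each stage as a separate function mirrors the extraction / policy finder / generator split that the abstract names, so any one stage could, in principle, be replaced by a learned model without touching the others.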
Related papers
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software [4.2610816955137]
The European Union's General Data Protection Regulation (GDPR) requires software developers to meet privacy requirements when interacting with users' data.
Prior research describes the impact of such laws on software development, but only for commercial software.
arXiv Detail & Related papers (2024-06-20T20:38:33Z)
- Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service [0.6240153531166704]
Our work seeks to alleviate the difficulty of understanding the legalese in privacy policies and terms of service by developing language models that provide automated, accessible summaries and scores for such documents.
We compared transformer-based and conventional models trained on our dataset; RoBERTa performed best overall, with an F1-score of 0.74.
arXiv Detail & Related papers (2024-04-17T19:53:59Z)
- Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's GDPR.
One emerging technique to realize PbD is runtime enforcement (RE).
We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z)
- Helping Code Reviewer Prioritize: Pinpointing Personal Data and its Processing [0.9238700679836852]
We have designed two specialized views to help code reviewers prioritize their work related to personal data.
Our approach, evaluated on four open-source GitHub applications, demonstrated a precision rate of 0.87 in identifying personal data flows.
This solution, designed to augment the efficiency of privacy-related analysis tasks such as the Record of Processing Activities (ROPA), aims to conserve resources, thereby saving time and enhancing productivity for code reviewers.
arXiv Detail & Related papers (2023-06-20T12:30:46Z)
- Privacy Adhering Machine Un-learning in NLP [66.17039929803933]
In the real world, industry uses machine learning to build models on user data.
Privacy mandates such as the right to be forgotten require effort in terms of both data removal and model retraining.
Continuous data removal and model retraining steps do not scale.
We propose Machine Unlearning to tackle this challenge.
arXiv Detail & Related papers (2022-12-19T16:06:45Z)
- Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning [3.659023646021795]
Most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights.
In this paper, we create a privacy policy dataset of 1,080 websites labeled with 18 GDPR disclosure requirements.
We develop a Convolutional Neural Network (CNN) based model that classifies privacy policies with an accuracy of 89.2%.
arXiv Detail & Related papers (2021-11-08T01:28:27Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)