NL2GDPR: Automatically Develop GDPR Compliant Android Application
Features from Natural Language
- URL: http://arxiv.org/abs/2208.13361v1
- Date: Mon, 29 Aug 2022 04:16:50 GMT
- Title: NL2GDPR: Automatically Develop GDPR Compliant Android Application
Features from Natural Language
- Authors: Faysal Hossain Shezan, Yingjie Lao, Minlong Peng, Xin Wang, Mingming
Sun, Ping Li
- Abstract summary: NL2GDPR is built on OIA (Open Information Annotation), an information extraction tool developed by Baidu Cognitive Computing Lab.
It combines a privacy-centric information extraction model with a GDPR policy finder and a policy generator.
It achieves 92.9%, 95.2%, and 98.4% accuracy in identifying GDPR policies related to personal data storage, process, and share types, respectively.
- Score: 28.51179772165298
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent privacy leakage incidents and stricter policy regulations
demand a much higher standard of compliance for companies and mobile apps.
However, such obligations also impose significant challenges on app developers
for complying with these regulations that contain various perspectives,
activities, and roles, especially for small companies and developers who are
less experienced in this matter or with limited resources. To address these
hurdles, we develop an automatic tool, NL2GDPR, which can generate policies
from natural language descriptions from the developer while also ensuring the
app's functionalities are compliant with General Data Protection Regulation
(GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA
(Open Information Annotation), developed by Baidu Cognitive Computing Lab.
At the core, NL2GDPR is a privacy-centric information extraction model,
appended with a GDPR policy finder and a policy generator. We perform a
comprehensive study to grasp the challenges in extracting privacy-centric
information and generating privacy policies, while exploiting optimizations for
this specific task. With NL2GDPR, we can achieve 92.9%, 95.2%, and 98.4%
accuracy in correctly identifying GDPR policies related to personal data
storage, process, and share types, respectively. To the best of our knowledge,
NL2GDPR is the first tool that allows a developer to automatically generate
GDPR-compliant policies simply by describing the app's features in natural
language. Note that other non-GDPR-related features
might be integrated with the generated features to build a complex app.
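The three-stage structure described in the abstract (extraction, policy finder, policy generator) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: NL2GDPR relies on the OIA extraction model, whereas this sketch swaps in simple keyword matching, and the names `ACTION_STEMS`, `PERSONAL_DATA_TERMS`, `TEMPLATES`, `Finding`, `extract`, and `generate_policy` are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical verb stems per policy type (assumption: keyword matching
# stands in for the OIA-based extraction model used by the paper).
ACTION_STEMS = {
    "storage": ("store", "save"),
    "process": ("process", "analy"),
    "share": ("share", "send"),
}

# Hypothetical, deliberately tiny vocabulary of personal data items.
PERSONAL_DATA_TERMS = ("email address", "location", "phone number")

# Hypothetical policy templates, one per policy type.
TEMPLATES = {
    "storage": "We store your {item}.",
    "process": "We process your {item}.",
    "share": "We share your {item} with third parties.",
}


@dataclass(frozen=True)
class Finding:
    policy_type: str  # "storage", "process", or "share"
    data_item: str    # the personal data item mentioned in the sentence


def extract(description: str) -> list[Finding]:
    """Stages 1-2: find (policy type, data item) pairs sentence by sentence."""
    findings = []
    for sentence in re.split(r"[.!?]\s*", description.lower()):
        for policy_type, stems in ACTION_STEMS.items():
            # A stem matches at a word start, so "store" also covers "stores".
            if not any(re.search(rf"\b{stem}\w*", sentence) for stem in stems):
                continue
            for item in PERSONAL_DATA_TERMS:
                if re.search(rf"\b{re.escape(item)}\b", sentence):
                    findings.append(Finding(policy_type, item))
    return findings


def generate_policy(findings: list[Finding]) -> list[str]:
    """Stage 3: turn each finding into a policy sentence from a template."""
    return [TEMPLATES[f.policy_type].format(item=f.data_item) for f in findings]


if __name__ == "__main__":
    text = ("The app stores the user's email address. "
            "It shares the location with advertisers.")
    for line in generate_policy(extract(text)):
        print(line)
```

Matching per sentence keeps a data item from being attached to an action mentioned elsewhere in the description; a real extractor would need to resolve this at the clause level.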
Related papers
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories.
We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds.
State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- PolicyLR: A Logic Representation For Privacy Policies [34.73520882451813]
We propose PolicyLR, a new paradigm that offers a comprehensive machine-readable representation of privacy policies.
PolicyLR converts privacy policies into a machine-readable format using valuations of atomic formulae.
We demonstrate PolicyLR in three privacy tasks: Policy Compliance, Inconsistency Detection and Privacy Comparison Shopping.
arXiv Detail & Related papers (2024-08-27T07:27:16Z)
- Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources.
RAG systems may face severe privacy risks when retrieving private data.
We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
- An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software [4.2610816955137]
The European Union's General Data Protection Regulation requires software developers to meet privacy requirements when interacting with users' data.
Prior research describes the impact of such laws on development, but only for commercial software.
arXiv Detail & Related papers (2024-06-20T20:38:33Z)
- Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service [0.6240153531166704]
Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents.
We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score.
arXiv Detail & Related papers (2024-04-17T19:53:59Z)
- Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's GDPR.
One emerging technique to realize PbD is runtime enforcement (RE).
We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z)
- Helping Code Reviewer Prioritize: Pinpointing Personal Data and its Processing [0.9238700679836852]
We have designed two specialized views to help code reviewers in prioritizing their work related to personal data.
Our approach, evaluated on four open-source GitHub applications, demonstrated a precision rate of 0.87 in identifying personal data flows.
This solution, designed to augment the efficiency of privacy-related analysis tasks such as the Record of Processing Activities (ROPA), aims to conserve resources, thereby saving time and enhancing productivity for code reviewers.
arXiv Detail & Related papers (2023-06-20T12:30:46Z)
- Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning [3.659023646021795]
Most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights.
In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 GDPR requirements.
We develop a Convolutional Neural Network (CNN) based model which can classify the privacy policies with an accuracy of 89.2%.
arXiv Detail & Related papers (2021-11-08T01:28:27Z)
- Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations.
We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
- Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents.
We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation.
Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.