NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language

URL: http://arxiv.org/abs/2208.13361v1
Date: Mon, 29 Aug 2022 04:16:50 GMT
Title: NL2GDPR: Automatically Develop GDPR Compliant Android Application Features from Natural Language
Authors: Faysal Hossain Shezan, Yingjie Lao, Minlong Peng, Xin Wang, Mingming Sun, Ping Li
Abstract summary: NL2GDPR is an automatic tool that generates GDPR-compliant privacy policies from a developer's natural language description of app features. It builds on OIA (Open Information Annotation), an information extraction tool developed by Baidu Cognitive Computing Lab. It achieves 92.9%, 95.2%, and 98.4% accuracy in identifying GDPR policies related to personal data storage, process, and share types, respectively.
Score: 28.51179772165298
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The recent privacy leakage incidents and the stricter policy regulations demand a much higher standard of compliance for companies and mobile apps. However, such obligations also impose significant challenges on app developers for complying with these regulations that contain various perspectives, activities, and roles, especially for small companies and developers who are less experienced in this matter or with limited resources. To address these hurdles, we develop an automatic tool, NL2GDPR, which can generate policies from natural language descriptions from the developer while also ensuring the app's functionalities are compliant with General Data Protection Regulation (GDPR). NL2GDPR is developed by leveraging an information extraction tool, OIA (Open Information Annotation), developed by Baidu Cognitive Computing Lab. At the core, NL2GDPR is a privacy-centric information extraction model, appended with a GDPR policy finder and a policy generator. We perform a comprehensive study to grasp the challenges in extracting privacy-centric information and generating privacy policies, while exploiting optimizations for this specific task. With NL2GDPR, we can achieve 92.9%, 95.2%, and 98.4% accuracy in correctly identifying GDPR policies related to personal data storage, process, and share types, respectively. To the best of our knowledge, NL2GDPR is the first tool that allows a developer to automatically generate GDPR compliant policies, requiring only a natural language description of the app features. Note that other non-GDPR-related features might be integrated with the generated features to build a complex app.

Related papers

The European Union general data protection regulation: what it is and what it means [0.17041248235270653]
This paper introduces the strategic approach to regulating personal data and the normative foundations of the European Union's General Data Protection Regulation (GDPR). It explains the genesis of the GDPR, best understood as an extension of existing requirements imposed by the 1995 Data Protection Directive; describes the regulation's approach; makes predictions about its provisions; and highlights implications for U.S. privacy law.
arXiv Detail & Related papers (2025-10-03T09:54:30Z)
GDPRShield: AI-Powered GDPR Support for Software Developers in Small and Medium-Sized Enterprises [0.0]
This paper introduces a novel AI-powered framework called "GDPRShield", specifically designed to enhance the GDPR awareness of SME software developers. "GDPRShield" boosts developers' motivation to comply with the GDPR from the early stages of software development.
arXiv Detail & Related papers (2025-05-19T02:47:44Z)
Instruction Tuning on Public Government and Cultural Data for Low-Resource Language: a Case Study in Kazakh [57.002807772016524]
We introduce and open-source a large-scale (10,600 samples) instruction-following dataset, covering key institutional and cultural knowledge relevant to Kazakhstan. We employ LLM-assisted data generation, comparing open-weight and closed-weight models for dataset construction, and select GPT-4o as the backbone. We show that fine-tuning Qwen, Falcon, and Gemma on our dataset leads to consistent performance improvements in both multiple-choice and generative tasks.
arXiv Detail & Related papers (2025-02-19T11:44:27Z)
SPRI: Aligning Large Language Models with Context-Situated Principles [53.07731637246485]
Situated-PRInciples (SPRI) is designed to automatically generate guiding principles in real-time for each input query and utilize them to align each response. We evaluate SPRI on three tasks, and show that SPRI can derive principles in a complex domain-specific task that leads to on-par performance as expert-crafted ones.
arXiv Detail & Related papers (2025-02-05T17:32:29Z)
GDPR-Relevant Privacy Concerns in Mobile Apps Research: A Systematic Literature Review [3.5294997953439426]
Some concepts, such as data subject rights (the rights individuals hold over their personal data), are fundamental yet under-explored in the research landscape.
arXiv Detail & Related papers (2024-11-28T13:42:46Z)
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
PolicyLR: A Logic Representation For Privacy Policies [34.73520882451813]
We propose PolicyLR, a new paradigm that offers a comprehensive machine-readable representation of privacy policies. PolicyLR converts privacy policies into a machine-readable format using valuations of atomic formulae. We demonstrate PolicyLR in three privacy tasks: Policy Compliance, Inconsistency Detection and Privacy Comparison Shopping.
arXiv Detail & Related papers (2024-08-27T07:27:16Z)
Mitigating the Privacy Issues in Retrieval-Augmented Generation (RAG) via Pure Synthetic Data [51.41288763521186]
Retrieval-augmented generation (RAG) enhances the outputs of language models by integrating relevant information retrieved from external knowledge sources. RAG systems may face severe privacy risks when retrieving private data. We propose using synthetic data as a privacy-preserving alternative for the retrieval data.
arXiv Detail & Related papers (2024-06-20T22:53:09Z)
An Exploratory Mixed-Methods Study on General Data Protection Regulation (GDPR) Compliance in Open-Source Software [4.2610816955137]
The European Union's General Data Protection Regulation (GDPR) requires software developers to meet privacy requirements when interacting with users' data. Prior research describes the impact of such laws on software development, but only for commercial software.
arXiv Detail & Related papers (2024-06-20T20:38:33Z)
Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service [0.6240153531166704]
Our work seeks to alleviate this issue by developing language models that provide automated, accessible summaries and scores for such documents. We compared transformer-based and conventional models during training on our dataset, and RoBERTa performed better overall with a remarkable 0.74 F1-score.
arXiv Detail & Related papers (2024-04-17T19:53:59Z)
Towards an Enforceable GDPR Specification [49.1574468325115]
Privacy by Design (PbD) is prescribed by modern privacy regulations such as the EU's GDPR. One emerging technique to realize PbD is runtime enforcement (RE). We present a set of requirements and an iterative methodology for creating formal specifications of legal provisions.
arXiv Detail & Related papers (2024-02-27T09:38:51Z)
Helping Code Reviewer Prioritize: Pinpointing Personal Data and its Processing [0.9238700679836852]
We have designed two specialized views to help code reviewers in prioritizing their work related to personal data. Our approach, evaluated on four open-source GitHub applications, demonstrated a precision rate of 0.87 in identifying personal data flows. This solution, designed to augment the efficiency of privacy-related analysis tasks such as the Record of Processing Activities (ROPA), aims to conserve resources, thereby saving time and enhancing productivity for code reviewers.
arXiv Detail & Related papers (2023-06-20T12:30:46Z)
Automated Detection of GDPR Disclosure Requirements in Privacy Policies using Deep Active Learning [3.659023646021795]
Most privacy policies are verbose, full of jargon, and vaguely describe companies' data practices and users' rights. In this paper, we create a privacy policy dataset of 1,080 websites labeled with the 18 GDPR requirements. We develop a Convolutional Neural Network (CNN) based model which can classify the privacy policies with an accuracy of 89.2%.
arXiv Detail & Related papers (2021-11-08T01:28:27Z)
Building a Foundation for Data-Driven, Interpretable, and Robust Policy Design using the AI Economist [67.08543240320756]
We show that the AI Economist framework enables effective, flexible, and interpretable policy design using two-level reinforcement learning and data-driven simulations. We find that log-linear policies trained using RL significantly improve social welfare, based on both public health and economic outcomes, compared to past outcomes.
arXiv Detail & Related papers (2021-08-06T01:30:41Z)
Incorporating External Knowledge through Pre-training for Natural Language to Code Generation [97.97049697457425]
Open-domain code generation aims to generate code in a general-purpose programming language from natural language (NL) intents. We explore the effectiveness of incorporating two varieties of external knowledge into NL-to-code generation: automatically mined NL-code pairs from the online programming QA forum StackOverflow and programming language API documentation. Our evaluations show that combining the two sources with data augmentation and retrieval-based data re-sampling improves the current state-of-the-art by up to 2.2% absolute BLEU score on the code generation testbed CoNaLa.
arXiv Detail & Related papers (2020-04-20T01:45:27Z)

This list is automatically generated from the titles and abstracts of the papers on this site.