Detecting Privacy Requirements from User Stories with NLP Transfer
Learning Models
- URL: http://arxiv.org/abs/2202.01035v1
- Date: Wed, 2 Feb 2022 14:02:13 GMT
- Title: Detecting Privacy Requirements from User Stories with NLP Transfer
Learning Models
- Authors: Francesco Casillo, Vincenzo Deufemia and Carmine Gravino
- Abstract summary: We present an approach to decrease privacy risks during agile software development by automatically detecting privacy-related information.
The proposed approach combines Natural Language Processing (NLP) and linguistic resources with deep learning algorithms to identify privacy aspects in user stories.
- Score: 1.6951941479979717
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: To provide privacy-aware software systems, it is crucial to consider
privacy from the very beginning of development. However, developers often lack
the expertise and knowledge required to embed legal and social data-protection
requirements into software systems. Objective: We present an approach to
decrease privacy risks during agile software development by automatically
detecting privacy-related information in user story requirements, a prominent
notation in agile Requirements Engineering (RE). Methods: The proposed approach
combines Natural Language Processing (NLP) and linguistic resources with deep
learning algorithms to identify privacy aspects in user stories. NLP
technologies extract information about the semantic and syntactic structure of
the text, which is then processed by a pre-trained convolutional neural
network, enabling a Transfer Learning technique. We evaluate the proposed
approach in an empirical study on a dataset of 1680 user stories. Results: The
experimental results show that deep learning algorithms yield better
predictions than conventional (shallow) machine learning methods. Moreover,
applying Transfer Learning improves prediction accuracy considerably, by
approximately 10%. Conclusions: Our study encourages software engineering
researchers to consider opportunities for automating privacy detection in the
early design phase, also by exploiting transfer learning models.
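The paper itself does not ship code, but the pipeline it describes, a convolutional text classifier whose lower layers are pre-trained on a source task and then frozen while the head is fine-tuned on user stories, can be sketched roughly as below. The framework, layer sizes, and source task are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StoryCNN(nn.Module):
    """Word-level CNN text classifier (illustrative sizes, not the paper's)."""
    def __init__(self, vocab_size=20_000, embed_dim=100, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, 128, kernel_size=5)
        self.head = nn.Linear(128, num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)      # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x)).amax(dim=2)       # global max pooling
        return self.head(x)

# 1) Pre-train the network on an auxiliary labelled text corpus (source task).
source = StoryCNN(num_classes=5)
# ... fit `source` on the source-task data here ...

# 2) Transfer: copy the embedding and convolution weights, freeze them,
#    and fine-tune only the classification head on the user-story dataset.
target = StoryCNN(num_classes=2)                       # privacy-related vs. not
target.embed.load_state_dict(source.embed.state_dict())
target.conv.load_state_dict(source.conv.state_dict())
for p in list(target.embed.parameters()) + list(target.conv.parameters()):
    p.requires_grad = False
optimizer = torch.optim.Adam(
    [p for p in target.parameters() if p.requires_grad], lr=1e-3)
```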
Related papers
- KBAlign: Efficient Self Adaptation on Specific Knowledge Bases [75.78948575957081]
Large language models (LLMs) usually rely on retrieval-augmented generation to exploit knowledge materials on the fly.
We propose KBAlign, an approach designed for efficient adaptation to downstream tasks involving knowledge bases.
Our method utilizes iterative training with self-annotated data such as Q&A pairs and revision suggestions, enabling the model to grasp the knowledge content efficiently.
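A minimal skeleton of such an iterative self-annotation loop is sketched below; `annotate`, `revise`, and `fine_tune` are placeholder helpers standing in for whatever model and training stack are used, not functions from KBAlign.

```python
# Illustrative skeleton of KBAlign-style iterative self-adaptation; the three
# helpers are placeholders, not code from the KBAlign paper.
def annotate(model, chunk, n=3):
    """Placeholder: have the current model write n Q&A pairs about a KB chunk."""
    return [(f"Q about {chunk[:30]}...", "model-generated answer")] * n

def revise(model, qa_pairs):
    """Placeholder: have the model propose revisions to its own answers."""
    return qa_pairs  # no-op in this sketch

def fine_tune(model, training_pairs):
    """Placeholder: one round of supervised fine-tuning on self-annotated data."""
    return model

def kb_align(model, kb_chunks, rounds=3):
    for _ in range(rounds):
        qa = [pair for chunk in kb_chunks for pair in annotate(model, chunk)]
        qa = revise(model, qa)          # self-revision suggestions
        model = fine_tune(model, qa)    # adapt the model to the knowledge base
    return model
```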
arXiv Detail & Related papers (2024-11-22T08:21:03Z)
- Deep Learning and Machine Learning -- Natural Language Processing: From Theory to Application [17.367710635990083]
We focus on natural language processing (NLP) and the role of large language models (LLMs).
This paper discusses advanced data preprocessing techniques and the use of frameworks like Hugging Face for implementing transformer-based models.
It highlights challenges such as handling multilingual data, reducing bias, and ensuring model robustness.
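As a concrete example of the kind of framework usage mentioned above, a pre-trained transformer classifier can be loaded with Hugging Face's transformers library in a few lines; the checkpoint name below is a common public model chosen for illustration, not one prescribed by the paper.

```python
# Minimal Hugging Face example: run a pre-trained transformer classifier.
# Requires `pip install transformers`; the checkpoint name is an assumption.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

result = classifier("As a user, I want my personal data to be encrypted.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.98}]
```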
arXiv Detail & Related papers (2024-10-30T09:35:35Z)
- Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
The Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.
An Iterative Unlearning Refinement module dynamically assesses the unlearning extent on specific data pieces and makes iterative updates.
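A rough reading of that recipe in code is given below: gradient ascent on the data to forget plus a retention term that keeps outputs close to a frozen reference model. The retention term here is a simple KL penalty standing in for the paper's contrastive module; it is an interpretation of the summary, not the ICU implementation.

```python
import torch
import torch.nn.functional as F

def icu_style_step(model, ref_model, forget_batch, retain_batch, alpha=1.0):
    """One illustrative update combining unlearning and preservation losses.
    `model` and `ref_model` are assumed to be callables returning logits of
    shape (batch, seq_len, vocab); this is a generic sketch, not the ICU code."""
    # Unlearning induction: maximise loss on the data to forget
    # (implemented as minimising its negative).
    forget_logits = model(forget_batch["input_ids"])
    unlearn_loss = -F.cross_entropy(
        forget_logits.transpose(1, 2), forget_batch["labels"])

    # Preservation: keep predictions on retained data close to the frozen
    # reference model so expressive capability is not destroyed.
    with torch.no_grad():
        ref_logits = ref_model(retain_batch["input_ids"])
    retain_logits = model(retain_batch["input_ids"])
    preserve_loss = F.kl_div(
        F.log_softmax(retain_logits, dim=-1),
        F.softmax(ref_logits, dim=-1),
        reduction="batchmean",
    )
    return unlearn_loss + alpha * preserve_loss
```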
arXiv Detail & Related papers (2024-07-25T07:09:35Z)
- Large Language Models: A New Approach for Privacy Policy Analysis at Scale [1.7570777893613145]
This research proposes the application of Large Language Models (LLMs) as an alternative for effectively and efficiently extracting privacy practices from privacy policies at scale.
We leverage well-known LLMs such as ChatGPT and Llama 2, and offer guidance on the optimal design of prompts, parameters, and models.
Using several well-known datasets in the domain as benchmarks, our evaluation shows strong performance, with an F1 score exceeding 93%.
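The prompt-design guidance itself is in the paper; as a hypothetical illustration of the overall pattern, a policy segment could be labelled with a template like the one below, where `ask_llm` and the practice names are placeholders rather than the authors' setup.

```python
# Hypothetical prompt template for extracting privacy practices from a policy
# segment; `ask_llm` is a placeholder for the actual ChatGPT / Llama 2 call.
PRACTICES = ["First Party Collection", "Third Party Sharing",
             "Data Retention", "Data Security", "User Choice/Control"]

def build_prompt(segment: str) -> str:
    return (
        "You are an expert privacy-policy annotator.\n"
        f"Label the segment with every applicable practice from: {PRACTICES}.\n"
        "Answer with a comma-separated list only.\n\n"
        f"Segment: {segment}"
    )

def ask_llm(prompt: str) -> str:
    """Placeholder: send the prompt to the chosen LLM and return its reply."""
    raise NotImplementedError

# Example usage, once `ask_llm` is wired to a real model:
# labels = ask_llm(build_prompt("We may share your email address with partners."))
```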
arXiv Detail & Related papers (2024-05-31T15:12:33Z)
- Learn When (not) to Trust Language Models: A Privacy-Centric Adaptive Model-Aware Approach [23.34505448257966]
Retrieval-augmented large language models (LLMs) have been remarkably competent in various NLP tasks.
Previous work has proposed determining when to perform or skip retrieval in a data-aware manner by analyzing the LLMs' pretraining data.
These data-aware methods pose privacy risks and memory limitations, especially when requiring access to sensitive or extensive pretraining data.
We hypothesize that token embeddings are able to capture the model's intrinsic knowledge, which offers a safer and more straightforward way to judge the need for retrieval without the privacy risks associated with accessing pre-training data.
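One way to read that hypothesis in code is a small gate trained on the model's own token embeddings to decide whether retrieval is needed, without ever touching pre-training data. The sketch below is a generic illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class RetrievalGate(nn.Module):
    """Tiny classifier over mean-pooled token embeddings: retrieve or not."""
    def __init__(self, embed_dim: int):
        super().__init__()
        self.scorer = nn.Linear(embed_dim, 1)

    def forward(self, token_embeddings):        # (batch, seq_len, embed_dim)
        pooled = token_embeddings.mean(dim=1)   # model-aware, data-free signal
        return torch.sigmoid(self.scorer(pooled))  # P(retrieval is needed)

# Usage sketch: embeddings would come from the frozen LLM's embedding layer.
gate = RetrievalGate(embed_dim=4096)
fake_question_embeddings = torch.randn(1, 12, 4096)
if gate(fake_question_embeddings).item() > 0.5:
    pass  # call the retriever before answering
```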
arXiv Detail & Related papers (2024-04-04T15:21:22Z)
- Using Machine Learning To Identify Software Weaknesses From Software Requirement Specifications [49.1574468325115]
This research focuses on finding an efficient machine learning algorithm to identify software weaknesses from requirement specifications.
Keywords extracted using latent semantic analysis help map the CWE categories to PROMISE_exp. Naive Bayes, support vector machine (SVM), decision trees, neural network, and convolutional neural network (CNN) algorithms were tested.
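The kind of pipeline described there, latent semantic analysis over requirement texts followed by a conventional classifier, is straightforward to express with scikit-learn; the snippet below is a generic illustration, not the study's exact setup.

```python
# Generic sketch: TF-IDF + truncated SVD (latent semantic analysis) + SVM,
# in the spirit of classifying requirement texts into CWE categories.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC

requirements = [
    "The system shall store user passwords.",
    "The system shall display the current date.",
]
labels = ["CWE-259", "none"]  # illustrative labels only

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2),   # LSA step; tiny because the toy corpus is tiny
    LinearSVC(),
)
model.fit(requirements, labels)
print(model.predict(["Passwords are kept in a database."]))
```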
arXiv Detail & Related papers (2023-08-10T13:19:10Z)
- Your Room is not Private: Gradient Inversion Attack on Reinforcement Learning [47.96266341738642]
Privacy emerges as a pivotal concern within the realm of embodied AI, as the robot accesses substantial personal information.
This paper proposes an attack on the value-based algorithm and the gradient-based algorithm, utilizing gradient inversion to reconstruct states, actions, and supervision signals.
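At its core, gradient inversion is an optimisation loop that searches for inputs whose gradients match the observed ones. The toy sketch below shows that loop for a generic supervised model, with the label assumed known; it is not the paper's embodied-AI or reinforcement-learning setup.

```python
import torch
import torch.nn as nn

# Minimal gradient-inversion sketch: recover an input by matching gradients.
torch.manual_seed(0)
model = nn.Linear(8, 3)
loss_fn = nn.CrossEntropyLoss()

# "Observed" gradient, as might leak during distributed or federated training.
true_x = torch.randn(1, 8)
true_y = torch.tensor([1])  # label assumed known in this toy example
observed_grads = torch.autograd.grad(loss_fn(model(true_x), true_y),
                                     model.parameters())

# The attacker optimises a dummy input so its gradients match the observed ones.
dummy_x = torch.randn(1, 8, requires_grad=True)
opt = torch.optim.Adam([dummy_x], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    grads = torch.autograd.grad(loss_fn(model(dummy_x), true_y),
                                model.parameters(), create_graph=True)
    match = sum(((g - og) ** 2).sum() for g, og in zip(grads, observed_grads))
    match.backward()
    opt.step()

print(torch.dist(dummy_x.detach(), true_x))  # shrinks as the attack converges
```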
arXiv Detail & Related papers (2023-06-15T16:53:26Z)
- Human-Centric Multimodal Machine Learning: Recent Advances and Testbed on AI-based Recruitment [66.91538273487379]
There is a certain consensus about the need to develop AI applications with a Human-Centric approach.
Human-Centric Machine Learning needs to be developed based on four main requirements: (i) utility and social good; (ii) privacy and data ownership; (iii) transparency and accountability; and (iv) fairness in AI-driven decision-making processes.
We study how current multimodal algorithms based on heterogeneous sources of information are affected by sensitive elements and inner biases in the data.
arXiv Detail & Related papers (2023-02-13T16:44:44Z)
- Privacy-Preserving Machine Learning for Collaborative Data Sharing via Auto-encoder Latent Space Embeddings [57.45332961252628]
Privacy-preserving machine learning in data-sharing processes is an ever-critical task.
This paper presents an innovative framework that uses Representation Learning via autoencoders to generate privacy-preserving embedded data.
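The general mechanism, sharing an autoencoder's latent codes instead of the raw records, can be sketched as follows; the architecture and sizes are illustrative and are not taken from the paper.

```python
import torch
import torch.nn as nn

class TabularAutoencoder(nn.Module):
    """Toy autoencoder: only the latent code `z` would be shared, not raw data."""
    def __init__(self, n_features=30, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

ae = TabularAutoencoder()
x = torch.randn(4, 30)                   # stand-in for sensitive records
x_hat, z = ae(x)
loss = nn.functional.mse_loss(x_hat, x)  # reconstruction objective during training
# After training, collaborators exchange `z` (embeddings) rather than `x`.
```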
arXiv Detail & Related papers (2022-11-10T17:36:58Z)
- Federated Learning and Differential Privacy: Software tools analysis, the Sherpa.ai FL framework and methodological guidelines for preserving data privacy [8.30788601976591]
We present the Sherpa.ai Federated Learning framework, built upon a holistic view of federated learning and differential privacy.
We show how to follow the methodological guidelines with the Sherpa.ai Federated Learning framework by means of classification and regression use cases.
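The Sherpa.ai API itself is not reproduced here; as a generic illustration of combining federated averaging with differential privacy, the sketch below clips each client update and adds Gaussian noise before aggregation.

```python
import torch

def dp_federated_round(global_weights, client_updates,
                       clip_norm=1.0, noise_std=0.1):
    """Generic FedAvg round with per-client clipping and Gaussian noise.
    Illustrative only; this is not the Sherpa.ai Federated Learning API."""
    noisy_sum = torch.zeros_like(global_weights)
    for update in client_updates:       # update = local_weights - global_weights
        scale = torch.clamp(clip_norm / (update.norm() + 1e-12), max=1.0)
        clipped = update * scale
        noisy_sum += clipped + noise_std * torch.randn_like(clipped)
    return global_weights + noisy_sum / len(client_updates)

# Toy usage: three clients send updates to a flattened parameter vector.
global_w = torch.zeros(10)
updates = [torch.randn(10) * 0.05 for _ in range(3)]
global_w = dp_federated_round(global_w, updates)
```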
arXiv Detail & Related papers (2020-07-02T06:47:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.