Detecting Requirements Smells With Deep Learning: Experiences,
Challenges and Future Work
- URL: http://arxiv.org/abs/2108.03087v1
- Date: Fri, 6 Aug 2021 12:45:15 GMT
- Authors: Mohammad Kasra Habib, Stefan Wagner, Daniel Graziotin
- Abstract summary: This work aims to improve on previous work by creating a manually labeled dataset and using ensemble learning, Deep Learning (DL), and techniques such as word embeddings and transfer learning to overcome the generalization problem.
The current findings show that the dataset is unbalanced and indicate which classes need more examples.
- Score: 9.44316959798363
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Requirements Engineering (RE) is the initial step towards building a software
system. The success or failure of a software project is firmly tied to this
phase, based on communication among stakeholders using natural language. The
problem with natural language is that it can easily lead to different
understandings if it is not expressed precisely by the stakeholders involved,
which results in building a product different from the expected one. Previous
work proposed to enhance the quality of software requirements by detecting
language errors based on the ISO 29148 requirements language criteria. The
existing solutions apply classical Natural Language Processing (NLP) to detect
them. Classical NLP has limitations, such as domain dependence, which results
in poor generalization capability. Therefore, this work aims to improve on the
previous work by creating a manually labeled dataset and using ensemble
learning, Deep Learning (DL), and techniques such as word embeddings and
transfer learning to overcome the generalization problem tied to classical NLP
and to improve precision and recall. The current findings show that the
dataset is unbalanced and indicate which classes need more examples. It is
tempting to train algorithms even when the dataset is not sufficiently
representative. Accordingly, the results show that the models are overfitting;
in Machine Learning, this issue is typically addressed by adding more
instances to the dataset, improving label quality, removing noise, and
reducing the complexity of the learning algorithms, all of which is planned
for this research.
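To make the transfer-learning direction concrete, here is a minimal sketch (not the authors' published code): a pretrained transformer is repurposed as a sentence classifier over requirement statements. The label set, the choice of bert-base-uncased, and the example requirement are illustrative assumptions.

```python
# Minimal transfer-learning sketch for requirements-smell classification.
# Labels, model choice, and the example sentence are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

SMELLS = ["no_smell", "ambiguous_adverb", "vague_pronoun", "loophole"]  # hypothetical label set

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(SMELLS)
)

requirement = "The system should usually respond quickly to user input."
inputs = tokenizer(requirement, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, len(SMELLS))
print(SMELLS[int(logits.argmax(dim=-1))])    # meaningful only after fine-tuning

# For the unbalanced dataset, class weights can be folded into the loss
# during fine-tuning, e.g. torch.nn.CrossEntropyLoss(weight=class_weights).
```

In practice the classification head would first be fine-tuned on the manually labeled dataset; the untrained head above only shows the interface.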
Related papers
- MARIO: MAth Reasoning with code Interpreter Output -- A Reproducible Pipeline [12.186691561822256] (arXiv, 2024-01-16)
We postulate that the inherent nature of large language models (LLMs) presents challenges in modeling mathematical reasoning.
This paper introduces a novel math dataset, enhanced with a capability to utilize a Python code interpreter.
We propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs.
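The code-interpreter idea can be pictured with a toy check: run a model-produced Python snippet and compare its result to a reference answer. The snippet and answer below are made up; real pipelines sandbox execution rather than calling exec directly.

```python
# Toy illustration of verifying an LLM's math reasoning by executing
# generated code; the snippet and reference answer are fabricated examples.
generated_code = "result = sum(i * i for i in range(1, 11))"  # hypothetical LLM output
reference_answer = 385                                        # sum of squares 1..10

namespace = {}
exec(generated_code, namespace)                 # real systems sandbox this step
print(namespace["result"] == reference_answer)  # True -> keep as a training sample
```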
- Ensemble Transfer Learning for Multilingual Coreference Resolution [60.409789753164944] (arXiv, 2023-01-22)
A problem that frequently occurs when working with a non-English language is the scarcity of annotated training data.
We design a simple but effective ensemble-based framework that combines various transfer learning techniques.
We also propose a low-cost TL method that bootstraps coreference resolution models by utilizing Wikipedia anchor texts.
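One way to picture the ensemble side of such a framework is a plain majority vote over the outputs of independently transferred models; the models and predictions below are stand-ins, not the paper's systems.

```python
# Hedged sketch: majority-vote ensembling of several models' label sequences.
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine label sequences (one list per model) position by position."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions_per_model)]

# e.g. three models decide whether mention pairs corefer (1) or not (0)
model_a = [1, 0, 1, 1]
model_b = [1, 1, 1, 0]
model_c = [0, 0, 1, 1]
print(majority_vote([model_a, model_b, model_c]))  # -> [1, 0, 1, 1]
```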
- Deep Sequence Models for Text Classification Tasks [0.007329200485567826] (arXiv, 2022-07-18)
Natural Language Processing (NLP) equips machines to understand diverse and complicated human languages.
Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection.
Sequence models such as RNNs, GRUs, and LSTMs are a breakthrough for tasks with long-range dependencies.
The results generated were excellent, with most of the models performing within the range of 80% to 94%.
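A small Keras sketch shows the shape of such a sequence model; the hyperparameters are illustrative, not the paper's settings.

```python
# Illustrative BiLSTM text classifier; swap LSTM for GRU or SimpleRNN to
# compare sequence models. Vocabulary size and dimensions are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, max_len = 20_000, 128, 200

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_len,)),
    layers.Embedding(vocab_size, embed_dim),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation="sigmoid"),   # binary task, e.g. spam vs. not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```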
- LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks [22.274913349275817] (arXiv, 2022-06-14)
Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks.
We propose Language-Interfaced Fine-Tuning (LIFT) to solve non-language downstream tasks without changing the model architecture or loss function; it relies solely on the natural language interface.
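The interface idea can be sketched by serializing a feature vector into a sentence the LM completes; the template wording below is an assumption, not the paper's exact prompt.

```python
# Hypothetical LIFT-style serialization of a tabular sample into text.
def row_to_prompt(features, target_name="species"):
    parts = [f"{name} is {value}" for name, value in features.items()]
    return f"Given that {', '.join(parts)}, the {target_name} is"

sample = {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4}
print(row_to_prompt(sample))
# The LM is then fine-tuned to complete the prompt with the class label.
```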
- Communication-Efficient Robust Federated Learning with Noisy Labels [144.31995882209932] (arXiv, 2022-06-11)
Federated learning (FL) is a promising privacy-preserving machine learning paradigm over distributed data.
We propose a learning-based reweighting approach to mitigate the effect of noisy labels in FL.
Our approach has shown superior performance on several real-world datasets compared to various baselines.
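As scaffolding for the federated setting, a weighted federated-averaging step looks like the sketch below; the paper's learned reweighting of noisy-label samples is more involved, so the weights here are placeholders.

```python
# Generic weighted FedAvg; client parameters and weights are made-up values.
import numpy as np

def fedavg(client_params, client_weights):
    """Weighted average of client parameter vectors."""
    w = np.asarray(client_weights, dtype=float)
    w = w / w.sum()
    return sum(wi * p for wi, p in zip(w, client_params))

clients = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
weights = [100, 50, 50]   # e.g. data sizes; a learned score could replace these
print(fedavg(clients, weights))  # -> [2.5 3.5]
```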
- Improving Pre-trained Language Models with Syntactic Dependency Prediction Task for Chinese Semantic Error Recognition [52.55136323341319] (arXiv, 2022-04-15)
Existing Chinese text error detection mainly focuses on spelling and simple grammatical errors.
Chinese semantic errors are understudied and so complex that humans cannot easily recognize them.
- Few-shot Named Entity Recognition with Cloze Questions [3.561183926088611] (arXiv, 2021-11-24)
We propose a simple and intuitive adaptation of Pattern-Exploiting Training (PET), a recent approach which combines the cloze-questions mechanism and fine-tuning for few-shot learning.
Our approach achieves considerably better performance than standard fine-tuning and comparable or improved results with respect to other few-shot baselines.
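The cloze-question mechanism itself is easy to demo with a masked LM scoring candidate label words in a pattern; the pattern and candidates below are illustrative, not the paper's verbalizers.

```python
# Cloze-style scoring with a masked LM via the fill-mask pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
pattern = "Paris is a [MASK] ."
for cand in fill(pattern, targets=["city", "person", "company"]):
    print(cand["token_str"], round(cand["score"], 4))
```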
- Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density [58.64907136562178] (arXiv, 2021-11-02)
We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods.
We hypothesise that estimating dataset complexity allows for the reduction of the number of required experiments.
The difference in linguistic complexity of datasets allows us to additionally discuss the efficacy of linguistically-backed word preprocessing.
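One simple, type/token-style reading of a feature-density estimate is sketched below; treat the exact formula as an assumption rather than the paper's definition.

```python
# Assumed feature-density measure: unique features over all feature tokens.
def feature_density(documents, featurize=str.split):
    feats = [f for doc in documents for f in featurize(doc)]
    return len(set(feats)) / len(feats)

corpus = ["you are great", "you are so so bad", "stop that now"]
print(round(feature_density(corpus), 3))  # fewer repeated features -> higher density
```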
- Competency Problems: On Finding and Removing Artifacts in Language Data [50.09608320112584] (arXiv, 2021-04-17)
We argue that for complex language understanding tasks, all simple feature correlations are spurious.
We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account.
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097] (arXiv, 2021-04-14)
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based active learning (AL) are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
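Uncertainty-based acquisition of the kind studied can be pictured with predictive entropy (BALD additionally measures disagreement across stochastic forward passes); the probabilities below are made-up model outputs.

```python
# Rank an unlabeled pool by predictive entropy; highest entropy is queried first.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

pool_probs = np.array([
    [0.98, 0.02],   # confident -> low priority
    [0.55, 0.45],   # uncertain -> label next
    [0.70, 0.30],
])
print(np.argsort(-entropy(pool_probs)))  # -> [1 2 0]
```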
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.