Related papers: The Use of NLP-Based Text Representation Techniques to Support Requirement Engineering Tasks: A Systematic Mapping Review

URL: http://arxiv.org/abs/2206.00421v1
Date: Tue, 17 May 2022 02:47:26 GMT
Title: The Use of NLP-Based Text Representation Techniques to Support Requirement Engineering Tasks: A Systematic Mapping Review
Authors: Riad Sonbol, Ghaida Rebdawi, Nada Ghneim
Abstract summary: The research direction has changed from the use of lexical and syntactic features to the use of advanced embedding techniques. We identify four gaps in the existing literature, why they matter, and how future research can begin to address them.
Score: 1.5469452301122177
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Natural Language Processing (NLP) is widely used to support the automation of different Requirements Engineering (RE) tasks. Most of the proposed approaches start with various NLP steps that analyze requirements statements, extract their linguistic information, and convert them to easy-to-process representations, such as lists of features or embedding-based vector representations. These NLP-based representations are usually used at a later stage as inputs for machine learning techniques or rule-based methods. Thus, requirements representations play a major role in determining the accuracy of different approaches. In this paper, we conducted a survey in the form of a systematic literature mapping (classification) to find out (1) what are the representations used in RE tasks literature, (2) what is the main focus of these works, (3) what are the main research directions in this domain, and (4) what are the gaps and potential future directions. After compiling an initial pool of 2,227 papers, and applying a set of inclusion/exclusion criteria, we obtained a final pool containing 104 relevant papers. Our survey shows that the research direction has changed from the use of lexical and syntactic features to the use of advanced embedding techniques, especially in the last two years. Using advanced embedding representations has proved its effectiveness in most RE tasks (such as requirement analysis, extracting requirements from reviews and forums, and semantic-level quality tasks). However, representations that are based on lexical and syntactic features are still more appropriate for other RE tasks (such as modeling and syntax-level quality tasks) since they provide the required information for the rules and regular expressions used when handling these tasks. In addition, we identify four gaps in the existing literature, why they matter, and how future research can begin to address them.
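The two representation families contrasted in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names, the modal-verb list, and the sample requirements are hypothetical, and a plain bag-of-words count vector stands in for the learned embedding vectors the survey discusses.

```python
import re
from collections import Counter

# Hypothetical list of modal verbs that a rule-based RE tool might look for.
MODAL_PATTERN = re.compile(r"\b(shall|must|should|may|will)\b", re.IGNORECASE)


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_features(requirement):
    """Return a small, rule-friendly feature dictionary for one requirement."""
    tokens = tokenize(requirement)
    modals = [m.lower() for m in MODAL_PATTERN.findall(requirement)]
    return {
        "token_count": len(tokens),
        "modal_verbs": modals,
        "has_modal": bool(modals),
    }


def bag_of_words_vectors(requirements):
    """Map each requirement to a count vector over a shared, sorted vocabulary.

    This is a simple stand-in for an embedding: it yields fixed-length
    numeric vectors, but it does not capture semantic similarity.
    """
    tokenized = [tokenize(r) for r in requirements]
    vocabulary = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


# Hypothetical sample requirements, used only for illustration.
requirements = [
    "The system shall log every failed login attempt.",
    "Users may export reports as PDF.",
]
features = [lexical_features(r) for r in requirements]
vocabulary, vectors = bag_of_words_vectors(requirements)
```

The feature dictionary exposes exactly the kind of surface information that rules and regular expressions consume, while the vectors are the kind of fixed-length numeric input a machine learning model would take.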

Related papers

A Short Survey on Formalising Software Requirements using Large Language Models [0.0]
This paper presents a focused literature survey on the use of large language models (LLM) to assist in writing formal specifications for software. A summary of thirty-five key papers is presented, including examples for specifying programs written in Dafny, C and Java.
arXiv Detail & Related papers (2025-06-13T15:26:58Z)
A Comprehensive Study on the Use of Word Embedding Models in Software Engineering Domain [16.40945129377773]
This study focuses on the use of word embedding (WE) models in the software engineering (SE) domain. 181 primary studies published in mainstream software engineering venues are collected for analysis. We get a systematic view of the current practice of using WE for the SE domain, and identify the challenges and actions in adopting or developing practical semantic representation approaches for the SE artifacts used in a series of SE tasks.
arXiv Detail & Related papers (2025-05-23T08:52:29Z)
Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol [83.90769864167301]
Literature review tables are essential for summarizing and comparing collections of scientific papers. We explore the task of generating tables that best fulfill a user's informational needs given a collection of scientific papers. Our contributions focus on three key challenges encountered in real-world use: (i) User prompts are often under-specified; (ii) Retrieved candidate papers frequently contain irrelevant content; and (iii) Task evaluation should move beyond shallow text similarity techniques.
arXiv Detail & Related papers (2025-04-14T14:52:28Z)
Learning Task Representations from In-Context Learning [73.72066284711462]
Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning. We introduce an automated formulation for encoding task information in ICL prompts as a function of attention heads. We show that our method's effectiveness stems from aligning the distribution of the last hidden state with that of an optimally performing in-context-learned model.
arXiv Detail & Related papers (2025-02-08T00:16:44Z)
ResearchArena: Benchmarking Large Language Models' Ability to Collect and Organize Information as Research Agents [21.17856299966841]
This study introduces ResearchArena, a benchmark designed to evaluate large language models (LLMs) in conducting academic surveys. To support these opportunities, we construct an environment of 12M full-text academic papers and 7.9K survey papers.
arXiv Detail & Related papers (2024-06-13T03:26:30Z)
Natural Language Processing for Requirements Traceability [47.93107382627423]
Traceability plays a crucial role in requirements and software engineering, particularly for safety-critical systems. Natural language processing (NLP) and related techniques have made considerable progress in the past decade.
arXiv Detail & Related papers (2024-05-17T15:17:00Z)
Tasks People Prompt: A Taxonomy of LLM Downstream Tasks in Software Verification and Falsification Approaches [2.687757575672707]
We develop a novel downstream-task taxonomy to perform classification, mapping, and analysis. The main taxonomy requirement is to highlight commonalities while exhibiting variation points of task types.
arXiv Detail & Related papers (2024-04-14T23:45:23Z)
Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System [47.13932723910289]
We introduce an open-source multi-modal automated academic paper interpretation system (MMAPIS) with three-step process stages. It employs the hybrid modality preprocessing and alignment module to extract plain text, and tables or figures from documents separately. It then aligns this information based on the section names they belong to, ensuring that data with identical section names are categorized under the same section. It utilizes the extracted section names to divide the article into shorter text segments, facilitating specific summarizations both within and between sections via LLMs.
arXiv Detail & Related papers (2024-01-17T11:50:53Z)
Practical Guidelines for the Selection and Evaluation of Natural Language Processing Techniques in Requirements Engineering [8.779031107963942]
Natural language (NL) is now a cornerstone of requirements automation. With so many different NLP solution strategies available, it can be challenging to choose the right strategy for a specific RE task. In particular, we discuss how to choose among different strategies such as traditional NLP, feature-based machine learning, and language-model-based methods.
arXiv Detail & Related papers (2024-01-03T02:24:35Z)
Natural Language Processing for Requirements Formalization: How to Derive New Approaches? [0.32885740436059047]
We present and discuss principal ideas and state-of-the-art methodologies from the field of NLP. We discuss two different approaches in detail and highlight the iterative development of rule sets. The presented methods are demonstrated on two industrial use cases from the automotive and railway domains.
arXiv Detail & Related papers (2023-09-23T05:45:19Z)
Requirement Formalisation using Natural Language Processing and Machine Learning: A Systematic Review [11.292853646607888]
We conducted a systematic literature review to outline the current state-of-the-art of NLP and ML techniques in Requirement Engineering. We found that NLP approaches are the most common NLP techniques used for automatic RF, primarily operating on structured and semi-structured data. This study also revealed that Deep Learning (DL) techniques are not widely used; instead, classical ML techniques are predominant in the surveyed studies.
arXiv Detail & Related papers (2023-03-18T17:36:21Z)
An Inclusive Notion of Text [69.36678873492373]
We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. We introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling.
arXiv Detail & Related papers (2022-11-10T14:26:43Z)
Recitation-Augmented Language Models [85.30591349383849]
We show that RECITE is a powerful paradigm for knowledge-intensive NLP tasks. Specifically, we show that by utilizing recitation as the intermediate step, a recite-and-answer scheme can achieve new state-of-the-art performance.
arXiv Detail & Related papers (2022-10-04T00:49:20Z)
QASem Parsing: Text-to-text Modeling of QA-based Semantics [19.42681342441062]
We consider three QA-based semantic tasks, namely, QA-SRL, QANom and QADiscourse. We release the first unified QASem parsing tool, practical for downstream applications.
arXiv Detail & Related papers (2022-05-23T15:56:07Z)
Exploring Multi-Modal Representations for Ambiguity Detection & Coreference Resolution in the SIMMC 2.0 Challenge [60.616313552585645]
We present models for effective Ambiguity Detection and Coreference Resolution in Conversational AI. Specifically, we use TOD-BERT and LXMERT based models, compare them to a number of baselines and provide ablation experiments. Our results show that (1) language models are able to exploit correlations in the data to detect ambiguity; and (2) unimodal coreference resolution models can avoid the need for a vision component.
arXiv Detail & Related papers (2022-02-25T12:10:02Z)
Infusing Finetuning with Semantic Dependencies [62.37697048781823]
We show that, unlike syntax, semantics is not brought to the surface by today's pretrained models. We then use convolutional graph encoders to explicitly incorporate semantic parses into task-specific finetuning.
arXiv Detail & Related papers (2020-12-10T01:27:24Z)

This list is automatically generated from the titles and abstracts of the papers on this site.